Sadly, after almost two and a half years of uninterrupted service, our streak of 100% uptime has come to an end in a rather painful way.
What Happened Exactly
Yesterday morning, a piece of hardware was performing poorly, so we decided to replace it and nip the problem in the bud. Addressing the problem immediately was the right decision. At the time, we were advised that either:
a) no downtime would be required to remedy the problem
b) a half hour window would be required
Ironically, part of our infrastructure (we use RAID 5) was designed so that this kind of maintenance would be seamless. So, based on this advice from our managed hosting provider, RackSpace, we elected to proceed with the maintenance. Then the trouble and pain set in…
In a nutshell, yesterday (August 29th) we were forced to migrate our entire infrastructure. I assure you this was not what we had planned, but it became the best solution under the circumstances. Thanks to our managed backups, mirrored storage and a previous infrastructure upgrade, the migration was mostly smooth, but some of its stages (copying databases, for example) simply cannot be accelerated.
The Problem That Arose
Beyond the downtime, one further problem arose for some FreshBooks systems. As a result of a hardware failure in one drive's mirrored storage (the catalyst that started this series of events), approximately 12 hours of account activity logged between 12:00 AM and 12:00 PM EDT on August 29th is currently irretrievable for a small percentage of FreshBooks users. Only accounts that were hosted on the affected hardware AND were active (i.e. sent invoices, created invoices, updated timesheets…) between 12:00 AM and 12:00 PM EDT on August 29th were affected. The vast majority of FreshBooks users would not have had any account activity during this period, and only a small percentage were hosted on the affected hardware.
What Caused the Problem
The cause of the data loss and the downtime is not yet clear. Again, our infrastructure was designed to maintain 100% uptime in this exact scenario. The problem affected a mirrored drive that provided real-time storage for a subset of FreshBooks accounts, namely the ones with the irretrievable account activity. We will be working with RackSpace (our managed hosting provider) to determine the root cause. So far, however, we have focused our efforts on making sure all our users are taken care of before spending our resources on the specifics of the cause. Until all the facts are in, I am going to withhold further comment on the direct cause. When we do get to the bottom of things, we will share the details.
What We Are Doing About It
In the meantime, we want to clearly acknowledge the loss. We are upgrading EVERY FreshBooks account, even those that were not affected, free of charge, as follows:
0-3 clients: 3 extra clients
4-25 clients: 25 extra clients
26-100 clients: 50 extra clients
More than 100 clients: 100 extra clients
How Do You Know If You Were Affected?
1. Log into your FreshBooks account. In the news section on the home page, we will tell you IF your system COULD HAVE BEEN affected. Again, if your account was not active (i.e. sent invoices, created invoices, updated timesheets…) between 12:00 AM and 12:00 PM EDT on August 29th, you have no missing data.
2. Check your email. We are sending one of two emails to every FreshBooks user: one for accounts that were affected, and one for accounts that were not. Be sure to check your spam folder in case our email ended up there.
What to Do if Your Account was Affected
If your account was active during the window outlined above AND your system was affected, here are some things to keep in mind:
1. Any invoices, support tickets and clients you created and/or edited, along with any documents uploaded, timesheet hours entered, etc., will have to be re-entered.
2. Any emails sent to clients and/or staff for newly created invoices and/or support tickets may contain links that no longer work, so be sure to resend those emails when you recreate your invoices, tickets, etc.
3. Any successful auto-billed transactions or online payments made by your clients during that period may no longer be "marked as paid". You should reconcile your invoices against any payments received on August 29th; check your payment gateway or your PayPal account for transaction details.
Update: this just in… if you subscribe to the RSS feed of your "recent activity" and your feed reader was running yesterday, you can use the feed to see which activity might have been lost and then recreate it. Thanks to Frank P. for sending this tip along.
We Are Truly Sorry
To anyone who was inconvenienced by the interruption of service and/or the irretrievable data: I and the entire FreshBooks team are deeply sorry. I want to thank those of you who called and emailed to enquire about the problem. To a person, everyone was polite and understanding, which, under the circumstances, was greatly appreciated by me and the rest of the FreshBooks staff working hard to bring the service back online.
A Final Word On Timing
We spoke with many of our users by telephone and email yesterday; I personally spoke with close to one hundred. I want to apologize to those of you to whom we gave ANY kind of status update regarding when we EXPECTED the hardware upgrade to be complete. Throughout yesterday we were given misleading information and were reassured that the service would be back up "in about one hour". FreshBooks team members, myself included, passed this information on, and it proved to be woefully wrong. We are incredibly sorry for sharing information that turned out to be misleading, and we will learn from this experience.