
Sadly, after almost two and a half years of uninterrupted service, our streak of 100% uptime has come to an end in a rather painful way.

What Happened Exactly
Yesterday morning we were experiencing sub-optimal performance with a piece of hardware and we decided to replace the hardware and nip the problem in the bud. Addressing the problem immediately was the right decision. At the time we were counselled that either:

a) no downtime would be required to remedy the problem

b) a half hour window would be required

Ironically, part of our infrastructure (we run RAID 5) was designed so that this kind of maintenance would be seamless. So with the above counsel from our managed hosting provider RackSpace, we elected to proceed with the maintenance. Then trouble and pain set in…

In a nutshell, yesterday, August 29th, we were forced to migrate our entire infrastructure. I assure you this is not what was planned, but it became the best solution given the circumstances. Thanks to our managed back-ups, mirrored storage and previous infrastructure upgrade, this process was mostly smooth, but many stages of the migration process (copying databases is one example) simply cannot be accelerated.

The Problem That Arose
Besides downtime, one problem did arise for some FreshBooks systems. As a result of the failure of one drive’s mirrored storage hardware – the catalyst that started this series of events – approximately 12 hours of account activity, logged from 12:00 AM to 12:00 PM EDT on August 29th, is currently irretrievable for a small percentage of FreshBooks users. Only those systems supported by the affected hardware AND active (i.e. sent invoices, created invoices, updated timesheets…) between 12:00 AM and 12:00 PM EDT on August 29th were affected. The vast majority of FreshBooks users would not have had any account activity during this period, and only a small percentage were served by the affected hardware.
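As a rough illustration of the two conditions above, here is a minimal Python sketch. It is not how FreshBooks determined which accounts were affected. The function names, the fixed UTC-4 offset for EDT, the year 2006 (inferred from the comment dates), and treating the end of the window as exclusive are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4; a fixed offset is enough for a single-day window (assumption).
EDT = timezone(timedelta(hours=-4))

# Window from the post: 12:00 AM to 12:00 PM EDT on August 29th.
# The year 2006 is inferred from the comment dates.
WINDOW_START = datetime(2006, 8, 29, 0, 0, tzinfo=EDT)
WINDOW_END = datetime(2006, 8, 29, 12, 0, tzinfo=EDT)


def in_loss_window(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the window.

    The end of the window is treated as exclusive (assumption).
    """
    return WINDOW_START <= ts < WINDOW_END


def could_be_affected(on_affected_hardware: bool, activity: list[datetime]) -> bool:
    """Both conditions must hold: affected hardware AND activity in the window."""
    return on_affected_hardware and any(in_loss_window(ts) for ts in activity)
```

Because the timestamps are timezone-aware, activity recorded in another timezone is compared correctly against the EDT window.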

What Caused the Problem

The cause of the loss and the downtime is not yet clear. Again, our infrastructure was designed to sustain 100% uptime in this exact scenario. The problem affected a mirrored drive that sustained real-time storage for a subset of FreshBooks accounts – the ones with the irretrievable account activity. We will be working with RackSpace (our managed hosting provider) to ascertain the root cause. However, up to this point we have been focusing our efforts on making sure all our users are taken care of first, before we spend our resources finding out the specifics of the cause. So until all the facts are in, I am going to withhold further comment as to the direct cause. When we do get to the bottom of things, we will share the details.

What We Are Doing About It
In the meantime, we want to clearly acknowledge the loss. We are upgrading EVERY FreshBooks account – even those not affected – as follows:

0-3 Clients (free) – 3 extra clients
3-25 Clients (free) – 25 extra clients
26-100 Clients (free) – 50 extra clients
100+ Clients (free) – 100 extra clients

How Do You Know If You Were Affected?
1. Log into your FreshBooks account. In the news section on the home page we will tell you IF your system COULD HAVE BEEN affected. Again, if your account was not active (i.e. sent invoices, created invoices, updated timesheets…) between 12:00 AM and 12:00 PM EDT on August 29th, you would have no missing data.

2. Check your email. We are sending two different emails to all FreshBooks users: one for those accounts affected, and one for those that were not affected. Check your spam filter to make sure our email did not wind up there.

What to Do if Your Account was Affected
If your account was active during the window outlined above AND your system was affected, then here is a list of considerations:

1. Any invoices, support tickets and clients you created and/or edited, along with uploaded documents, timesheet hours entered, etc. will have to be re-entered.

2. Any emails to clients and/or staff for newly created invoices and/or support tickets may contain links that will no longer function, so be sure to resend your emails when you recreate your invoices, tickets, etc.

3. Any successful auto-billed transactions or online payments made by your clients during that period may not be “marked as paid” any longer. You should reconcile your invoices with any payments received on Aug 29th – check your payment gateway or your PayPal account for transaction details.
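For anyone reconciling by script rather than by hand, the check in step 3 can be sketched roughly as follows. This is only an illustration: the `Invoice` and `Payment` records and their fields are hypothetical, and it assumes each payment record from your gateway or PayPal carries the invoice number it was paid against.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    number: str          # hypothetical invoice identifier
    amount: Decimal
    marked_paid: bool


@dataclass
class Payment:
    invoice_number: str  # hypothetical reference carried by the gateway record
    amount: Decimal


def invoices_to_mark_paid(invoices, payments):
    """Return numbers of unpaid invoices whose received payments cover the full amount."""
    received = {}
    for p in payments:
        received[p.invoice_number] = received.get(p.invoice_number, Decimal("0")) + p.amount
    return [
        inv.number
        for inv in invoices
        if not inv.marked_paid and received.get(inv.number, Decimal("0")) >= inv.amount
    ]
```

Invoices that received only a partial payment are deliberately left out of the result, since those need a manual look.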

Update: this just in…if you subscribe to your RSS feed of “recent activity” and you had your feed reader on yesterday, you can use your feed to see what recent activity might have been lost, and then use that activity to recreate your missing activities. Thanks to Frank P. for sending this tip along.

We Are Truly Sorry
For anyone who was inconvenienced by the interruption of service and/or irretrievable data, the entire FreshBooks team and I are deeply sorry. I want to extend our thanks to those of you who called and emailed to enquire about the problem. To a person, everyone was polite and understanding, which, under the circumstances, was greatly appreciated by me and the other FreshBooks staff who were hard at work bringing the service back online.

A Final Word On Timing
We spoke with many of our users by telephone and email yesterday. I personally spoke with close to one hundred. I want to apologize to those of you to whom we reported ANY kind of status update with regard to when we EXPECTED the hardware upgrade to be complete. Throughout yesterday we were given misleading information and were reassured the service would be back up “in about one hour”. FreshBooks team members passed this information on – myself included – and it proved to be woefully wrong. We are incredibly sorry for sharing information that proved to be misleading, and we will learn from this experience.

48 Comments

Aug 30/06
10:48 am
Lisa Retief says:

What timezone are you referring to 12:00 AM to 12:00 PM?

I am an affected user.

Aug 30/06
10:53 am
Jay Boyd says:

As a web application developer, I raise my drink to you FreshBooks! Your bytes and bits will be in our prayers, oh keeper of the data. :-)

Keep up the good work.

Aug 30/06
11:05 am

Thanks Jay.

Lisa - thank you for pointing that out. Thanks to a combination of sleep deprivation and multiple revisions to this post, the timezone was edited out.

The timezone referenced is Eastern Daylight Time. I have updated the post to reflect that now. Thanks again.

Aug 30/06
11:06 am

Thanks for the open communication and commitment to quick resolution during this ordeal. Despite meticulous care to prevent these kinds of things, every company eventually has to deal with them. I thought you guys really shined during this “mini-crisis,” and I appreciate that.

Aug 30/06
11:09 am

I’m guessing you guys had a fun night eh Mike? ;-) Seems like you’ve handled it as well as can be, hope you’ll get some well deserved sleep.

Aug 30/06
11:14 am
Brad M says:

I am currently developing a web application service and that sort of problem scares the heck out of me.

The FreshBooks team should surely be given a great big pat on the back for their efforts! I have been in a similar (although not as intense) situation, and it is NOT FUN. You have shown great professionalism and courtesy with the updates in the header (Great idea!) and your free upgrades to the accounts. Thumbs up!

I’m curious how you knew which accounts were affected and which weren’t. That may be a great future technical article!

Aug 30/06
11:17 am

All of us in the technology field have had to deal with days/nights like you guys just did. I for one greatly appreciate your detailed information, acknowledgement of the problem, and your willingness to provide your clients with some perks to make up for the inconvenience. Outstanding customer service is *very* hard to come by nowadays. I am a new trial FB user who is now sold, if I wasn’t already!

Aug 30/06
11:20 am
Windy Brown says:

It happens to the best of us (or that’s my excuse, and I’m sticking to it!). The experience must be akin to being flayed alive or eating glass, but it does strengthen you… ;-)

Thanks for the concerted efforts and notification.

Aug 30/06
11:41 am
John L says:

I appreciate the honesty, dedication and commitment on the part of the FreshBook staff. Really. (separates the Mice from the Mensch!)

But I am concerned about the data loss. Ours was small, thankfully. But isn’t drive mirroring supposed to prevent exactly this kind of loss?

I hope some challenging questions will be asked of RackSpace.

If a mirror was not in place I suggest having one.

Aug 30/06
11:42 am
Chris says:

I would just like to mention that I completely understand the situation you went through yesterday. I have been under similar pressure and it is not easy.

The updates on the page header and notes about pizza and blankets were comforting. It is nice to know that real people actually care and keep this service running smoothly.

Good luck in the future (and thanks for fixing Freshbooks before month’s end!)

Aug 30/06
11:50 am

Thanks everyone.

John - the drive mirroring was indeed in place. That is all I will say at this time. That said, I added the following paragraph to the post:

“…up to this point we have been focusing our efforts on making sure all our users are taken care of first, before we spend our resources finding out the specifics of the cause. So until all the facts are in, I am going to withhold further comment as to the direct cause. When we do get to the bottom of things we will share the details.”

Aug 30/06
11:55 am
Josef says:

What a great example in how to best handle a “disaster.” Even during the desperate hours of the outage, you and your team remained professional and positive. When I telephoned you to learn what was going on, it felt like any other telephone call to your office — an experience that makes me stop in my tracks and say aloud, “Wow! I sure am glad I found you guys!”

I do have a question regarding the outage: Upon logging in this morning, I see a box that tells me “Your account WAS affected… click here…” but I don’t see any emails from you in my mailbox or spam filter. Clicking “here” just brought me to the blog posting I already read.

I’m not sure if any of my clients tried to log in or successfully posted anything during the outage period. Please advise.

BTW — Your offer of a few extra client accounts - very professional. You should be a case study in proper customer service in a crisis! Awesome team effort!

I continue to wish you all the BEST!

Aug 30/06
11:56 am
Richard says:

I too appreciate the information you provided during the downtime issue. My staff and I have come to depend on your application so I hope this won’t happen again. Good job under duress - you guys are pros.

Aug 30/06
11:58 am

Bottom line is business has adversity. How you deal with it is what matters. Thanks to Freshbooks for doing its best to communicate and keep its customers informed. We appreciate your efforts, and we know you’ll see to it we are protected in the future.

Aug 30/06
11:59 am

Josef - thanks. We are just finalizing the emails as I type this comment; I hope they will be delivered within the hour. Also, I am going to reorder the “How Do You Know If You Were Affected?” section and put, “Login to Your Account” as step one of two.

Aug 30/06
12:00 pm
John L says:

Thanks Mike. And from my limited knowledge of these things (as explained to me by *our* server host) a mirrored server (vs. a mirrored drive) could replace the down server in a matter of seconds or minutes and be fully up to date.

But you are right, this topic is premature.

Glad everything is back up.

Thanks again for your efforts - get some sleep.

Aug 30/06
12:05 pm

The way you handled the outage is admirable as is the clear, honest and open communication.

I will say though I’m not impressed with the technical lack of preparedness for such an eventuality, and the real inconvenience this has caused for our companies and our customers.

Not only should you be running Raid-5 mirror (which in my experience isn’t that great, since if the controller fails, all drives fail)… I recommend you should also be doing the following:

1) A second dedicated database slave/replication server should be running, which is mirrored in real time from the master database server. This is not a ‘hard drive’ mirror, but rather a database replication server. Not sure what database you’re using, but if it is MySQL, it has this capability out of the box and RackSpace should be able to configure a slave realtime backup server for you.

2) Your slave db server should have database ’snapshots’ taken at least hourly and then all those snapshots backed up securely offsite at least daily. This enables you to never lose more than an hour of data worst case. With the slave technically you should never lose ANY data, but the backups of the slave are there ‘just in case’.

Hardware RAID and drive mirroring is just not reliable enough to depend on, as it seems you have experienced.

Please take these comments in a positive light as a way to improve. I have learned the hard way and went through similar growing pains with web apps as you guys recently have. I just had hoped that you would have had such things already in place.

Again, thanks for the fantastic openness and communication in handling the situation, and I strongly encourage you to upgrade the disaster prevention procedures you have in place.

Best regards,
Justin MacLeod

Aug 30/06
12:46 pm

Justin - For the record, we have already implemented much of what you outlined (i.e. we used our snapshots to restore data).

Again, I’m going to hold off on further comment until we get to the bottom of things. In the meantime, thank you for posting - your comments are useful, not just for us, but for anyone who is running a web application.

Aug 30/06
12:52 pm
Adrienne Adams says:

As a new user to FreshBooks, I am very impressed with the excellent level of customer service you folks have provided during yesterday’s crisis.

Good service is an uncommon skill in business these days, but companies like Fresh Books prove that it is not a lost art.

Many thanks.

Adrienne Adams

P.S. Greetings from the San Juan Islands, Canada’s smallest province! ;D

Aug 30/06
12:55 pm

You guys have been very clear on what went wrong, and that is a good thing. I myself am in the web-development business, and have had my share of sleepless nights combating server issues and loss of data.

You have handled everything admirably.

One thing I would like to suggest is moving this blog on a different server than the system itself. The first thing I did was check this blog when I could not log into my timesheets. That would enable you to communicate better with your clients should the system go down again for some reason.

Thank you for a great service, and keep up the good work!

Aug 30/06
12:55 pm

Dear Freshbook Team,

Thanks for being so honest and open with us; we appreciate your honesty so much.

Thank you and please keep up the good work!

Our support is with you ALL

Marcus
CEO of European Relay Service.

Aug 30/06
1:08 pm
Chris Gray says:

Mike and everyone else on the FreshBooks team,

I’ve been in the tech field for several years and have developed and supported large web applications for mid and large sized enterprises, including critical applications such as the backend for a credit card processing system.

Having said this, I have a few things to note:

[1] I have to say that it is extremely refreshing to see such an open and immediate communication with the clients about issues such as this. I think this is really comforting to the clients and more companies should follow this policy.

[2] I have done several time-critical disaster recovery solutions, and it is definitely very stressful and more difficult to ‘work under the gun’. You guys have done an A+ job during an unexpected disaster such as this.

[3] I have also been through a similar technical situation where a RAID 5 Dell server lost all of its data. The main issue was not with the hard drives (although one did fail) but with the RAID controller itself. It was an older version, and in a very small percentage of cases the drives would lose all container information. Unfortunately, if you tried to recreate the container manually on that version it would cause total data loss on the hard drives as well.
I had built the infrastructure specifically to avoid situations such as this, however, in certain cases even the magical RAID 5 scenario can fail due to other hardware issues, and thankfully your secondary failsafe (your backups) was implemented perfectly.

I know the chances that this specific issue will arise again are very, very small since RAID controllers don’t malfunction like this very frequently. Also, if they do, you can normally swap the RAID controller and recreate the identical container information on another card and do NOT have it initialize the drives, and your old container will be functional.

However, if you still wish to be prepared for a scenario such as this again, depending on your DBMS you can create a secondary server with a hotbackup. MSSQL has replication features under the enterprise version and MySQL and Postgres support similar items. Obviously, Oracle has support for this as well.

All in all, I applaud your efforts and I strongly believe that you have reacted and done everything perfectly given the situation.

Thanks and good luck with the rest of the recovery

P.S. If anyone is not familiar with RackSpace, they are an industry leader and offer great services. If at the end of the day, RackSpace is at fault for a technical mixup, I am sure it would be an isolated case.

Aug 30/06
1:34 pm
Ivan says:

I am glad to see such open, honest communication at times like this.

I commend you for your work ethics and your efforts.

Thanks for making our experience with your company a pleasant one even during this type of event.

Aug 30/06
1:45 pm
Navneet Kaushal says:

Can you please let us know if the auto notification and reminder services for unpaid invoices also got affected?

Aug 30/06
2:35 pm

Navneet - yes the auto notification for late unpaid invoices went out as normal.

Aug 30/06
2:40 pm
Trevor says:

I love Rackspace, but I recently had an issue like this with a large failure and big downtime. I’m looking forward to hearing your follow-up on this issue, once you’ve gotten to the bottom of things. I’d also really appreciate a description of how you’re going to better protect yourself in the future, as I’m finding it difficult to figure out exactly how to best configure my hosting setup. It seems like there’s a lot of conflicting information out there, and I think you’ll be in a great position to help those of us out there who are also running web-based services.

Aug 30/06
3:34 pm

Mike,

Your method of doing business continues to inspire and educate me. I am going to review our systems in these areas and make sure we are as ready as we can be for such an event.

Get some rest.

Cheers,
Rol

Aug 30/06
3:44 pm

Good job. I recently joined with a paying account and I’m impressed that you came clean and with the bonus.

Definitely inspires confidence for the future.

I do similar things for my customers and am glad to see that the “Goods satisfactory or money refunded” of T.E. continues to live on in T.O.

Aug 30/06
5:42 pm
Myke says:

Freshbooks Team,
I can totally relate to issues like this. Sometimes the unexpected happens and it’s clear that you guys are going to do your best to not let that happen again. It’s really respectable that you guys are so public about the information and I think it shows that you really care about your customers and your business.

I’m not a user but I’m signing up.

Aug 30/06
10:01 pm
Tom says:

Situations like this are one of the worst things which can happen to any modern web based service provider. Thumbs up for the way you guys handled the situation. It’s something we can all learn from and maybe be prepared for situations like this in the future with even more paranoid backup methods. Hopefully these things will not occur any time soon, so you guys can concentrate on the stuff which you are best at, making and improving your user-friendly, state of the art invoicing/support/CRM/time tracking solutions :)

Aug 31/06
4:50 am

Rol Miller said it perfectly. Acknowledging the problem, apologizing and going BEYOND making good on it – these are exceptional steps and build customer loyalty and goodwill far beyond this incident. You guys ROCK.

Aug 31/06
12:12 pm
Donna says:

Thanks FB team, I am incredibly in love with FB as an application and as a support team. You make it feel like an active community rather than some software where you’re relegated to seeking help in some help manual when things don’t go right and sent into an endless loop of phone tag when you are in need of actual personal communication. Very cool.

Aug 31/06
2:33 pm

I would like a copy of my first month’s bill so I can use my promotion.

Aug 31/06
3:13 pm

Vicki - I am not 100% sure I am reading you right, but if you are referring to the client upgrades outlined in this post, your account should have been upgraded already. Please send us a note if it has not been:

http://www.freshbooks.com/contact.php

Aug 31/06
4:08 pm
Donna says:

My account has been affected by the upgrade. The only thing I noticed was that my preferences for recurring invoices were changed. I had them set to make drafts instead of being sent and then an invoice was sent, without further edits. That’s a shame. I have already notified the client that I will be changing that invoice but that was just an inconvenience. :(

Aug 31/06
4:26 pm

Hi Donna,

I’m sorry to hear that. I will be taking a look at this issue right away. In the meantime, I recommend you turn off that feature in your invoice preferences.

Aug 31/06
5:02 pm
Laura Nothnagle says:

I’m impressed with how you handled this crisis (and we all have them). You did very well informing us thoroughly and in a timely manner. I feel even more confident about your service because of your forthright approach, and because I’m sure you learned valuable lessons that will benefit FreshBooks’ future business decisions.

Sep 1/06
9:40 am
Donna says:

Thanks Daniel. I’ve already looked up the rest of my preferences and account details and it looks good.

Sep 1/06
10:36 am
Steve says:

Luckily it looks like I only added 1 customer that day and should be able to re-create that account with little effort.

This causes me to think it would be a good idea if I could back up my own account. I doubt I’d do it daily, but I’d probably do it after adding new clients and right before and after the first of each month, when all my recurring billing happens.

Will the import/export functionality back up all my data, or just the clients? I see the Reports allows sending data to csv files, but which report would create a comprehensive data back that would allow a full restore?

Thanks
Steve

Sep 1/06
12:05 pm

Steve - these are great questions.

You can export your clients at any time (and import them as well for that matter). As for the rest of your data, exporting by CSV from the reports section is indeed a really good way to back things up. In particular the “Invoice Details” report would effectively give you all the details of all your invoices. That report coupled with your “Payment History” report would give you a complete snapshot on your receivables side.

As for a “report [that] would create a comprehensive data [backup] that would allow a full restore”, this is an interesting comment. If I may, I’d like to withhold further comment on it for the time being as we finalize our next steps. Thank you for posting.
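As a rough sketch of how the two CSV exports mentioned above could be combined into a receivables snapshot, here is a minimal Python example. The column names (`invoice_number`, `total`, `amount`) are hypothetical and will not match the actual export headers, so adjust them to whatever your “Invoice Details” and “Payment History” files contain.

```python
import csv
import io
from decimal import Decimal


def outstanding_by_invoice(invoice_csv: str, payment_csv: str) -> dict:
    """Map invoice number -> amount still owed, from two CSV exports given as text.

    Column names are hypothetical placeholders, not the real export headers.
    """
    owed = {}
    for row in csv.DictReader(io.StringIO(invoice_csv)):
        owed[row["invoice_number"]] = Decimal(row["total"])
    for row in csv.DictReader(io.StringIO(payment_csv)):
        number = row["invoice_number"]
        # Payments for invoices missing from the invoice export are ignored.
        if number in owed:
            owed[number] -= Decimal(row["amount"])
    return owed
```

To use it with files on disk, read each export into a string first, for example with `open(path, newline="").read()`.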

Sep 1/06
2:45 pm

I think the least you can do is send us on a complimentary trip to Disney World!!

Thank you for your honesty. How many web companies say “It must have been your internet connection” or that they were not aware of any issues?

Sep 2/06
6:34 am
Jens says:

I just want to add my five cents. As a user that could have been affected, but luckily was not, I want to commend your team for openness and a very high level of information all the way. All in all, it lived up to the high standards we have all come to expect from Freshbooks!
All the best from Copenhagen,
Jens

Sep 2/06
11:36 pm
Reginald D says:

Now this is what I like to see from a company! In business you have to be able to get back on your feet QUICKLY after being knocked down. The reparations you have given are suitable but one has to wonder why rackspace has been silent throughout this whole ordeal… or have I missed a blog from one of their technical representatives??

When a company such as Rackspace delivers a low blow to a company they do business with, they fully realize the negative impact goes down the chain. From my point of view, there are 3 links to the chain supporting my company. Rackspace - Freshbooks - Me. There exists a weak link in the chain. Now then, what will Freshbooks do to remedy this weak link?

I have to admit that in business I am very tough. If this situation was not handled so gallantly by Freshbooks I would have taken my business elsewhere. Yet, I can not help but feel that Rackspace should step forward and provide reparations as well. Their behaviour is not acceptable, especially not for a well known company.

For the past few days I have been livid but thankfully, most of my employees had been given substantial time off so drastic measures were not immediately taken on our part and we are still with Freshbooks.

I await an official response from Rackspace.

Sep 6/06
7:33 pm

Just want to agree with previous posters on the professional and open way you dealt with a difficult situation. I’m an IT consultant so I know that ‘stuff’ happens and when complex IT systems fail it’s often very hard to work out why, or to give a good estimate of when they will be fixed.

What sets a company apart is not their immunity to disaster but the way they handle themselves when the fat hits the shins.

Keep up the good work

Oct 22/06
2:07 pm

[...] While at Office 2.0, I met with FreshBooks CEO, Mike McDerment. Readers might recall I really like Mike’s approach to SMB service organisations in his FreshBooks service. It goes without saying that Mike is one of the good guys, genuinely concerned to meet customer expectations and yet be open when things go pear-shaped. [...]

Jan 8/07
3:17 pm

[...] Freshbooks responds to downtime It’s easy to provide great service when things run smoothly. Handling problem situations is a much tougher — and often more important — test. Freshbooks’ Up and Running blog post is an example of how to do it right. [...]

Feb 13/07
3:00 am

[...] the face of a potential disaster, it is possible to satisfy customers. Look how Freshbooks handled what could have been a serious crisis. To put this in perspective: When I met CEO Mike McDerment last October, he said the registered [...]

