
The facts

FreshBooks experienced almost exactly two hours of downtime this evening due to a truck driving into a transformer near Rackspace’s Dallas, Texas–based data centres.

More information

The collision occurred around 7:00 PM ET, and for approximately an hour Rackspace’s Dallas data centre ran without experiencing any downtime. However, when the switch was made to auxiliary power, two of the air conditioning units would not restart. Air conditioning is like sunlight for data centres: they need it to survive. In the absence of air conditioning, and with thousands of servers running in a contained space, temperatures rise so high that they cause malfunctions and damage machines. After the data centre had run safely for a little over an hour on auxiliary power, the staff at Rackspace decided to proactively take down several rows of servers to ensure that the heat within the data centre did not become too high. In the meantime, contractors worked feverishly to correct the problems with the chillers. Unfortunately, FreshBooks’ Dallas servers were among the rows of servers taken down.

Why did this downtime happen?

A truck drove into a transformer near our Dallas data centre and two air conditioning units would not restart—two rather odd and unfortunate events. To add a third unfortunate event, while FreshBooks also maintains a Virginia-based instance of the FreshBooks service to mitigate issues like this, we are presently doing some work on our setup there. Taking heed of the direction from the Rackspace team that it would be “about an hour of downtime”, we decided to wait things out.

We’re sorry to anyone who was inconvenienced

We sincerely apologize for any inconvenience this may have caused you this evening. As I drove into the office at 9 PM tonight to meet Aaron and Levi and help field calls and emails, I pulled over several times to answer text messages, reply to emails and keep people apprised of the developments as I knew them. I want to thank the team at Rackspace for doing what I think was 100% the right thing to do: making the difficult decision to take down rows of servers to prevent any real damage from occurring.

On behalf of the entire FreshBooks team—especially Levi, Aaron and myself, and Joe, who was working on things from home—I want to thank you for your patience as we handled the situation. If you experience any issues within the next couple of hours, please bear with us as we work into the night to ensure everything is running perfectly.

20 Comments

Nov 13/07
1:05 am
Kirk Friggstad says:

As a recovering sysadmin who’s had his share of late night drives into the data center to clean up after “magic smoke” incidents, I just wanted to say how much I appreciated you folks giving us the details on what happened. So many other companies won’t admit to downtime, or if they do, they don’t give any details on it.

Glad to see the system is back up and running - I must admit that I had a few moments of panic tonight when I couldn’t access my account. :-) Thanks for all the hard work - I really appreciate it.

Nov 13/07
1:06 am

Thanks Kirk, that’s much appreciated.

Nov 13/07
1:07 am

Hi Mike,
That’s indeed a weird chain of events. It is reassuring to know that situations like this one are not taken lightly by your company. Odds are fairly slim that another truck will ever hit that transformer again, but “fate” will have something else in store for sure. Only one more thing: I hope the truck driver is okay!

Nov 13/07
1:09 am

I hope he is alright too Henning, thanks for the note.

Nov 13/07
1:56 am

Sorry buddy, had to delete that last comment there… for obvious reasons… but trust me, it definitely gave Mike and me one heck of a laugh. Thanks for lightening our evening, mate.

Nov 13/07
2:00 am

Thank you for your comments, Mike. We broke our promise to you today. We let you down…and your many many customers. We will work to earn back your faith.

Nov 13/07
6:20 am

If it makes you feel any better, Basecamp and all the 37signals sites were down too. I had a little mini-stroke when I thought my financials disappeared into the dot-com deadpool in the sky, but when I saw Basecamp was down too, I realized it was probably a data-center issue. Glad there was no permanent data loss.

Nov 13/07
7:41 am

Nice one guys!
Thank you so much for the explanation... now that’s what I call customer service!

UK companies could take lessons from you guys!

Gracias,

Steve.

Nov 13/07
11:02 am

Graham - your team made the right call in a tough situation and I hope that they are proud of themselves for that - I know I am. Thank you for leaving your earnest and heartfelt comment. I hope things are quickly returning to normal for you and your staff at Rackspace, and thank you for taking the time in the midst of your difficulties to reach out and leave this note - it says a lot about Rackspace and the people who work there.

Nov 13/07
11:54 am

Is Optimus Prime OK?

Nov 13/07
1:31 pm
Ian Rae says:

Thanks for the details Mike. We’ve had to deal with many such incidents over the past 10 years, and the lesson you learn over time is that every physical instance of your app will experience unavoidable downtime. The good news is that it is remarkably cost effective nowadays (especially given that you’re working on a LAMP stack) to deploy to multiple PoPs and route requests based on application availability, proximity or other factors, which gives you much more control and removes dependence on a single provider. In Montreal, it costs about 2G/mo for a multi-site infrastructure (2 racks) with fiber connecting the sites directly for database replication.

We use Basecamp and Freshbooks, so obviously we have a vested interest in you guys staying up!

Nov 13/07
3:06 pm
Holly says:

Thanks for the info guys. Just wanted you to know I really appreciate the work you do! People nowadays don’t take enough time to thank the people that help them do their job at the professional level that you allow us to do ours!
Well done! Thanks for making me look good!
Cheers!

Nov 13/07
7:56 pm
Tony says:

Bad timing guys. I was pitching for new work yesterday and when the client went looking for an estimate they got a PAGE CANNOT BE FOUND. They are hardly going to hire a web developer who cannot even manage their own internal processes. It may have been night time for you guys but it was peak work hours here in Australia.

Nov 13/07
9:47 pm

Tony - we are all disappointed with what happened and I want to thank you for sounding off and representing an important constituency (FreshBooks users who were affected in the heart of their business hours). I have sent you an email.

If there is anyone else who was adversely affected by the outage, please contact me directly at mike@youknowwhat or call my cell at 647.204.3808 so that we can go over things and I can look into crediting your account.

Nov 14/07
12:17 am
cb tells tales says:

I should have known that you were the one who deleted my exposé. I suppose I should have known better than to post the name of one of your competitors, but those dastardly fiends should be exposed for the saboteurs that they are.

You’re lucky that all of us here in CB world think you’re A OK, Aaron, otherwise I might have been forced to post a negative comment about you.

Nov 14/07
3:57 am
Jamen says:

All thumbs up for coming clean on the downtime and what now, and as you can see everyone but Tony really was affected by the incident. Coming from a healthcare company, we were required, as with most financial and healthcare transactional based sites, to have disaster recovery. What would have happened if the truck had hit the building and the NOC burned down? Would we all be posting thanks for letting us know that we will be down for a few days? I am still small so I am sure I would not be too affected. Mind you, now I will be sure to see what kind of exporting I can do on my pending/recurring invoices in the event something worse does happen. I plan to set up redundancy as soon as possible. What does FreshBooks plan to do in preparation for disaster recovery in the near future? Again, what would we be doing right now if the truck hit the NOC? Thanks, and I too am a happy customer, but would I be if it had been worse?

Nov 14/07
10:27 am

Jamen - great question. You may have noticed I mentioned our Virginia data centre a couple of times in the post. That data centre runs on a different network and is available to us to switch things over to in events like these. The reason we did not switch over in this instance is that we were working on some things there at the time (which is really unfortunate) and it would have taken longer than 1-2 hours to switch data centres on Monday night. Were the outage going to be longer, we would have switched over. All of which is to say that, not unlike the team at Rackspace, we were forced to make a decision, and given the circumstances, I think we made the right one. So you know, we also use off-site backups to a third location to ensure we aren’t down for a few days.

I hope that makes things clearer.

Nov 15/07
3:44 pm

I think you did too, Mike.

@Tony - Remember any relationship can be salvaged with a conversation! Point them to this blog post if they need proof that it was practically an act of god that 404’d your estimate.

Nov 15/07
4:25 pm

CB: luckily, I have full editorial control of the comments. :)

Nov 24/07
10:58 pm
Ali says:

I think you should use hostmysite.com for your data center instead. If a data center goes down because a truck ran into their lines, that is just not acceptable.

