FreshBooks experienced roughly two hours of downtime this evening after a truck drove into a transformer near Rackspace’s data centre in Dallas, Texas.
The collision occurred around 7:00 PM ET, and for approximately an hour Rackspace’s Dallas data centre ran without experiencing any downtime. However, when the switch was made to auxiliary power, two of the air conditioning units would not restart. Air conditioning is like sunlight for data centres: they need it to survive. Without it, and with thousands of servers running in a contained space, temperatures rise high enough to cause malfunctions and damage machines. After the data centre had run safely for a little over an hour on auxiliary power, the staff at Rackspace decided to proactively take down several rows of servers so that the heat within the data centre did not become too high. In the meantime, contractors worked feverishly to correct the problems with the chillers. Unfortunately, FreshBooks’ Dallas servers were among the rows that were taken down.
Why did this downtime happen?
A truck drove into a transformer near our Dallas data centre and two air conditioning units would not restart: two rather odd and unfortunate events. A third unfortunate circumstance compounded them. FreshBooks maintains a Virginia-based instance of the FreshBooks service to mitigate issues like this, but we are presently doing some work on our setup there. Taking heed of the direction from the Rackspace team that it would be “about an hour of downtime”, we decided to wait things out.
We’re sorry to anyone who was inconvenienced
We sincerely apologize for any inconvenience this may have caused you this evening. As I drove into the office at 9 PM tonight to meet Aaron and Levi and help field calls and emails, I pulled over several times to answer text messages, reply to emails, and keep people apprised of the developments as I knew them. I want to thank the team at Rackspace for doing what I think was 100% the right thing to do: making the difficult decision to take down rows of servers to prevent any real damage from occurring.
On behalf of the entire FreshBooks team, especially Levi, Aaron, Joe (who was working on things from home) and me, I want to thank you for your patience as we handled the situation. If you experience any issues within the next couple of hours, please bear with us as we work into the night to ensure everything is running perfectly.