Unexpected downtime this morning

by Tim Lee  |  February 1, 2010

At 8:30 AM EST this morning we had an issue in our Dallas data centre and switched to our secondary data centre in Virginia. We are now back operating in Dallas; however, the activities you performed in your account while we were running in Virginia (9:15-9:47 AM EST) currently reside EXCLUSIVELY on the Virginia data centre’s servers.

To get things back to normal, we’d suggest that you simply redo any FreshBooks tasks you performed between 9:15 and 9:47 AM EST, such as sending invoices or logging time.

If you need any assistance, all of those records and activities are still available on the Virginia server and we can help you replicate that work – please contact us.

We have not yet isolated the root cause of the outage. Our development teams are tracking down suspected causes and our operations team is monitoring the servers closely. Our first priority was to restore service, which we have done.

We will be updating this post throughout the day with additional information about this outage. You can also follow our status tweets for immediate notification.

We would like to apologize to anyone who was affected by this outage, and we would like to make it up to you. Please get in touch with us at 1-866-303-6061 or email us at support@freshbooks.com.


  • JJ

    It would probably be helpful for you to email the affected customers as not everyone checks this blog.

    I have emailed support, but lost all of my changes from this morning (including spending time re-working a bunch of time entries). Can you please clarify the following:

    1) Will we get the data back from this morning or should we re-create all of our invoices / time tracking changes?

    2) Were our clients actually emailed if we sent invoices during this period?

    As you can imagine, many of your users are trying to close out the books for January. It should be noted that not only are your direct users affected by this issue but, at least in my case, my clients are as well.

  • http://www.freshbooks.com/our-team.php#tim Tim Lee

    Hey JJ

    We are working on the email as we speak (or in my case, type!). To answer your questions:

    1) We recommend that you recreate that information. If you need assistance with this, we can help because we have all the data on our secondary server. Give us a call to sort this out! (1-866-303-6061)

    2) Yes, unfortunately, your client would’ve been emailed. You will need to recreate the invoice and re-send.

    We understand this is bad timing as it is the first of the month. But we’re here to help you out.

  • http://www.applynx.com Scott

    Hello,

    My freshbooks site still appears to be down. Is this unique to me?

    Thank you

  • http://www.applynx.com Scott

    Never mind — it’s back up

  • http://www.applynx.com Scott

    Wait — it’s back down again.

  • http://www.applynx.com Scott

    My Freshbooks site was up and down intermittently in Firefox, but seems to be more stable in IE (8). Maybe a caching problem. Just FYI for anybody else having trouble.

    Thanks,
    Scott

  • http://www.freshbooks.com/our-team.php#mike Mike McDerment

    Cheers Scott

  • Charles

    Wondering what the point of a backup site is if the data is not synced back to the primary automatically. I would rather a total outage than to be entering data in a backup site that is no good once the primary site is live again.

  • http://www.cstrom.com Chris

    Are there any suspected security breaches?

  • http://www.freshbooks.com/our-team.php#tim Tim Lee

    Hey Charles,

    I agree, the whole point of a backup system is to have it seamlessly sync back to the primary. That is the way it is set up. Unfortunately, it didn’t work the way it was supposed to this morning. Sorry about that! We do have the data, so let us know if you need help getting it back.

    Hey Chris, there are no suspected security breaches.

  • ilya

    Any thoughts on why data were not propagated back to Dallas from Virginia? It seems counterintuitive and kind of defeats the purpose of a secondary data center.
    (I am not complaining at all, just curious.)

  • http://human3rror.com John (Human3rror)

    thanks for letting us know. lost a few things, but, that’s ok.

  • R. Lee Riser

    Are there defined SLA or uptime guarantees for paying Freshbooks customers? Also curious about how often a DRP is tested…

  • Jud

    IMHO, only paying customers should be complaining about how DRPs are tested :P

  • Jonathan

    I agree with Charles. One of my reasons to put my billing and books online was better protection than my local system. Redundant means redundant. Did you check the redundancy before today’s event?

    Simply saying you are sorry is trivial at this point.

    This is troubling to say the least.

  • Rick

    Just my two cents, but as a web developer I can honestly say that, redundant or not, it is not 100% foolproof. If the backup goes down while the main server is pushing content, you can end up with two corrupt copies. Odds are 1 in a billion it will, but it can.

  • http://www.freshbooks.com/our-team.php#mike Mike McDerment

    @Charles @Jonathan

    I understand your concerns around replication.

    Please know we have the data – it was recorded, and we do in fact replicate in real time in multiple data centers. That said, today when we failed over from our primary data centre to our secondary, there was a replication issue that persisted for 32 minutes in our secondary data centre.

  • http://www.interlockit.com Blair

    I must agree with Charles that a total outage would have been better. I specialize in Cloud Computing consulting. This makes me concerned about the reliability of the disaster recovery plan (DRP) at Freshbooks and the impact it may have on clients I was planning to migrate. I’m a paying customer and fortunately was not affected.

    I applaud your integrity in being up front and public about the impact instead of trying to keep it under wraps. It’s a good product.

  • http://www.rettkommunikasjon.no Jan Wiggo

    Hi guys. Hope today’s difficulties have caused too much trouble for you. Seems like you have things under control…

    Fortunately I didn’t do any critical work during this period, but I’m sad to say (I’m a very satisfied customer :-) ) that for the first time I have experienced, and am still experiencing, some lag in the system. Is this caused by ongoing technical problems on your side?

  • http://www.rettkommunikasjon.no Jan Wiggo

    Sorry I meant “Have NOT caused”… *lol*

  • Cathy

    @Jonathan: Saying you are sorry is never trivial.

    Freshbooks has always exhibited professionalism and genuine concern for all their clients large and small. Frankly, I would rather a company say they were sorry and mean it than to have them make up excuses and point fingers at hosts/services. Try finding that at Quickbooks Online or any other service!

  • http://www.freshbooks.com/team/rich Rich Lafferty

    Ilya: Well, it wasn’t supposed to happen that way.

    All of our database servers are meant to be ready to become the primary server at any moment; even when they’re not doing anything, they’re configured to log transactions for replication. That means that when we do fail over, the database server is already set to go.

    Except in this case, the database server in our DR datacenter wasn’t configured like that — we had to manually tell it to start logging transactions. And as soon as we noticed its configuration was out of date we fixed it, but that doesn’t help for the transactions that happened between failing over and noticing the problem.

    The good news is that that makes it really easy to make sure this never happens again; we’ve already fixed the direct problem for next time, and we have the tools in place to manage configurations centrally, we just haven’t yet deployed them for all services.

    So in short: it wasn’t a failure in the way things are supposed to work — we’d planned for the situation where we have to fail over and back, and by design the data would follow. But we made a mistake with implementing the design, one that we’ve corrected already for this particular case and that we’re prepared to correct in the general case.

    (And believe me, it kills me to have the data sitting there in Virginia with no practical way to move it back. We’re handling that case-by-case for customers who need it, though — drop support [at] freshbooks.com a line if you’re one of those.)

    And, of course, my apologies for the missing data and the inconvenience following from it. We want you to be able to count on us and I’m going to make sure that you can going forward.

    Warm regards,

    Rich
    Manager, Network Operations
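
A minimal sketch of the kind of central configuration check Rich describes, offered as an illustration rather than FreshBooks' actual tooling. It assumes the relevant setting is MySQL's `log_bin` system variable (binary logging), which is a guess based on the description of "logging transactions for replication". The server names and the settings snapshot are hypothetical.

```python
# Hypothetical sketch: flag database servers that are not ready to take over
# as primary because they are not logging transactions for replication.
# Assumption: MySQL binary logging (log_bin) is the setting that matters.
REQUIRED_SETTINGS = {"log_bin": "ON"}


def find_config_drift(servers, required=REQUIRED_SETTINGS):
    """Return {server_name: {setting: (expected, actual)}} for every mismatch."""
    drift = {}
    for name, settings in servers.items():
        mismatches = {
            key: (expected, settings.get(key))
            for key, expected in required.items()
            if settings.get(key) != expected
        }
        if mismatches:
            drift[name] = mismatches
    return drift


# Made-up snapshot of settings collected from each server.
snapshot = {
    "dallas-db1": {"log_bin": "ON"},
    "virginia-db1": {"log_bin": "OFF"},
}

for server, problems in find_config_drift(snapshot).items():
    print(server, problems)
```

Running a check like this on a schedule, before any failover is needed, would surface a standby whose configuration has fallen out of date.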

  • http://www.dreamten.com Philip

    Thanks guys for being forthcoming and honest about the issue. Many companies would have attempted to sweep downtime like this under the rug and pretend like it didn’t happen.

    For all you guys complaining, these things happen. While there was indeed a problem on their end, they shouldn’t be blasted for being open and honest about it. In the 3 years I’ve been using Freshbooks, this is the first time I’ve ever seen it down, so cut them some slack.

  • Tobias

    Yes unforeseen issues occur. The goal of a decent failure plan is to simulate failures and examine the results. I’m happy to give you a few hours of downtime to fail the system.

    This same thing happens to shops that never try and restore backups on a regular basis. When a failure occurs everyone is scratching their heads.

    In all fairness, I have been there, done that (SAS 70). Nonetheless, I can create redundancy in my office, fail it and test the results. I expect the same from a company that is charging me to store and protect part of my billing system.

  • Paul

    I second what Philip said. Honesty is the best policy. Thanks for keeping us in the loop.

  • Salvatore

    Does not sound like you guys made a mistake. You just had a learning experience.
    I really appreciate being able to use your service.
    It sounds like you had an excellent plan for Replication across several data centres.
    Sometimes those Learning Experiences happen because people are not fully alert, having lost some sleep the previous night or the previous weekend. In that case, when you are working on the app, you need to be more honest with yourselves about how “alert” you actually are.

    In those cases, you could use a double check, and double check the person who double checks is alert enough too.
    There are apps on the web to test how alert you are, and they do it indirectly because they are meant to improve your brain function, but when you can’t get the same score you did yesterday, you need to ask why, and perhaps ask why your brain is not functioning as well as it did last Friday (if it’s Monday), or since yesterday, because you stayed up late with your girlfriend to watch that fun romantic movie.

    I appreciate the transparency!

  • Jonathan

    Wow, either we have paid advertisers for Freshbooks or we have those that are less discerning than I. As a paying customer it’s my responsibility to hold accountable those who provide services for me. Sorry does not fix the problem. Transparent or not, holding a company accountable for its errors is not only good business but also the right thing to do. Just because a company is “nicer” and more open about their problems does not excuse the consequences of their actions or inactions. To those supporters, I challenge you to maintain the same level of commitment after the 2nd or 3rd time this happens. I do agree that talking about the problem in an open manner is also good. I like Freshbooks and the product they sell.

  • http://www.geoffreywiseman.ca/ Geoffrey Wiseman

    The fact that some of you may have recorded information during the downtime that hasn’t been automatically recovered is definitely a failure. If there’s a means to bring that information back (which it sounds like there is), I’m not sure why that won’t be employed for all customers, I wouldn’t mind that being stated more clearly.

    In my case, I wasn’t on the system during the downtime, so I’ve lost nothing. I recognize that there’s a failure here, but it seems as if the failure’s cause is understood and steps have been taken and will be taken to prevent reoccurrence, so that covers the bases for me.

    If your data is affected, I recognize that might be a different story for you. Also, if this were a common occurrence and/or had been handled badly as it was happening, I’d be more worried.

  • http://www.freshbooks.com/our-team.php#rich Rich Lafferty

    Geoffrey: We don’t have a systematic, automated way of bringing data back that doesn’t mean making the entire application unavailable for too long, since we couldn’t automatedly restore things and have people using FreshBooks at the same time.

    (The automated way is replication, which is the thing that wasn’t working in the first place.)

    The number of affected customers is pretty small, though, so we’re doing it by hand on request: looking to see what changed during the window, and then figuring out what’s necessary to accomplish that given the current state of their account.

  • Jake

    While I do agree that outages are always disappointing with any SaaS app, I feel that you guys [Freshbooks] are doing a great job in the cleanup process, and are offering a suitable remedy to the situation. We’ve been using Freshbooks for about 3 years, and I can’t say that I can recall any other unexpected outages occurring with the application… ever! We love working with you guys not only because of this but also because of your fantastic support and low prices. Keep up the good work!

  • Jason

    This sucks! I’ll be switching away from Freshbooks. So long horrible service!

  • http://www.writingitrightforyou.com Pamela Hilliard Owens

    I didn’t do any Freshbooks work until this afternoon, so I didn’t even know there was an outage until I received the email.
    To those of you (yes, Jason, I’m talking to you) who are ready to bail out on Freshbooks because of a 30 minute outage…I hope you find what you’re looking for behind the curtain in the Emerald City.
    Maybe it is because my company is very, very small and I don’t invoice more than 1-3 clients per day, if that many.
    But Freshbooks has been the best thing to happen to me and other very small companies in a very long time.
    They have a GREAT product/service and are always improving it with few, if any glitches.
    Have you tried getting excellent customer service from your cable company or cell provider lately?
    Freshbooks support is always there for you whether by email or phone–always cheerful; always helpful no matter how small the question.
    Thanks for the honesty, integrity, and transparency and the app, period. Freshbooks!

  • Matthew

    So with a catastrophic failure, there was 32 minutes of data orphaned. The service appears to have not been down for an extensive period of time, the disaster recovery plan is now far better than it was, and even with the downtime, what is the percentage uptime? I’m guessing better than your local telco that has life-or-death 911 services?

    Perhaps a percentage uptime figure might help shed a little more light on what was more of a hiccup in the long term of things. Perhaps the “haters” in the crowd could compare that with their own uptime statistics before making hasty comments.
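
As a rough illustration of the uptime figure Matthew asks about, the arithmetic can be sketched as below. This is hypothetical: it treats the 32-minute window as the only unavailability in a full year, which the thread does not confirm, and FreshBooks did not publish an actual uptime figure here.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def uptime_percent(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Percentage of the period during which the service was available."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


# Assumption: 32 minutes is the only downtime in the whole year.
print(f"{uptime_percent(32):.4f}%")  # prints 99.9939%
```

Any additional downtime during the year would lower the result, so the number is only as good as the downtime total fed into it.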

    I, for one, am more comfortable in the fact that this wasn’t a cover up, and that this in effect has tested the DRP and allowed corrections that could have been much more serious if not found.

  • http://thefrontiergroup.com.au Adam Fitzgerald

    This is a minor blip in what I’m sure is a great lesson all round for a solid operation. This stuff happens to nearly all web companies/applications from time to time (both big and small).

    Unfortunately some people will never understand that and those are the ones that will move on. Strange that they don’t abandon Microsoft every time that product crashes their computer and loses the changes in a document.

  • A.J. Choy

    1. Thanks for the transparency.

    2. Suggest you offer the affected customers one free month of usage. This will show that you are not only sorry, but that you are also fair and reasonable.

  • Cacao Monkey

    I’ve mostly been pleased with what I saw and read about this glitch, which had serious consequences for a few.
    But because I have seen the inside of server admin, I was unsure about exactly what you mean by having to recover the data by hand.

    Surely, whether you are using Oracle or Postgres or even Microsoft SQL, you can build a script to identify the affected records, then export those records to a compressed data file, optionally automating an scp or other secure FTP transfer to send the file to the primary data centre.
    Then, from the same terminal, log into the primary database server and reverse the script to import all the selected data.

    That’s what I call a manual data transfer or recovery.

    Are you referring to using the hand to move the fingers to retype all of the data?

    Unfortunately I know a lot about the challenges of remote collaborative databases, and how putting one online brings extra challenges to crafting something that is not just functional but actually worthwhile for a small business to use.

    So far I have liked everything I have seen and used, topping off the list with the fact that computer morons and accounting morons can actually use it easily.
    Before I make any further recommendations to anyone about your service offering, I need to know two things:
    - What database are you actually using on the back end?
    - That you have on-site staff database administrators who are capable of doing what I said above.

    To me this naturally extrapolates to: you do not rely on any external technical support contracts except for fast hardware replacement. That includes Oracle and Microsoft and whoever manages the data centres you essentially have to lease instead of operate by hand.

    I’d really appreciate it if you could clear this up for me.

    Thanks

    Feb 1/10
    5:20 pm
    Rich Lafferty says:

    Geoffrey: We don’t have a systematic, automated way of bringing data back that doesn’t mean making the entire application unavailable for too long, since we couldn’t automatedly restore things and have people using FreshBooks at the same time.

    (The automated way is replication, which is the thing that wasn’t working in the first place.)

    The number of affected customers is pretty small, though, so we’re doing it by hand on request: looking to see what changed during the window, and then figuring out what’s necessary to accomplish that given the current state of their account.

  • http://www.freshbooks.com/our-team.php#rich Rich Lafferty

    We’re using MySQL, and yes, we have people who administer it; not dedicated DBAs, but developers and system administrators.

    And no, we’re not retyping the data, but we’re only copying over data for users that request it: finding out what the user needs, and exporting and importing just that data for that account.
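
The per-account selection Rich describes ("looking to see what changed during the window") might be sketched roughly as follows. This is only an illustration: the record layout, the field names `account_id`, `kind` and `modified_at`, and the sample rows are all hypothetical, and the real work would involve queries against the MySQL databases rather than Python lists.

```python
from datetime import datetime

# Window during which activity was recorded only in Virginia (times from the post, EST).
WINDOW_START = datetime(2010, 2, 1, 9, 15)
WINDOW_END = datetime(2010, 2, 1, 9, 47)


def changes_for_account(records, account_id, start=WINDOW_START, end=WINDOW_END):
    """Return this account's records that were modified inside the window."""
    return [
        r for r in records
        if r["account_id"] == account_id and start <= r["modified_at"] <= end
    ]


# Made-up rows standing in for what the secondary database holds.
secondary_rows = [
    {"account_id": 7, "kind": "invoice", "modified_at": datetime(2010, 2, 1, 9, 20)},
    {"account_id": 7, "kind": "time_entry", "modified_at": datetime(2010, 2, 1, 8, 50)},
    {"account_id": 9, "kind": "invoice", "modified_at": datetime(2010, 2, 1, 9, 30)},
]

to_copy = changes_for_account(secondary_rows, account_id=7)
print([r["kind"] for r in to_copy])  # prints ['invoice']
```

Selecting the rows is the easy part; as Rich notes, each change still has to be reconciled with the current state of the account on the primary before it is imported.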

  • http://www.melotel.com John Meloche

    @Jonathan and @Jason

    You guys seriously need to get a grip on reality. If your instinct is to move away from FB over 30 minutes of downtime, I can’t imagine how you’re even still in business. Seriously… get a grip! I have had clients in the past for my own service who whine and complain, always looking for freebies and for any opportunity to complain. I can understand if it’s a regular occurrence or there was no support, but seriously… for this???? GET A GRIP!

    This service is stable, reliable, affordable and obviously the owners are accountable.

    I guess it’s a fair assumption that your own businesses are perfect????

  • http://faxnrelax.com Bill in Detroit

    Dear Whiners,
    What happened to YOUR copies of the data? And just how much data did you transmit in a 32 minute time frame, anyway?

    I watched the geo-location app for several minutes and it just didn’t look all that busy. If “it’s 5 o’clock somewhere”, it’s 8:30 am somewhere else … so it doesn’t look like anyone could have lost a serious amount of data.

    My guess is that you lost next to nothing. At the very most, it would have taken you 32 minutes to re-transmit it.

    By the time you read these comments, you could have been done.

About FreshBooks

FreshBooks is an online invoicing, time tracking and expense management service that helps people save time, get paid faster, look professional and focus on what they love to do - their work.

