The boring things that keep production apps reliable
Over the years, FreshBooks has grown from a monolithic PHP application into a complex ecosystem of Pylons and Sparkplug apps that support the main app our users interact with daily. As the FreshBooks application, “FreshApp”, matures, so do our development practices. Here’s a short list of the simple yet crucial things we developers do to make FreshBooks more manageable and reliable.
Kill switches
FreshApp talks to several internal and external services. Services misbehave, and we cannot allow a misbehaving service to bring down FreshBooks. This is why we’ve implemented kill switches throughout FreshApp. The switches, activated by changing configuration settings, either cut off all communication between FreshApp and the misbehaving service or prevent access to the page that requires that service.
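As a rough illustration of the idea, here is a minimal Python sketch of a configuration-driven kill switch. It is not FreshApp's actual code: the `KillSwitches` class, the `"<service>.enabled"` key convention and the `payments` service name are all assumptions made for the example.

```python
class KillSwitches:
    """Looks up per-service on/off flags in a dict-like settings object."""

    def __init__(self, settings):
        # settings: e.g. values parsed from a config file (assumed shape).
        self._settings = settings

    def is_enabled(self, service):
        # Hypothetical key convention "<service>.enabled"; services default to on.
        value = str(self._settings.get(service + ".enabled", "true"))
        return value.strip().lower() in ("1", "true", "yes", "on")

    def call(self, service, func, *args, fallback=None, **kwargs):
        """Call func only when the service's switch is on, else return fallback."""
        if not self.is_enabled(service):
            return fallback
        return func(*args, **kwargs)


# Usage: operations flips "payments.enabled" to "false" in configuration.
switches = KillSwitches({"payments.enabled": "false"})
result = switches.call("payments", lambda: "charged", fallback="unavailable")
```

With the switch off, the wrapped function is never invoked, so a hung or failing service cannot block the request; the caller decides what the fallback looks like (a cached value, a "temporarily unavailable" page, and so on).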
Syslog, not optional
All new programs must support logging that is configurable by the operations team. If the operations team can’t change the logging configuration without modifying the source code, then the dev team won’t allow you to deploy your app. Being able to log to syslog isn’t very helpful if the app doesn’t log anything useful, of course, so we consider carefully what data to log. For example, our Pylons apps log how long each request takes to process, which makes it easier to find bottlenecks. They also log a correlation id that lets us trace a request to a Pylons application back to the FreshApp request that triggered it.
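To make the request-timing and correlation-id logging concrete, here is a small WSGI middleware sketch. It is an illustration only, not our actual Pylons code: the `RequestLogMiddleware` name, the `X-Correlation-Id` header and the log line format are assumptions. The middleware only writes to a standard `logging` logger; where those records go (for instance a `logging.handlers.SysLogHandler`) would be decided in a configuration file that operations can edit.

```python
import logging
import time
import uuid

log = logging.getLogger("request")


class RequestLogMiddleware:
    """WSGI middleware that logs duration and a correlation id per request."""

    HEADER = "HTTP_X_CORRELATION_ID"  # hypothetical header name

    def __init__(self, app, logger=log):
        self.app = app
        self.logger = logger

    def __call__(self, environ, start_response):
        # Reuse the caller's id if one was sent, otherwise create one.
        cid = environ.get(self.HEADER) or uuid.uuid4().hex
        environ[self.HEADER] = cid
        start = time.monotonic()
        try:
            # Note: this times the app call, not the streaming of the body.
            return self.app(environ, start_response)
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000
            self.logger.info(
                "path=%s correlation_id=%s duration_ms=%.1f",
                environ.get("PATH_INFO", ""), cid, elapsed_ms,
            )
```

Because the same id appears in the logs of the calling app and the called app, a single search for that id reconstructs the path of one request across services.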
Automate repetitive tasks
Deployments to staging environments are repetitive and error-prone, so we’ve automated our release candidate deployments using Fabric. Our scripts check out the code, tag the build in Git, build the egg and deploy it to our various staging environments. After the egg is deployed to a staging environment, the same Fabric script runs a quick smoke test to verify that the app was deployed correctly. Speaking of which…
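The order of those steps can be sketched in plain Python. This is deliberately not Fabric code and not our real script: the function name, the `rc-` tag prefix, the `/tmp/` target path and the `smoke-test` remote command are all hypothetical, and in a Fabric script each `run(...)` call would become a local or remote command. The point is only that the sequence is fixed and stops at the first failure.

```python
import subprocess


def deploy_release_candidate(ref, version, hosts, run=subprocess.check_call):
    """Check out, tag, build and ship one release candidate, then smoke-test it.

    `run` executes one command (a list of arguments) and is expected to raise
    on failure, so a broken step aborts the rest of the deployment.
    """
    tag = "rc-" + version                       # hypothetical tag convention
    run(["git", "checkout", ref])               # check out the code
    run(["git", "tag", tag])                    # tag the build in Git
    run(["python", "setup.py", "bdist_egg"])    # build the egg into dist/
    for host in hosts:
        # Hypothetical copy target and smoke-test command on each staging host.
        run(["scp", "-r", "dist", host + ":/tmp/" + tag])
        run(["ssh", host, "smoke-test", tag])
    return tag
```

Passing the command runner in as a parameter keeps the sequence easy to dry-run: hand it a function that only records or prints the commands and you can review exactly what a deployment would do before it touches a server.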
Smoke tests and diagnostics
Smoke tests are great, but they need to provide meaningful results. A diagnostic page that reports application health metrics lets you see the outcome of a smoke test at a glance, and it gives you a consistent place to check up on application performance.
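One possible shape for such a diagnostic page is sketched below as a tiny WSGI app that runs a set of named checks and returns the results as JSON. The function names, the JSON layout and the choice of a 503 status for an unhealthy report are assumptions for illustration, not a description of our page.

```python
import json
import time


def run_diagnostics(checks):
    """Run each named check (a no-argument callable) and collect the results."""
    results = {}
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            status, detail = "ok", ""
        except Exception as exc:  # a failing check must not break the page
            status, detail = "fail", str(exc)
        results[name] = {
            "status": status,
            "detail": detail,
            "duration_ms": round((time.monotonic() - start) * 1000, 1),
        }
    healthy = all(r["status"] == "ok" for r in results.values())
    return {"healthy": healthy, "checks": results}


def diagnostics_app(checks):
    """Build a WSGI app that serves the diagnostics report as JSON."""
    def app(environ, start_response):
        report = run_diagnostics(checks)
        body = json.dumps(report, indent=2).encode("utf-8")
        status = "200 OK" if report["healthy"] else "503 Service Unavailable"
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app
```

A deployment script can then treat the status code as the smoke-test result, while a person looking at the same URL sees which check failed and how long each one took.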
Checklists
You can automate deployments, testing and lots of other boring things, but you still need to remember to make a build, and you still need to understand the organizational impact of your deployment. That’s why checklists are immensely powerful. There are people who need to be informed and marketing or business decisions that need to be made, among other things. Even though we could memorize all the steps, we still walk through the list, ask ourselves the hard questions, and cross off each step as we complete it.
As simple as they seem, these practices help keep FreshBooks running smoothly. What are some of the things you do to keep your app running in production?