Infrastructure: Why We Love Cattle And Think Like A Rancher

Okay, okay — we don’t actually have it in for Fido here at Mobomo. In fact, we’re quite the puppy- and kitty-loving group. But when it comes to building infrastructure to support your web or mobile application in today’s cloud-based environment, it has never been more important to forget everything you ever knew about caring for pets, and instead, start thinking of yourself as a cattle rancher.

Pets, in a word, are unique. Aren’t yours? They’re cuddly, quirky, and require lots of tender loving care. And while that’s delightful if your pet is a Basset hound or a Maine Coon, it’s generally not the best model to follow if your pet is a server.

In the old, dark days before virtualization and cloud services, all servers were pets — physical boxes screwed onto a rack in a data center (or worse, sitting underneath Larry’s desk in the office server closet; careful with that Big Gulp, Larry!). Each one was named, crafted by hand, and given copious attention by a neck-bearded SysAdmin who came to love his servers as much or more than his children. Pets are indispensable and irreplaceable, and when they get sick, it’s a BIG deal. After all, what self-respecting pet-parent wouldn’t drop everything to nurse his/her little furbaby back to health?

But in the rough-and-tumble world of application development, that’s a big problem. Time spent troubleshooting machine-specific problems is a deadweight loss for every project’s bottom line. To say nothing of that desperate feeling when the CEO’s pet project is suddenly unresponsive, and Larry’s the only one who knows anything about the machine it’s running on, and Larry’s not answering his pager…

Cattle, on the other hand, aren’t like pets. They’re typically given numbers instead of names. They’re expendable and (with apologies to vegetarians everywhere) disposable. Rather than being cared for individually, they are completely managed by repeatable and documented processes. If there’s a problem with one cow, the herd is unaffected. The sick cow is, shall we say, simply removed from the herd and life goes on.

Virtualization, cloud computing, and provisioning tools like Chef and Puppet have combined to allow technical architects to think of their resources as cattle instead of pets. This idea — that systems are not hand-crafted masterpieces but cogs in a machine managed by repeatable rules — is at the core of the DevOps philosophy. While there is a great deal of spirited debate over what DevOps precisely means, at Mobomo it boils down to three rules:

Infrastructure is Code

Larry’s not going to like this, but he’d be a lot better at his job if he thought more like a developer. All those manual processes he goes through to create and maintain infrastructure, all those one-liner shell scripts and StackExchange hacks he’s committed to memory, mean he’s introduced a non-automated factor into the project’s critical path: himself.

But fortunately (or not) for Larry, machine provisioning tools like Ansible and Chef, along with cloud resource templating systems like CloudFormation and Terraform, make it possible to remove all manual interaction from the process and allow an application’s infrastructure to be defined by code, right beside the application source itself. This way it can be version-controlled, peer-reviewed, and easily tested in non-production environments, eliminating the incongruities that creep in when each environment carries its own hand-configured settings and environment variables. Beyond that, the code functions as de facto documentation of your environment’s structure, which tends to be way more reliable than whatever is lurking in the nether regions of Larry’s memory.
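
To make the idea concrete, here is a minimal sketch of infrastructure defined in code, written in Python with the boto3 AWS SDK. The region, AMI ID, and tag values are hypothetical placeholders, and a real project would more likely reach for a declarative tool like Terraform or CloudFormation:

    # provision.py: an illustrative "infrastructure is code" sketch using
    # Python and the boto3 AWS SDK. Region, AMI ID, and tags below are
    # hypothetical placeholders, not anyone's actual configuration.
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    def provision_web_server(ami_id, instance_type="t3.micro"):
        """Launch a web server from version-controlled parameters; no hand-crafting."""
        instances = ec2.create_instances(
            ImageId=ami_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [
                    {"Key": "Role", "Value": "web"},
                    {"Key": "ManagedBy", "Value": "provision.py"},
                ],
            }],
        )
        return instances[0]

    if __name__ == "__main__":
        server = provision_web_server(ami_id="ami-0123456789abcdef0")
        print("Launched " + server.id + "; rebuild it any time by re-running this script.")

Because the script lives in version control next to the application, a code review catches infrastructure changes the same way it catches application bugs.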

Bottom line: if your entire application infrastructure can’t be rebuilt with the push of a button (or the running of a single shell script), then it’s not in code, and therefore your servers are pets, not cattle.

Infrastructure is Immutable

Related to the above is the idea that infrastructure should be immutable — that is, never altered on the fly once it has been created. This rule prevents Larry from, say, noticing a bug and updating a configuration file on a running server, thus causing headaches the next time the application is deployed or has to scale out. If all running resources are treated as inaccessible black boxes, changes can only be made via the version-controlled provisioning scripts and templates, ensuring that each deployment or scale-out of the application runs on identical servers.
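
In practice, immutability means replacing servers rather than patching them. The sketch below (again Python/boto3, with hypothetical IDs; the image itself is assumed to have been baked by the version-controlled provisioning pipeline) rolls out a configuration change by launching a fresh instance from a newly built image and retiring the old one:

    # immutable_update.py: never patch a running box; replace it. The new AMI
    # is assumed to have been built by the provisioning scripts, not by hand.
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    def replace_instance(old_instance_id, new_ami_id):
        """Launch a fresh instance from the new image, then retire the old one.
        The running server itself is never touched."""
        old = ec2.Instance(old_instance_id)
        new = ec2.create_instances(
            ImageId=new_ami_id,
            InstanceType=old.instance_type,
            MinCount=1,
            MaxCount=1,
        )[0]
        new.wait_until_running()  # boto3 waiter: block until the new box is up
        old.terminate()           # the old server leaves the herd
        return new.id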

Furthermore, treating infrastructure as immutable allows us to think about deployments themselves differently. Rather than deploying an application by pushing an update to running servers, crossing our fingers and hoping it works, then desperately rushing to revert manually if a bug is discovered, at Mobomo we employ a Blue/Green deployment methodology.

When a production stack needs to be updated, it is deployed from scratch, with new infrastructure (the “Green” instance) created from code each time. That allows QA testers (using a combination of automated and manual tests; more on the relationship between automated testing and Blue/Green deployments in a future post!) to verify the functionality of the new environment before it is made live. Then production traffic is simply switched (by changing a DNS record or similar) onto the new “Green” stack. If a problem is detected with the newly deployed app, it is a simple matter to switch back to the old “Blue” production stack painlessly, without the risk of extended production downtime. Once all stakeholders are satisfied, the old “Blue” stack can simply be destroyed.
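
The cutover itself can be as small as a single DNS update. Here is a sketch of that switch, assuming AWS Route 53, with a hypothetical hosted zone, record name, and stack addresses:

    # cutover.py: the Blue/Green traffic switch as one DNS change.
    # Zone ID, record name, and stack DNS names are hypothetical placeholders.
    import boto3

    route53 = boto3.client("route53")

    def point_traffic_at(stack_dns_name, zone_id="Z0000000EXAMPLE",
                         record_name="app.example.com."):
        """Flip production traffic to the given stack: Green on deploy,
        back to Blue on rollback."""
        route53.change_resource_record_sets(
            HostedZoneId=zone_id,
            ChangeBatch={
                "Comment": "Blue/Green cutover",
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": record_name,
                        "Type": "CNAME",
                        "TTL": 60,  # a low TTL keeps rollbacks fast
                        "ResourceRecords": [{"Value": stack_dns_name}],
                    },
                }],
            },
        )

    # Deploy:   point_traffic_at("green-stack-lb.example.com")
    # Rollback: point_traffic_at("blue-stack-lb.example.com")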

Embrace the Chaos Monkey

Cattle-oriented infrastructure means coming to terms with the idea that failure is ubiquitous and constant; rather than something to be feared, it should be embraced. That means specifically designing your application infrastructure with the certainty that it will fail, and testing that failure constantly, in production.

We’ve recently been playing with a tool called Chaos Monkey created by the development team at Netflix. Chaos Monkey does one job and does it very well — it runs in your production environment and randomly kills running server instances.
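
For a sense of what that looks like, here is a toy, stripped-down illustration of the concept in Python/boto3. This is not Netflix’s actual implementation (Chaos Monkey is far more configurable), and the tag names are hypothetical:

    # mini_chaos.py: a toy illustration of the Chaos Monkey idea, not the
    # real tool. Pick a random running production instance and terminate it.
    import random
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def terminate_random_instance(env_tag="production"):
        """Find all running instances tagged for the environment and kill
        one at random. Tag keys/values here are hypothetical."""
        resp = ec2.describe_instances(Filters=[
            {"Name": "tag:Environment", "Values": [env_tag]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ])
        instance_ids = [
            inst["InstanceId"]
            for reservation in resp["Reservations"]
            for inst in reservation["Instances"]
        ]
        if not instance_ids:
            return None
        victim = random.choice(instance_ids)
        ec2.terminate_instances(InstanceIds=[victim])
        return victim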

Let that sink in for a moment: if your infrastructure cannot tolerate the random, arbitrary death of machines, then you are treating your servers like pets instead of cattle.

With techniques like load balancing, auto-scaling, and high-availability proxies, it has never been easier to embrace the Chaos Monkey. Failure should be assumed, and automated procedures put in place (spinning up new machines, altering DNS and load balancer configuration) to handle that failure without any human involvement. Furthermore, with cloud services like AWS and Azure continuing to build hosted solutions that take the guesswork out of planning for scalability, we find ourselves in a brave new world in which most undifferentiated heavy lifting has been eliminated and we can focus on what we do best: building great applications.
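
As one example of such an automated procedure, an AWS Auto Scaling group can replace dead instances entirely on its own. A sketch, with hypothetical names and sizes:

    # self_healing.py: let an Auto Scaling group replace dead cattle
    # automatically. Group name, launch configuration, and sizes are
    # hypothetical placeholders.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-herd",
        LaunchConfigurationName="web-lc-v42",  # built by the provisioning scripts
        MinSize=3,                             # the herd never drops below three
        MaxSize=10,
        DesiredCapacity=3,
        HealthCheckType="ELB",                 # trust the load balancer's view of health
        HealthCheckGracePeriod=120,
        AvailabilityZones=["us-east-1a", "us-east-1b"],
    )
    # If Chaos Monkey (or real life) kills an instance, the group launches a
    # replacement on its own. No pager, no Larry.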

So by all means, adopt a pet or three and care for them like family. But when it comes to supporting your Internet applications, it’s long since time to start thinking like a rancher.