First, Netflix introduced the Chaos Monkey: an app that would terminate instances at random within the Netflix production environment in order to make its services more resilient.
The Chaos Monkey led to the Simian Army of similar apps, and then to the practice of chaos engineering, which carried the hard-fought lessons of resilience at scale to other web-scale companies and enterprises alike.
While chaos engineering may prove to be an essential tool for ensuring resilience, the idea of breaking things in production on purpose sends a chill up the spine of any IT manager.
Gremlin seeks to address these concerns with its Failure-as-a-Service offering, which lets operations managers run chaos engineering experiments under controlled conditions to find weaknesses in modern, scaled-out deployments, without the cold sweats.
Copyright © Intellyx LLC. Intellyx publishes the Agile Digital Transformation Roadmap poster, advises companies on their digital transformation initiatives, and helps vendors communicate their agility stories. As of the time of writing, none of the organizations mentioned in this article are Intellyx customers. To be considered for a Brain Candy article, email us at firstname.lastname@example.org.