Gremlin: From Chaos to Reliability Management

An Intellyx Brain Candy Update

When we last spoke with Gremlin in April of 2021, they had already started the journey from simple fault injection (aka chaos engineering) to a more comprehensive reliability management solution.

Since then, they’ve added the capability to detect reliability risks, assess an overall reliability posture, and address risks with automation. The goal is to identify and resolve hidden reliability risks before they cause outages.

For Gremlin now, it’s all about providing the tools to change the reliability culture from reactive to proactive. Instead of just injecting a fault and seeing what happens, they want to help customers check in advance for reliability issues that might cause an outage and resolve them before they do.  

Reliability is not about testing once and fixing something, they said; it’s about creating a continuous reliability practice across the IT deployment landscape, which for some organizations may be large and complex.

Gremlin integrates with observability and monitoring platforms, including a tight integration with Datadog, to provide information about the results of reliability tests.

In the beginning, they said, their tooling was about “you find it, you fix it.” Then it became “we find it, you fix it.” And now they are progressing toward “we find it, we fix it.”

Copyright © Intellyx LLC. Intellyx is an industry analysis and advisory firm focused on enterprise digital transformation. Covering every angle of enterprise IT from mainframes to artificial intelligence, our broad focus across technologies allows business executives and IT professionals to connect the dots among disruptive trends. None of the organizations mentioned in this article is an Intellyx customer. No AI was used to produce this article. To be considered for a Brain Candy article, email us at pr@intellyx.com.

 
