The quickening race to lead cloud-native computing ‘observability’

October 1, 2019

As enterprises seek to achieve the scalability and flexibility benefits of cloud-native computing using technologies such as microservices, containers and serverless computing, they quickly run into a wall: How do they ensure the resulting infrastructure and the apps running on it are performing properly?

Traditional monitoring technologies and approaches are simply not up to the task of providing sufficient visibility into, and control over, inherently dynamic, ephemeral software assets. How can you monitor something that may appear one second, scale out the next and disappear seconds later?

The answer: Operations technology must move beyond visibility to a new principle of “observability” that shifts the responsibility for ensuring the performance of cloud-native infrastructure to the components of that infrastructure.

The birth of the cloud-native observability platform

This rise of observability as a core principle of cloud-native computing is far more than a terminology update. It has driven numerous open-source communities to create suitable instrumentation for ensuring cloud-native tech is observable.

The information technology operations vendor community has correspondingly scrambled to put together what they are now calling a cloud-native observability platform.

Market consolidation is the clearest sign of this scramble. In just the last few years, Splunk Inc. has acquired SignalFx, Datadog Inc. has picked up Logmatic.io, VMware Inc. has acquired Wavefront, New Relic Inc. has bought Opsmatic, CoScale and SignifAI, and SolarWinds Inc. has combined TraceView and Librato into its AppOptics product and integrated it with Loggly and Papertrail.

Meanwhile, the open-source community has been busy as well. “Cloud Native Computing Foundation projects like Prometheus for time-series metrics, Fluentd for log analysis and Jaeger for distributed tracing are popular open-source frameworks for cloud-native observability,” Deepak Jannu, director of product marketing at OpsRamp, explained for The New Stack.

The goal of all of these efforts is to provide the four pillars of observability: logging, metrics, tracing and alerting.

“If monitoring is about watching the state of the system over time, then observability is more broadly about gaining insight into why a system behaves in a certain way,” wrote Container Solutions Managing Director Ian Crosby, engineer Maarten Hoogendoorn and senior engineer Thijs Schnitger, along with Kogusenn and former Container Solutions Chief Engineer Etienne Tremel, in a white paper for The New Stack. “The cloud-native monitoring environment must provide insight into how a service’s state is related to the state of other resources. This, in turn, must point to the overall state of the system.”

Observability brings a new context to operations in cloud-native environments. “A cloud-native application is composed of independent microservices and required backing services. Even though a cloud-native application as a whole must remain available and continue to function, individual service instances will start or stop as to adjust for capacity requirements or to recover from failure,” explained an IBM Cloud Docs article. “Monitoring this fluid system requires each participant to be observable. Each entity must produce appropriate data to support automated problem detection and alerting, manual debugging when necessary, and analysis of system health (historical trends and analytics).”
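To make the idea of an entity that "produces appropriate data" concrete, here is a minimal Python sketch of a service instance that emits its own logs, metrics and trace spans and evaluates a simple alert rule. Everything in it is an assumption made for illustration: the `ObservableService` class, its method names and the 50% error-rate threshold are hypothetical and do not come from IBM, the Cloud Native Computing Foundation projects or any vendor named above. A real deployment would export this data to tooling such as Prometheus, Fluentd or Jaeger rather than hold it in memory.

```python
import json
import time
import uuid
from contextlib import contextmanager


class ObservableService:
    """Hypothetical service instance that emits its own telemetry."""

    def __init__(self, name, error_alert_threshold=0.5):
        self.name = name
        # Instances are ephemeral, so each one tags its data with its own id.
        self.instance_id = uuid.uuid4().hex[:8]
        self.error_alert_threshold = error_alert_threshold  # assumed rule
        self.logs = []                                # structured log lines
        self.metrics = {"requests": 0, "errors": 0}   # simple counters
        self.spans = []                               # finished trace spans

    def log(self, level, message, **fields):
        """Logging: append one JSON-encoded record."""
        record = {"service": self.name, "instance": self.instance_id,
                  "level": level, "message": message, **fields}
        self.logs.append(json.dumps(record))

    @contextmanager
    def span(self, operation, trace_id):
        """Tracing: time an operation and record it under a trace id."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans.append({"trace_id": trace_id,
                               "operation": operation,
                               "duration_s": time.perf_counter() - start})

    def handle(self, trace_id, fail=False):
        """Handle one simulated request, updating metrics and logs."""
        with self.span("handle", trace_id):
            self.metrics["requests"] += 1
            if fail:
                self.metrics["errors"] += 1
                self.log("error", "request failed", trace_id=trace_id)
            else:
                self.log("info", "request ok", trace_id=trace_id)

    def alerts(self):
        """Alerting: flag the instance when its error rate is too high."""
        requests = self.metrics["requests"]
        if requests == 0:
            return []
        rate = self.metrics["errors"] / requests
        if rate >= self.error_alert_threshold:
            return [f"{self.name}/{self.instance_id}: error rate {rate:.0%}"]
        return []


svc = ObservableService("checkout")
svc.handle("trace-1")
svc.handle("trace-2", fail=True)
print(svc.metrics)
print(svc.alerts())
```

The point of the sketch is the division of labor: the instance itself carries the instrumentation, so it stays observable however briefly it exists, while aggregation and historical analysis would happen in a separate platform.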

To read the entire article, visit https://siliconangle.com/2019/09/30/quickening-race-lead-cloud-native-computing-observability/.

(Disclosure: IBM and New Relic are Intellyx customers, and OpsRamp and VMware are former Intellyx customers. None of the other organizations mentioned in this article is an Intellyx customer. New Relic covered my expenses at FutureStack, a common industry practice.)