How to Operate Cloud Native Applications at Scale

Intellyx BrainBlog for Lightstep by Jason Bloomberg

The Importance of Cloud Native Observability

This is the final post in a series covering the journey to cloud native, the origin of cloud native observability, and scaling up cloud-native deployments. It explores the challenges of operating cloud-native applications at scale – in many cases, massive, dynamic scale across geographies and hybrid environments.

Scalable, dynamic applications are the point of cloud-native infrastructure. As enterprises ramp up their cloud-native deployments, in particular, Kubernetes, they quickly jump from a small number of clusters to vast numbers of clusters.

They now require an architecture consisting of clusters of clusters (aka multiclusters), typically scattered across different flavors of Kubernetes in different clouds, as well as on-premises components that may incorporate legacy application assets.

Managing such global hybrid multicluster applications is a Herculean task. Today’s cloud management and observability tools address the challenges of managing modest Kubernetes deployments, but once an organization scales up, such tools struggle to provide the management capabilities those enterprises require.

What’s missing isn’t better tooling, although it’s true that many tools don’t scale well. Rather, the missing piece is more likely cloud-native operations best practices, practices for leveraging management and observability tools following the same cloud-native principles of scale and dynamic behavior as the infrastructure they are managing.

Understanding Cloud Native Operations at Scale

Cloud-native computing is a broad, comprehensive paradigm shift in the way enterprises and web scale companies build and run IT infrastructure. At the heart of this shift is Kubernetes, and at the heart of Kubernetes lies the platform’s approach to elasticity and scale.

Kubernetes’ architecture calls for microservices in containers, containers in pods, pods in clusters, and clusters in their own multiclusters – all running on nodes (generally cloud-based virtual machines), typically with multiple nodes per cluster and multiple pods per node.

Each step in this chain reflects a “many to one” relationship, where the Kubernetes infrastructure handles the “many” part – in particular, the many containers per pod and the many pods per cluster.

Kubernetes – now including a vast ecosystem of open-source and commercial projects and products – handles this autoscaling rapidly and automatically, thus causing ephemeral microservices and other software components to appear and disappear in the blink of an eye.
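As one concrete illustration of this automatic elasticity, Kubernetes can scale a workload up and down on its own via a HorizontalPodAutoscaler. The sketch below is a minimal example, not taken from the article; the deployment name `my-app` and the specific thresholds are placeholder assumptions.

```yaml
# Hypothetical HorizontalPodAutoscaler: Kubernetes adds or removes
# pod replicas of the "my-app" Deployment automatically, keeping
# average CPU utilization near the 60% target.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app        # placeholder workload name
  minReplicas: 2        # never scale below two pods
  maxReplicas: 50       # cap the scale-out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```

With a policy like this in place, the pods the autoscaler creates and destroys are exactly the ephemeral components the post describes – appearing and disappearing without any human turning the knobs.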

Operators must manage this scalable and dynamic infrastructure and be on the lookout for potential problems that might impact the performance or availability of the deployed applications.

To accomplish this task, the various components of the cloud-native infrastructure must be observable, typically by generating logs, traces, and metrics following the maturing OpenTelemetry standard.
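To make that concrete, telemetry in an OpenTelemetry-based stack is typically routed through the OpenTelemetry Collector. The fragment below is a minimal sketch of such a Collector pipeline; the Lightstep endpoint and the `LS_TOKEN` environment variable are assumptions for illustration, not details from the article.

```yaml
# Hypothetical OpenTelemetry Collector config: receive OTLP telemetry
# from instrumented services, batch it, and export it to a backend
# such as Lightstep (endpoint and token are placeholder assumptions).
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:   # batch spans to reduce export overhead

exporters:
  otlp:
    endpoint: ingest.lightstep.com:443
    headers:
      "lightstep-access-token": "${LS_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Because the OTLP receivers and exporters follow the OpenTelemetry standard, the same pipeline shape works regardless of which clusters or clouds the instrumented services run in.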

Operators also need sophisticated automation to have any hope of keeping up with the ever-changing landscape. Humans simply cannot turn the knobs and dials in the production environment quickly or accurately enough to deal with cloud native applications.

Read the entire BrainBlog here.
