Rethinking Vertical Scalability for Cloud-Native Computing

The two flavors of IT infrastructure scalability – vertical and horizontal – are both vitally important, but the relationship between them is often misunderstood.

Vertical scalability refers to the capabilities of individual compute instances. Adding RAM or disk storage to a physical server is one example of scaling up; allotting more memory and storage to a VM is another.

Horizontal scalability refers to the capabilities of a set of identical (or nearly identical) compute instances. Scaling out means adding more such instances, be they physical servers, VMs, or some other type of compute instance, and then dividing up traffic among them. Storage and network components can scale out the same way.

The relative importance of these two types of scalability has shifted over the last half-century as technologies have matured. Now that enterprises are implementing the new paradigm of cloud-native computing, the balance between these two priorities has shifted again.

The Historical Context for Cloud-Native Scalability

The host-based, mainframe-centric world of the last century focused on vertical scalability, as all you had to do to scale up a mainframe was license more capacity from IBM.

The rise of the Web in the 1990s brought horizontal scalability to the table, as the inevitable physical limit on vertical scalability required organizations to scale out their servers to meet the needs of the consumer Web.

Virtualization and cloud computing reinvented both vertical and horizontal scalability.

In the cloud, scaling up vertically is as simple as provisioning a new, higher capacity instance type. Correspondingly, scaling out horizontally in the cloud is a simple matter of setting each instance’s autoscaling parameters properly. The result is essentially infinite horizontal scale, limited only by the budget.
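To make that concrete, here is a minimal sketch using the AWS SDK for Python (boto3); the instance ID, instance type, and autoscaling group name are hypothetical, and other clouds expose equivalent calls:

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Scale up: move a (stopped) instance to a higher-capacity instance type.
ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",          # hypothetical instance ID
    InstanceType={"Value": "m5.2xlarge"},
)

# Scale out: raise the bounds the autoscaler is allowed to work within.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-tier",           # hypothetical group name
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=8,
)
```

Each call is simple, but every one of them has a price tag attached.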

While cloud computing (public clouds in particular) offers many advantages over on-premises alternatives, cloud-based scalability can also be quite expensive. Upgrading instance types or setting up instances to autoscale at the drop of a hat can both run up the cloud bill dramatically.

The Challenge of Optimizing Kubernetes Scalability

Optimizing cloud environments, therefore, is an essential part of running a tight IT ship, and many products have come onto the market to help.

These cloud optimization technologies can help organizations select the most cost-effective instance types, the most efficient allocation of workloads within those instances, and the optimal autoscaling strategies for balancing the need to provision for traffic spikes against the need to keep a lid on costs.

And yet, even though most Kubernetes deployments run in one cloud or another, these now-traditional approaches for optimizing cloud scalability are insufficient for a cloud-native world.

Scalability operations in the cloud are slow, on the order of minutes. Kubernetes autoscaling, in contrast, can take place in milliseconds at the pod level.

This horizontal pod autoscaling (HPA) is built into Kubernetes, automatically adding and removing pods to deal with increases and decreases in traffic.

In other words, cloud-native computing and cloud computing handle horizontal scalability quite differently: the former at the pod level within Kubernetes, and the latter at the instance level within the cloud’s own environment configurations.
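As a rough illustration of the pod-level side, here is a minimal sketch using the official Kubernetes Python client and the autoscaling/v1 API; the deployment name and thresholds are illustrative assumptions:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a cluster

# Declare a scaling target and the bounds HPA may scale it within.
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"),  # hypothetical deployment
        min_replicas=2,
        max_replicas=20,
        target_cpu_utilization_percentage=70,  # add pods when average CPU exceeds 70%
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```

Once this object exists, Kubernetes handles the scaling out and in on its own, independently of whatever the underlying cloud is doing at the instance level.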

Kubernetes Vertical Scalability: Still a Challenge

It follows, then, that Kubernetes also handles vertical scalability differently from cloud computing.

Instead of provisioning instance types with greater capacities, vertical scalability in Kubernetes takes place at the pod level by changing the capacity parameters associated with each pod – CPU and memory being the most important. We call this type of autoscaling vertical pod autoscaling (VPA).

Kubernetes allows for manual or programmatic control of these parameters but doesn’t offer a way to optimize VPA automatically. Developers often choose fixed values for these parameters to avoid error alerts, at the expense of optimization.
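A minimal sketch of that programmatic control, using the Kubernetes Python client to patch a hypothetical deployment with the kind of fixed CPU and memory values developers typically guess at:

```python
from kubernetes import client, config

config.load_kube_config()

# Fixed, hand-picked values: conservative enough to avoid OOM kills and CPU
# throttling, but almost never optimal for cost or performance.
patch = {"spec": {"template": {"spec": {"containers": [{
    "name": "web",                                    # hypothetical container name
    "resources": {
        "requests": {"cpu": "250m", "memory": "256Mi"},
        "limits":   {"cpu": "500m", "memory": "512Mi"},
    },
}]}}}}

client.AppsV1Api().patch_namespaced_deployment(
    name="web", namespace="default", body=patch)
```

The mechanism is straightforward; the hard part is knowing what numbers to put in it.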

To fill this need, Opsani offers its Continuous Optimization Service.

At the heart of the Opsani service are tuning pods – special pods that Opsani adds to each cluster to monitor traffic going to that cluster. The tuning pods then use machine learning (ML) to determine the best values for CPU and memory (and in some cases, other parameters) in real time for the pods in that cluster.

The tuning pods themselves are lightweight, adding minimal overhead to the operation of each Kubernetes cluster. They also work hand-in-hand with Kubernetes’ built-in HPA, which may provision or deprovision pods as necessary to deal with traffic variability.

Every time HPA operates, it redistributes the container workloads in the pods it is managing, thus changing the compute and memory requirements for each pod. Opsani takes this variability into account automatically, essentially normalizing for horizontal scalability in order to provide the optimal vertical scalability.

Opsani’s ML models also take into account the shifting nature of traffic patterns, and hence of compute and memory requirements for each pod, by expiring out-of-date historical data. This ‘smart expiration’ avoids feeding poor data to the ML models while retaining selected baseline data necessary for those models to succeed with their optimizations.
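Opsani hasn’t published the internals of this mechanism, but the general idea can be sketched in a few lines: expire observations that fall outside a freshness window while pinning the baseline samples the models still need. The data structure and window below are assumptions for illustration only:

```python
from dataclasses import dataclass
import time

@dataclass
class Sample:
    timestamp: float      # when the observation was taken
    cpu_millicores: int   # observed CPU demand
    memory_mib: int       # observed memory demand
    is_baseline: bool     # pinned reference point the ML models keep regardless of age

def expire_samples(samples: list[Sample], max_age_s: float = 7 * 24 * 3600) -> list[Sample]:
    """Drop stale observations, but always retain pinned baseline samples."""
    cutoff = time.time() - max_age_s
    return [s for s in samples if s.is_baseline or s.timestamp >= cutoff]
```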

Opsani typically runs in the production environment, making compute and memory optimization recommendations that can either automatically reconfigure the appropriate Kubernetes instance, or open Jira (or other) tickets to instruct the ops team to make those changes manually.
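In pseudocode terms, that workflow amounts to a simple dispatch between applying the change and filing a ticket. The sketch below is an assumption about the shape of the workflow, not Opsani’s actual API; the container name, Jira project, and credentials are placeholders:

```python
from kubernetes import client, config
from jira import JIRA  # pip install jira

def handle_recommendation(deployment: str, namespace: str,
                          cpu: str, memory: str, auto_apply: bool) -> None:
    """Apply a right-sizing recommendation directly, or open a ticket for the ops team."""
    if auto_apply:
        config.load_kube_config()
        patch = {"spec": {"template": {"spec": {"containers": [{
            "name": deployment,  # assumes the container is named after the deployment
            "resources": {"requests": {"cpu": cpu, "memory": memory}},
        }]}}}}
        client.AppsV1Api().patch_namespaced_deployment(
            name=deployment, namespace=namespace, body=patch)
    else:
        jira = JIRA(server="https://example.atlassian.net",
                    basic_auth=("bot", "api-token"))      # placeholder credentials
        jira.create_issue(
            project="OPS",                                # placeholder project key
            summary=f"Right-size {deployment}: cpu={cpu}, memory={memory}",
            description="Recommended by the optimization service.",
            issuetype={"name": "Task"},
        )
```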

Opsani can also run as a development tool as part of the test cycle, helping developers build VPA directly into their microservices code. In this situation, the team uses GitOps to check the optimal configurations into the code repository. The team thus shifts VPA to the left, building optimization directly into the infrastructure-as-code codebase.
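A hedged sketch of that GitOps loop, assuming an illustrative repository layout and manifest path: write the optimized requests into the deployment manifest and commit, letting the usual CD pipeline roll the change out.

```python
import subprocess
import yaml  # pip install pyyaml

MANIFEST = "deploy/web-deployment.yaml"   # hypothetical path in the IaC repo

def commit_optimized_resources(cpu: str, memory: str) -> None:
    """Write optimized resource requests into the manifest and commit them."""
    with open(MANIFEST) as f:
        manifest = yaml.safe_load(f)

    container = manifest["spec"]["template"]["spec"]["containers"][0]
    container.setdefault("resources", {})["requests"] = {"cpu": cpu, "memory": memory}

    with open(MANIFEST, "w") as f:
        yaml.safe_dump(manifest, f)

    subprocess.run(["git", "add", MANIFEST], check=True)
    subprocess.run(["git", "commit", "-m",
                    f"Right-size web: cpu={cpu}, memory={memory}"], check=True)

commit_optimized_resources("300m", "384Mi")
```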

The Intellyx Take

The Opsani story is partly about optimization – but it’s also about automation. As Kubernetes deployments grow to hundreds and thousands of clusters and beyond, automation will become increasingly important. There will simply be no way for manual management and tuning to keep up.

Opsani understands this future. Tweaking a couple of parameters for a handful of pods manually is no big deal – but that’s not where Opsani plays.

As cloud-native deployments grow, the optimization and automation that Opsani offers will become increasingly important, both for managing scalability and, perhaps most significantly, its associated costs.

Copyright © Intellyx LLC. Opsani is an Intellyx customer, and IBM is a former Intellyx customer. Intellyx retains final editorial control of this article. Image credit: eflon.
