Data Gravity in a Real-Time World

BrainBlog for Fiorano by Jason Bloomberg

In the first article in this series, I laid out the application and data infrastructure requirements necessary for enterprises to remain competitive in this modern, real-time world. The key is to build global, hybrid, multi-cloud applications (GHMAs) that are both real-time and distributed.

In the follow-up article, my colleague Eric Newcomer provided a missing piece of the GHMA puzzle: event-driven microservices, which provide scalability and reliability for complex application flows.

For IT managers and architects in charge of implementing such applications, however, there are several additional considerations they must take into account, including security, compliance, data sovereignty, and data gravity.

I’ll take a deeper dive into data sovereignty in a future article. In this piece, however, let’s take a closer look at data gravity. What is it, why is it such a significant challenge, and how can GHMAs resolve the issues surrounding it?

The Evolution of Data Gravity

Data gravity refers to the principle that the larger your data sets become, the more expensive and difficult they are to move. Essentially, data have inertia that increases with size.

Data gravity became a significant concern in the big data heyday of a few years ago, as technologies like Hadoop became popular for dealing with data sets that had previously been too large to handle effectively with the tooling of the day.

Hadoop and its ilk soon fell by the wayside, however, as their batch-centric architecture was simply too slow to address the needs of modern businesses. What the world needed was a real-time approach for dealing with massive data sets.

Even with real-time infrastructure like Apache Kafka and other streaming technologies, data gravity still posed a significant challenge. As enterprises sought to leverage the cloud, the costs of moving data into and out of cloud environments (ingress and egress, respectively) added up quickly.

If moving data to compute resources in the cloud proved too expensive, the obvious solution was to move compute closer to the data. Given the maturity of virtualization technology, moving compute is straightforward – and far less expensive than moving big data sets around the world.
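To make the trade-off concrete, here is a back-of-envelope sketch comparing the cost and time of moving a large data set with those of moving a compute image. Every figure in it (the per-GB fee, the link speed, and both sizes) is a hypothetical assumption chosen for illustration, not a quoted price from any cloud provider.

```python
# Back-of-envelope comparison: move the data, or move the compute?
# All figures below are illustrative assumptions, not real provider pricing.


def transfer_cost_usd(size_gb: float, rate_usd_per_gb: float) -> float:
    """Cost of moving size_gb across a metered network boundary."""
    return size_gb * rate_usd_per_gb


def transfer_hours(size_gb: float, link_gbps: float) -> float:
    """Hours needed to move size_gb over a link of link_gbps (gigabits/second)."""
    gigabits = size_gb * 8  # 1 gigabyte = 8 gigabits
    return gigabits / link_gbps / 3600


ASSUMED_RATE_USD_PER_GB = 0.05  # hypothetical per-GB transfer fee
ASSUMED_LINK_GBPS = 10.0        # hypothetical sustained throughput

DATASET_GB = 500_000            # hypothetical 500 TB data set
COMPUTE_IMAGE_GB = 2            # hypothetical 2 GB container or VM image

for label, size_gb in [("data set", DATASET_GB), ("compute image", COMPUTE_IMAGE_GB)]:
    cost = transfer_cost_usd(size_gb, ASSUMED_RATE_USD_PER_GB)
    hours = transfer_hours(size_gb, ASSUMED_LINK_GBPS)
    print(f"{label}: ${cost:,.2f} and {hours:.2f} hours to move")
```

Under these assumptions, the data set costs several orders of magnitude more to relocate than the compute image, which is the intuition behind moving compute to the data.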

Just one problem: solving the data gravity challenge by moving compute was easier said than done, because the data might reside in different locations. Sometimes data are in the cloud, but in other situations, they are on premises (even in mainframes) or on the edge.

Implementing a global real-time infrastructure soon became important for organizations with such distributed data gravity challenges. But then once again, disruption upended everything with the sudden ascent of AI.

Artificial Intelligence Changes the Data Gravity Game

For all its hype, AI provides little more than the ability to extract insights and other patterns from large data sets – sometimes, very large data sets.

AIOps tools use machine learning to extract anomaly data from large quantities of log data. Deep learning tools train on massive data sets of representative information to identify, say, suspicious behavior in a surveillance video feed. And generative AI applications assemble natural language constructs based upon large language models (LLMs) that train on massive language-based data sets.

The common thread across all these types of AI is the sheer quantity of data necessary to train the models.

Furthermore, those data may reside in different locations. Log data come from operational systems scattered across a hybrid operational environment. Video feeds may come from cameras at numerous locations on the edge. Generative AI may leverage data across the web – or organizations may feed their diverse corporate data into the LLMs.

All these situations have distributed data gravity problems, as the data sets are massive and scattered around an enterprise’s operational environment or even the world. Suddenly, moving the compute closer to the data presents an insurmountable challenge.

How, then, should organizations deal with the modern complexities of data gravity?

GHMAs Solve the Distributed Data Gravity Challenge

Many organizations today find themselves with large data sets scattered across the cloud, on premises, and at various edge locations. Their compute capabilities may be scattered as well, but moving compute to the data when the data sit at diverse locations around the globe is difficult and unlikely to be cost-effective.

For such organizations, GHMAs are the solution. A GHMA is itself a distributed application, with compute nodes in different locations as business needs require.

Such applications require underlying infrastructure that supports the global needs of distributed data. Fiorano provides such infrastructure.

Fiorano’s architecture consists of a distributed set of cloud native peer servers that communicate with each other in real time behind the scenes, giving GHMAs a low-latency, event-driven messaging capability that effectively brings compute resources close to large data sets, wherever they may be.
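The general pattern of keeping compute next to each data set and exchanging only small events can be sketched in a few lines. This is an in-memory illustration only: `PeerNode`, `broadcast`, and `global_mean` are hypothetical names invented for this example, and nothing here represents Fiorano's actual API or wire protocol.

```python
from dataclasses import dataclass


@dataclass
class PeerNode:
    """Hypothetical compute node that sits next to one local data set."""
    location: str
    local_records: list[float]

    def handle(self, event: dict) -> dict:
        # Process the event against local data. Only a small summary
        # leaves the site; the raw records never move.
        if event["type"] != "summarize":
            raise ValueError(f"unsupported event type: {event['type']}")
        return {
            "location": self.location,
            "count": len(self.local_records),
            "total": sum(self.local_records),
        }


def broadcast(nodes: list[PeerNode], event: dict) -> list[dict]:
    """Deliver one event to every peer and collect the small replies."""
    return [node.handle(event) for node in nodes]


def global_mean(replies: list[dict]) -> float:
    """Combine per-site summaries into a global result."""
    count = sum(reply["count"] for reply in replies)
    total = sum(reply["total"] for reply in replies)
    return total / count if count else 0.0


# Hypothetical sites: one in the cloud, one on premises, one at the edge.
nodes = [
    PeerNode("cloud-us", [10.0, 20.0, 30.0]),
    PeerNode("on-prem-eu", [40.0, 50.0]),
    PeerNode("edge-apac", [60.0]),
]
replies = broadcast(nodes, {"type": "summarize"})
print(global_mean(replies))
```

Each site answers with a three-field summary regardless of how large its local data set is, so the traffic between sites stays small even as the data grow.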

With Fiorano in place, organizations can rest assured that they can leave their data sets where they should be – either for data gravity reasons or because of regulatory concerns (which we’ll cover in a future article).

The Intellyx Take

Fiorano essentially provides cloud native integration capabilities that enable organizations to build GHMAs.

Cloud native integration differs from previous generations of integration technology because it integrates abstracted endpoints – perhaps belonging to ephemeral microservices running on Kubernetes, or edge-based endpoints with specific hardware constraints, or even legacy endpoints running in on-premises environments.

Traditional integration simply isn’t up to the task of integrating such dynamic endpoints, which is why GHMAs were previously so difficult to implement. With Fiorano, in contrast, organizations can leverage cloud native integration infrastructure that puts GHMAs within reach, even in distributed data gravity situations.

Copyright © Intellyx LLC. Fiorano is an Intellyx customer. Intellyx retains final editorial control of this article. No AI was used to write this article. Image credit: NASA Goddard Space Flight Center, CC BY 2.0 Deed.
