Chasing the Database Holy Grail: The Converged Model

It seems like an age-old question: can you use the same database for transaction processing and analytics? 

The answer has always been “no,” or at least “generally not,” since the requirements of each are so different. Transaction processing typically requires a fast (“real time” or “online”) response without losing any data, because transaction systems carefully record commercial events that have direct business value.

Analytics, on the other hand, identify trends and insights from a larger set of data – insights that inform business decisions. Analytics evaluate data over time – often transactional data, of course – to check for anomalies or unusual patterns. You can lose some data and still produce significant results; an average or mean, for example, changes little if a few records go missing.

Recent trends make me think this and other convergences may be more and more likely. 

NoSQL and the Distributed SQL Trend

A few years ago – OK, maybe 20 years ago by now – the relational database became the most popular general-purpose database on the market. Everyone used a relational database to store and retrieve application data.

But then cloud scale infrastructure arrived, and the relational model did not fit many cloud use cases – mainly Web based applications (as opposed to internal enterprise applications serving a single company). NoSQL databases were developed to meet the scalability and latency requirements of those Web applications, and to adapt to the differences between cloud infrastructure and traditional enterprise infrastructure.

In contrast to “SQL” databases (more correctly, relational databases), the NoSQL name did not reference a standard, nor were the various NoSQL databases similar to one another. One had to understand the use case(s) around which each was developed to decide whether or not it was a fit.

MongoDB and CouchDB were developed to handle unstructured documents such as HTML and XML – and now JSON, of course – using the name/value pair model, which produced better results than mapping unstructured data into and out of an SQL schema, especially for high volume/low latency use cases such as online ad bidding.
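
A minimal sketch of the appeal, assuming a local MongoDB instance (the “adtech” database and “bids” collection here are hypothetical): a nested JSON document is stored and queried as-is, with no up-front schema and nothing to decompose into tables.

```python
# A minimal sketch of the document model, assuming a local MongoDB
# instance; the "adtech" database and "bids" collection are hypothetical.
from pymongo import MongoClient

bids = MongoClient("mongodb://localhost:27017")["adtech"]["bids"]

# The nested JSON document is stored as-is: no schema to declare up front.
bids.insert_one({
    "campaign": "spring-sale",
    "bid_usd": 0.42,
    "device": {"os": "android", "model": "pixel-8"},
    "segments": ["sports", "travel"],
})

# Query a nested field directly with dot notation.
for doc in bids.find({"device.os": "android"}):
    print(doc["campaign"], doc["bid_usd"])
```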

Cassandra was originally developed at Facebook to power inbox search at massive scale – again, not easy to do with a relational DB. Although Facebook later moved away from Cassandra, it has been adopted by Netflix and Apple, among others, as a scalable, highly available data store.

NoSQL databases met the scalability requirements of big Web apps, and together with caching and “eventual consistency” capabilities also addressed the critical latency requirement for Web sites (i.e., if your web site isn’t responding quickly enough, I’ll try another one). But they did so largely by eliminating features such as schemas, referential integrity, joins, and transactional consistency, to name a few.

Arguments in favor of NoSQL often made the point that it was easy to get up and running very quickly. But the features and capabilities missing relative to a traditional relational database meant that more responsibility was pushed back into the application space: ensuring consistency, transforming data structures, joining records.

Complexity wasn’t really eliminated; it was just pushed into the app space. And this caused a lot of headaches for app developers, especially in the early days of Mongo. 
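
To make that concrete, here is a minimal sketch of what a “join” looks like when the database won’t do it for you – the application fetches from two collections and stitches the results together itself. The “shop” database and its “orders” and “customers” collections are hypothetical.

```python
# A client-side join, assuming a local MongoDB instance; the "shop"
# database and its collections are hypothetical. Without server-side
# joins, the application does the work itself.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

orders = list(db["orders"].find({"status": "open"}))

# Fetch the referenced customers in one round trip...
ids = {o["customer_id"] for o in orders}
customers = {c["_id"]: c
             for c in db["customers"].find({"_id": {"$in": list(ids)}})}

# ...then stitch the two result sets together by hand.
for o in orders:
    o["customer"] = customers.get(o["customer_id"])
```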

Over time the NoSQL databases started adding many of these features and capabilities back, as customers asked for them. They added SQL as a query language (though often not as a schema language), added transactions, and figured out joins, indexes, and transformations.
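
MongoDB’s aggregation pipeline is a good illustration: its $lookup stage (available since MongoDB 3.2) performs the same join server-side. A minimal sketch, reusing the hypothetical collections above:

```python
# The client-side join above, expressed server-side with MongoDB's
# $lookup aggregation stage; the "shop" collections are hypothetical.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

pipeline = [
    {"$match": {"status": "open"}},
    {"$lookup": {
        "from": "customers",          # collection to join against
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",             # joined docs land in this array field
    }},
]
for order in db["orders"].aggregate(pipeline):
    print(order["_id"], order["customer"])
```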

And now we have the Distributed SQL trend, which claims to solve the scalability problem for relational databases. 

Leading the pack is Google Spanner, whose team reportedly decided it was just easier to implement transactional consistency than handle the workarounds of “eventual consistency.” Why not have scale and consistency if both were possible?

Now CockroachDB, MariaDB Xpand, and YugabyteDB are following in Spanner’s footsteps, offering commercial, multi-cloud relational databases that deliver both scalability and consistency, based on the SQL/relational model.
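
To see what that buys the application, here is a minimal sketch of a strongly consistent transfer on a distributed SQL database. CockroachDB speaks the PostgreSQL wire protocol, so psycopg2 works; the connection string and the “accounts” table are hypothetical, and the retry loop follows the general pattern recommended for serialization conflicts.

```python
# A minimal sketch of a serializable transaction on CockroachDB, which
# speaks the PostgreSQL wire protocol; the connection string and the
# "accounts" table are hypothetical.
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/bank")

def transfer(src, dst, amount):
    while True:
        try:
            with conn:  # BEGIN ... COMMIT; serializable by default
                with conn.cursor() as cur:
                    cur.execute(
                        "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                        (amount, src))
                    cur.execute(
                        "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, dst))
            return
        except psycopg2.errors.SerializationFailure:
            # Retryable conflict (SQLSTATE 40001); the with-block has
            # already rolled back, so simply try again.
            continue
```

Note that there is no application-level conflict handling beyond the retry: the database guarantees the two updates commit atomically, across nodes, or not at all.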

Real Time Analytics 

And what about the real time analytics trend? At one point special purpose databases were used for collecting and storing time series data, and real time data such as satellite feeds or stock ticker updates. 

MongoDB recently announced support for real time data streaming, joining a long list of other database products with offerings in this area. 
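
One long-standing building block in this area is the change stream, which lets an application subscribe to inserts and updates as they happen. A minimal sketch with pymongo, assuming a replica set deployment (change streams require one) and the hypothetical “shop” collections above:

```python
# A minimal sketch of tailing a MongoDB change stream -- one building
# block for real time pipelines. Change streams require a replica set
# (or sharded cluster); the "shop"/"orders" collection is hypothetical.
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

with orders.watch([{"$match": {"operationType": "insert"}}]) as stream:
    for change in stream:
        print(change["fullDocument"])  # react to each insert as it lands
```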

It makes sense, since there is now so much real time data to process from sources such as automobile telemetry, factory automation systems, and interactive AI. Everybody naturally wants a piece of this market, in which the volumes of data are growing exponentially. 

Recent announcements from Timescale and Imply (Druid) are starting to sound very similar, despite the different origins and architectures of these data stores, as both jockey for position in the real time analytics market. 
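
For flavor, here is a minimal sketch of the TimescaleDB side of that story – an ordinary PostgreSQL table promoted to a hypertable and queried with a time-bucketed aggregate. The connection string and the “readings” table are hypothetical.

```python
# A minimal sketch of time series storage in TimescaleDB, which runs as
# a PostgreSQL extension; the connection string and "readings" table are
# hypothetical.
import psycopg2

conn = psycopg2.connect("postgresql://postgres@localhost:5432/metrics")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS readings (
            time   TIMESTAMPTZ NOT NULL,
            device TEXT        NOT NULL,
            temp   DOUBLE PRECISION
        )""")
    # Promote the plain table to a hypertable, auto-partitioned by time.
    cur.execute(
        "SELECT create_hypertable('readings', 'time', if_not_exists => TRUE)")
    # A typical real time analytics query: per-device 5-minute averages.
    cur.execute("""
        SELECT time_bucket('5 minutes', time) AS bucket, device, avg(temp)
        FROM readings
        WHERE time > now() - INTERVAL '1 hour'
        GROUP BY bucket, device
        ORDER BY bucket""")
    for row in cur.fetchall():
        print(row)
```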

It seems like everyone feels they have to get into the real time analytics game, or risk getting left behind. 

Data Lake Houses

What about traditional analytics and the use of dedicated data stores? Many different platforms are being offered for constructing “data lakes” and “data lake houses” for large scale analytics, such as Snowflake and Databricks. A quick scan shows as many contenders for the data lake business as there are for real time analytics. 

What happens after you feed all your data into the lake and the lake becomes the system of record? Do you update the feeder databases and wait for the changes to propagate into the lake? Or do you update the lake directly and then propagate the changes back out to the feeder databases?

I’m not sure anyone really knows. The old adages are breaking down. Maybe you can have the same database handle transactions and analytics. Maybe you can have the same database handle scalability and low latency with transactional consistency. Maybe you can have real time IoT feedback loops using the same database as well.  

Seems like every day a new trend pops up – distributed SQL, vector databases, real time analytics databases – even a proposal for a database operating system. 

Database Evolution 

Part of this is just what happens when you have a successful product in one area, or one that meets a popular use case, and you start attacking adjacent markets – MongoDB’s successful document oriented DB getting into analytics, for example. Customers start asking for additional features and capabilities because it’s easier to manage one database than two. 

Part of it is just the evolution of the industry and the specialization of data stores for different types of applications, such as vector DBs for AI. And people figuring out how to stretch database models to take on more use cases. 

Some of it is due to the different demands of Web applications vs internal enterprise applications, and the need to handle unstructured data well. But the result is that you can end up with several databases for different use cases. 

Netflix has one database for search, another for a transactional object store, and another for analytics. Facebook uses multiple different databases for different purposes, which is not uncommon. But wouldn’t you want to combine them if you could?

Several database vendors now promote various types of converged databases, which generally means positioning a single database for multiple use cases that previously might each have required a special purpose database. 

Some Database History 

When I was working in database engineering at Digital Equipment Corp in the mid 1980s, our flagship product was DBMS, a network database (basically, the indexes and pointers were fixed and therefore fast, but to change them you had to unload and reload all your data).

A short time after I joined, we also offered a relational model database, Rdb (which Oracle eventually bought, along with DBMS and several other ancillary products such as a data dictionary and data query language).

At the time the recommendation was to use DBMS for transaction processing and Rdb for decision support. Eventually relational databases got fast enough to support transaction processing, and older database models such as hierarchical and network became legacy. 

Obviously this all changed over time, as we are seeing current trends toward convergence as database products expand into adjacent market areas and chase requirements for new types of applications. Not to mention getting new business by eating someone else’s lunch, so to speak.  

The Intellyx Take

Many recent innovations in the database market seem to be leading us back to the “one database to rule them all” model we had for many years with the relational database.

Edge cases, special purpose databases, and unique use cases will always exist. For example, vector DBs and conversational AI data stores seem to be charting their own independent path right now. 

A specialized solution typically is better than a generalized solution for a given use case, but often enough the generalized solution is good enough, and less expensive. 

Several other convergences seem to be happening, however. NoSQL and SQL appear to be reconciling their differences, time series and real time analytics are becoming good friends, and data lake (houses) and transaction processing are on a common path toward establishing a single system of record. 

I suppose the main driver in the end will be customers who ask for additional features and functionality in the databases they already have, and the natural tendency to reduce the number and (especially) type of databases to staff up and manage. 

Copyright Intellyx LLC. Intellyx is an industry analysis and advisory firm focused on enterprise digital transformation. Covering every angle of enterprise IT from mainframes to artificial intelligence, our broad focus across technologies allows business executives and IT professionals to connect the dots among disruptive trends. As of the time of writing, none of the organizations mentioned in this article is an Intellyx customer. No AI was used to write this article. Image credit: Nicolas Raymond

 
