How Hydrolix Cracked the Edge Services Observability Problem

October 8, 2024

BrainBlog for Hydrolix by Jason Bloomberg

In the previous article in this series, Intellyx Principal Analyst Jason English laid out the challenges organizations face with edge services observability.

As organizations ramp up their use of Content Delivery Networks (CDNs) on the edge, the quantity and diversity of observability data – log data in particular – explode.

As English points out, edge observability data differs from other log data in fundamental ways, requiring a novel approach to dealing with the problem – a challenge that Hydrolix has risen to.

What are the ingredients in Hydrolix’s secret sauce that enable it to handle edge observability data better than its competitors?

Secret sauce ingredient #1: Optimizing for log data

At its core, Hydrolix is a data lake – one of many on the market. But it’s no ordinary data lake.

The creators of the Hydrolix data lake designed it for use cases like multi-CDN observability: it can ingest several logically grouped data sources into a single table.

Such groupings could be logs from different CDNs or perhaps microservices within a business domain.

Secret sauce ingredient #2: Better indexing and data compression

Hydrolix’s indices take advantage of the fact that log data always has timestamps. By partitioning data by time, Hydrolix simplifies indexing and speeds up queries: a query that specifies a time range only needs to read the partitions that overlap that range.
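As a generic illustration of why time partitioning helps, the Python sketch below groups hypothetical log records into hourly partitions and then answers a time-range query by skipping every partition whose hour cannot overlap the range. The record layout, the hourly granularity, and the function names are assumptions for illustration; this is not Hydrolix's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log record: (timestamp, cdn, status). Not Hydrolix's schema.
Record = tuple[datetime, str, int]


def partition_by_hour(records: list[Record]) -> dict[datetime, list[Record]]:
    """Group records into hourly partitions keyed by the start of the hour."""
    partitions = defaultdict(list)
    for rec in records:
        hour = rec[0].replace(minute=0, second=0, microsecond=0)
        partitions[hour].append(rec)
    return dict(partitions)


def query_range(partitions: dict[datetime, list[Record]], start: datetime, end: datetime):
    """Return (records with start <= ts < end, number of partitions scanned).

    A partition covering [hour, hour + 1h) that does not overlap the query
    range is skipped without reading its contents (partition pruning).
    """
    scanned = 0
    results = []
    for hour, recs in partitions.items():
        if hour + timedelta(hours=1) <= start or hour >= end:
            continue  # pruned: this hour cannot contain matching rows
        scanned += 1
        results.extend(r for r in recs if start <= r[0] < end)
    return results, scanned


# Usage: one record every 15 minutes for a full day (96 records, 24 partitions).
base = datetime(2024, 10, 8)
records = [(base + timedelta(minutes=15 * i), "cdn-a" if i % 2 else "cdn-b", 200) for i in range(96)]
parts = partition_by_hour(records)
hits, scanned = query_range(parts, base + timedelta(hours=5), base + timedelta(hours=7))
print(len(parts), scanned, len(hits))  # 24 2 8
```

Only 2 of the 24 partitions are read to answer a two-hour query; the other 22 are ruled out from their keys alone.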

Hydrolix also indexes all columns (i.e., fields) in each table and then compresses the indices as well. The result is high compression with minimal overhead.

Better compression means that indices take less space, which in turn makes it practical to index everything. As a result, Hydrolix indexes all columns by default, which is ideal for finding specific information in large data sets.
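Sorted, repetitive data such as timestamps or row ids is a good fit for a well-known pairing: delta encoding followed by a general-purpose compressor. The sketch below shows that generic technique with Python's standard `zlib` and `struct` modules. The function names and the 5 ms arrival interval are hypothetical, and this is not Hydrolix's patented compression, which the article does not describe.

```python
import struct
import zlib


def compress_sorted_ints(values: list[int]) -> bytes:
    """Delta-encode a sorted list of non-negative ints, then zlib-compress it.

    Neighbouring values in sorted data (timestamps, row ids) differ by small
    amounts, so the deltas are small and repetitive and compress very well.
    """
    if not values:
        return zlib.compress(b"")
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    raw = struct.pack(f"<{len(deltas)}Q", *deltas)  # 8-byte unsigned ints
    return zlib.compress(raw)


def decompress_sorted_ints(blob: bytes) -> list[int]:
    """Invert compress_sorted_ints: decompress, unpack, and prefix-sum the deltas."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}Q", raw)
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Usage: hypothetical millisecond timestamps, one log line roughly every 5 ms.
timestamps = [1_728_345_600_000 + 5 * i for i in range(10_000)]
blob = compress_sorted_ints(timestamps)
assert decompress_sorted_ints(blob) == timestamps
print(len(blob), "bytes compressed vs", 8 * len(timestamps), "bytes uncompressed")
```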

Hydrolix’s patented compression algorithms enable the data lake to retrieve data from S3-compatible object storage with minimal latency, maximizing I/O throughput when pulling data segments from object storage.

The combination of indexing every column and compressing both the data and its indices supports real-time analytics and massive horizontal scale.

Secret sauce ingredient #3: Predicate pushdown

Predicate pushdown is the process of retrieving the smallest amount of data necessary by filtering data close to where it is stored, rather than transferring all of it first and filtering afterward.

Instead of fetching data via large numbers of concurrent HTTP requests, Hydrolix minimizes the number of requests, improving performance and reducing network overhead.

In addition, predicate pushdown enables queries to retrieve individual column segments (say, for a particular time interval), which is far more efficient than retrieving entire columns.
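Here is a minimal sketch of the idea. It assumes each column is stored as fixed-size segments that carry min/max metadata, a common technique often called a zone map that is assumed here rather than taken from Hydrolix's documentation. The time-range predicate is checked against that metadata first, and only the overlapping segments of the requested column are read.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One block of a single column's values, plus min/max metadata (assumed layout)."""
    column: str
    values: list
    lo: object
    hi: object


def make_segments(column: str, values: list, size: int) -> list[Segment]:
    """Split a column into fixed-size segments and record each segment's min/max."""
    segs = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        segs.append(Segment(column, chunk, min(chunk), max(chunk)))
    return segs


def pushdown_fetch(ts_segs: list[Segment], col_segs: list[Segment], start, end):
    """Read only the col_segs blocks whose timestamp range overlaps [start, end].

    The predicate is first evaluated against segment metadata, so segments
    that cannot match are never read. Returns (matching values, blocks read).
    """
    fetched = 0
    out = []
    for ts_seg, col_seg in zip(ts_segs, col_segs):
        if ts_seg.hi < start or ts_seg.lo > end:
            continue  # ruled out by metadata alone
        fetched += 1
        out.extend(v for t, v in zip(ts_seg.values, col_seg.values) if start <= t <= end)
    return out, fetched


# Usage: 1,000 rows with integer timestamps 0..999, stored as 10 segments of 100 rows.
timestamps = list(range(1000))
statuses = [200 if t % 10 else 500 for t in timestamps]
ts_segs = make_segments("timestamp", timestamps, 100)
status_segs = make_segments("status", statuses, 100)
values, fetched = pushdown_fetch(ts_segs, status_segs, 250, 349)
print(fetched, len(values), values.count(500))  # 2 100 10
```

Only 2 of the 10 status segments are read; the other 8 are skipped without touching their values.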

Predicate pushdown is only possible because Hydrolix indexes everything – another important advantage of the company’s indexing strategy.

Secret sauce ingredient #4: Optimizing for high cardinality

High cardinality has been the bane of traditional database technologies for decades.

When the number of possible values for a particular field is small, indexing and thus querying the database is straightforward and fast.

When there are many possible values (in other words, high cardinality), then indexing and querying can take a lot longer.
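To see how cardinality affects indexing, the sketch below builds a simple inverted index (each value mapped to the list of row ids containing it) over hypothetical CDN log rows. The field names and data are invented for illustration.

```python
import random
from collections import defaultdict


def build_inverted_index(rows: list[dict], field: str) -> dict:
    """Map each distinct value of `field` to the list of row ids that contain it.

    The number of keys equals the field's cardinality, so an index like this
    grows with the number of distinct values, not just with the number of rows.
    """
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[field]].append(row_id)
    return dict(index)


# Hypothetical rows: status codes repeat, request ids are unique per request.
rng = random.Random(42)
rows = [
    {"status": rng.choice([200, 304, 404, 500]), "request_id": f"req-{i:06d}"}
    for i in range(10_000)
]

status_index = build_inverted_index(rows, "status")       # at most 4 keys
request_index = build_inverted_index(rows, "request_id")  # one key per row
print(len(status_index), len(request_index))
```

The status index stays small no matter how many rows arrive, while the request-id index grows in lockstep with the data, which is the scaling problem high cardinality creates for traditional indexes.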

Log data typically has high cardinality, because there are so many ways each entry can differ from all the others. If an organization wants to query multiple logs from multiple systems – the primary challenge of edge services observability – then cardinality goes off the charts.

Hydrolix addressed this problem by optimizing the platform for high-cardinality data. As a result, there is no effective limit on cardinality, making even petabytes of log data queryable in real time.

Secret sauce ingredient #5: Merge service for late and out-of-order data

Edge services observability depends upon the ability to combine various logs whose timestamps may not precisely align, perhaps because of differing latency or other sources of delay.

Hydrolix deals with this problem via a merge service that sorts late-arriving and out-of-order data upon arrival at the data lake, regardless of how late or out of order a particular data feed is.

In fact, Hydrolix automatically sorts and reorders incoming data at the merge step, making it available in real time even as it arrives from multiple sources.
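The core of such a merge step can be sketched as merging two sorted runs. In the hypothetical example below, an already time-sorted partition absorbs a batch of late, out-of-order records from other sources. The record layout and function name are assumptions; the article does not describe how Hydrolix's merge service works internally.

```python
import heapq
from datetime import datetime

# Hypothetical record: (timestamp, source, message). Not Hydrolix's format.
Record = tuple[datetime, str, str]


def merge_late_batch(partition: list[Record], late: list[Record]) -> list[Record]:
    """Fold a batch of late or out-of-order records into a time-sorted partition.

    `partition` is assumed to be sorted by timestamp already. The late batch is
    sorted first, then both sorted runs are merged in a single linear pass.
    """
    late_sorted = sorted(late, key=lambda r: r[0])
    return list(heapq.merge(partition, late_sorted, key=lambda r: r[0]))


def t(minute: int) -> datetime:
    """Helper for the example: a timestamp at 12:<minute> on a fixed day."""
    return datetime(2024, 10, 8, 12, minute)


partition = [(t(0), "cdn-a", "GET /"), (t(5), "cdn-a", "GET /a"), (t(10), "cdn-b", "GET /b")]
late = [(t(7), "cdn-b", "GET /late"), (t(1), "cdn-c", "GET /delayed")]
merged = merge_late_batch(partition, late)
print([r[0].minute for r in merged])  # [0, 1, 5, 7, 10]
```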

Secret sauce ingredient #6: Independently scalable query pools

Hydrolix leverages its cloud-native architecture as well as ClickHouse, the open-source, real-time data warehouse, to independently scale each subsystem, including ingestion, merge, and query.

By separating compute from storage in its Kubernetes-based architecture, Hydrolix can independently scale query pools: dedicated sets of compute resources that serve query workloads.

Because these query pools scale independently of the underlying storage, it is possible to scale workload-specific pools up and down as needed to address particular business use cases.

Secret sauce ingredient #7: All hot, all the time

Because cloud costs can explode when storing massive data sets like log files, organizations traditionally segment their storage into tiers. This ‘hot’, ‘warm’, and ‘cold’ approach allocates data to storage infrastructure with different speeds at different costs.

The problem with such tiering is that querying warm data is slower than hot, and cold data slower still. Querying warm and cold data also typically requires moving it to the hot tier, making such queries more expensive.

Hydrolix has turned this tiering on its head. By combining its patented compression, optimization for high cardinality, better indexing, and the scalability of cloud-native infrastructure, Hydrolix gives its customers the luxury of all hot data, all the time.

The result: all queries against log data are fast, whether the request is for today’s data or last year’s, and storage costs only a fraction of what the alternatives cost.

The Intellyx Take

The data infrastructure world has always been one of tradeoffs. Speed, cost, and data volume are perpetually at odds.

When a vendor like Hydrolix comes along and says that they can deliver high-performance queries against massive quantities of log data while also saving money, such promises might strain credulity.

It might seem that Hydrolix has waved a magic wand, but once you break down the company’s innovations, it becomes clearer how it achieves the benefits it promises.

No single innovation, however, accounts for Hydrolix’s success. All seven ingredients are necessary to make up the secret sauce that enables Hydrolix to live up to its promises.

Jason Bloomberg
