Clusterpoint: Document-Centric, ACID-Compliant DBaaS

With all the NoSQL database options available on the market today, including cloud-based Database-as-a-Service (DBaaS) options as well as installable software, you might wonder whether we need yet another DBaaS player – especially one with headquarters in as unlikely a place as Latvia.

Well, get ready for Clusterpoint to surprise you. Clusterpoint offers a document-oriented DBaaS service that supports both JavaScript and XML. Fully ACID compliant. Instant scalability. And they have a home-built indexing engine that delivers blisteringly fast queries, even for complex, deeply structured JSON or XML documents.

Latvia headquarters notwithstanding, their core team has serious tech cred, with years of experience working on the Google search engine. They bring that experience to the seriously difficult challenge of building a cloud-ready database engine that has the chops to compete with Amazon DynamoDB, MongoDB, and Cassandra, among others in an increasingly crowded field.

In fact, you can think of Clusterpoint as offering the best of each of these three offerings. The cloud-centric usability and pay-as-you-go business model of DynamoDB. The JavaScript support of MongoDB. And the broadly distributed, rapid scalability and transactionality of Cassandra.

Bespoke Indexing: An Important Clusterpoint Differentiator

With mature open source indexing and search technologies on the market like Apache Lucene and its Apache Solr search server, you might wonder why Clusterpoint put the time and effort into building their own indexing engine. The answer lies in how they deal with hierarchical documents.

Document-oriented databases traditionally treat JSON and XML as document file types – essentially, long sequences of characters that might as well be plain text files. Indexing them, therefore, tends to be a brute force process that treats them as any other document format.

Clusterpoint’s indexing takes a smarter approach. It treats these documents as the hierarchical documents they are. Not only does this smart indexing deal well with large, deeply structured JSON and XML, it also enables the logical joining of such documents.

With earlier-generation document databases, searching over multiple documents requires concatenating them, which slows down searches and runs into capacity issues as the number and size of the documents grow. Clusterpoint, on the other hand, can treat a set of JSON files as a single indexed unit, even when one document appears as a node deep within another document.
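To make the idea of structure-aware indexing concrete, here is a minimal sketch in plain JavaScript. It is not Clusterpoint's engine, whose internals are not described here. The indexPaths function, the sample order document, and the slash-separated path format are all assumptions made for illustration. The sketch walks a nested document and records each leaf value under its full structural path, so a lookup can target a position in the hierarchy rather than scanning raw text.

```javascript
// Illustrative only: a toy path-based index, not Clusterpoint's actual engine.
// Walks a nested document and records every leaf value under its full path.
function indexPaths(node, prefix, entries) {
  prefix = prefix || "";
  entries = entries || [];
  if (node !== null && typeof node === "object") {
    // Objects and arrays: recurse into each child, extending the path.
    Object.keys(node).forEach(function (key) {
      var path = prefix ? prefix + "/" + key : key;
      indexPaths(node[key], path, entries);
    });
  } else {
    // Leaf value: store it together with the path that leads to it.
    entries.push({ path: prefix, value: node });
  }
  return entries;
}

// Hypothetical document: an order that embeds a customer document.
var order = {
  id: 17,
  customer: { name: "Ada", address: { city: "Riga" } },
  items: [{ sku: "bike-01", qty: 1 }]
};

var entries = indexPaths(order);

// Look up values by structural path instead of scanning raw text.
var cities = entries.filter(function (e) {
  return e.path === "customer/address/city";
});
console.log(cities);
```

Because the embedded customer document is indexed under the same path scheme as its parent, a query on customer/address/city reaches it without any concatenation step.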

Clusterpoint’s indexing scheme also supports more sophisticated querying of JSON and XML documents. In addition to a native query language, the latest version of Clusterpoint supports JS/SQL, a version of SQL specifically for querying such documents.

The best way to get a sense of the power and ease of use of JS/SQL is through examples. With JS/SQL we can put JavaScript directly into SELECT/WHERE/ORDER BY clauses:

function discount(category) {
      // Discount rate per product category
      var discounts = { "Books" : 0.1, "Clothing" : 0.3, "Home & Garden" : 0.05, "Sports" : 0.15 };
      if (category in discounts) {
            return discounts[category];
      }
      // Categories without an entry get no discount
      return 0;
}

SELECT name, price * (1.0 - discount(category)) AS discounted_price FROM product
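To see what that SELECT computes for each document, the following sketch applies the same discount function to a couple of rows in plain JavaScript. The products array and its contents are hypothetical stand-ins for documents in the product collection, and the map call only mimics the projection. It says nothing about how Clusterpoint evaluates the query internally.

```javascript
// Same logic as the discount function used in the query above.
function discount(category) {
  var discounts = { "Books": 0.1, "Clothing": 0.3, "Home & Garden": 0.05, "Sports": 0.15 };
  if (category in discounts) {
    return discounts[category];
  }
  return 0;
}

// Hypothetical rows standing in for documents in the product collection.
var products = [
  { name: "Novel", price: 20, category: "Books" },
  { name: "Lamp", price: 40, category: "Lighting" }
];

// Mimics: SELECT name, price * (1.0 - discount(category)) AS discounted_price
var result = products.map(function (p) {
  return {
    name: p.name,
    discounted_price: p.price * (1.0 - discount(p.category))
  };
});
console.log(result);
```

The "Lamp" row keeps its full price because "Lighting" has no entry in the discount table, so the function falls through to the zero discount.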

Developers can also place arbitrary JavaScript in GROUP BY clauses:

function PriceBucket(price) {
      // Bucket edges; each bucket includes its lower edge and excludes its upper edge
      var boundaries = [0, 1, 5, 10, 50, 100, 200, 500, 1000];
      for (var i = 1; i < boundaries.length; i++) {
            if (price >= boundaries[i - 1] && price < boundaries[i]) {
                  return boundaries[i - 1].toString() + " to " + boundaries[i].toString();
            }
      }
      // No bucket matched, so the price is at or beyond the last edge
      return "above " + boundaries[boundaries.length - 1].toString();
}

SELECT COUNT() FROM product
WHERE CONTAINS("mountain bike")
GROUP BY PriceBucket(price);

The JavaScript above is straightforward for anyone who uses the language, and the SQL will be familiar to anyone with experience with traditional SQL.
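As a rough illustration of what grouping by PriceBucket produces, this sketch runs the bucketing function over a few prices and tallies the labels in plain JavaScript. The prices array is made up for the example, and the counting loop is only an approximation of the GROUP BY with COUNT, not a description of Clusterpoint's execution.

```javascript
// Same bucketing logic as the PriceBucket function used in the query above.
function PriceBucket(price) {
  var boundaries = [0, 1, 5, 10, 50, 100, 200, 500, 1000];
  for (var i = 1; i < boundaries.length; i++) {
    if (price >= boundaries[i - 1] && price < boundaries[i]) {
      return boundaries[i - 1].toString() + " to " + boundaries[i].toString();
    }
  }
  return "above " + boundaries[boundaries.length - 1].toString();
}

// Hypothetical prices of products matching the search.
var prices = [7.5, 8, 120, 1500];

// Tally how many prices fall into each bucket label.
var counts = {};
prices.forEach(function (price) {
  var bucket = PriceBucket(price);
  counts[bucket] = (counts[bucket] || 0) + 1;
});
console.log(counts);
```

Here the two prices between 5 and 10 share the label "5 to 10", 120 lands in "100 to 200", and 1500 falls through the loop to "above 1000".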

Comparing Clusterpoint to the Alternatives

Clusterpoint’s inherently distributed architecture provides instant (sub-second) scalability, similar perhaps to Cassandra clusters, but far faster than DynamoDB, hosted MongoDB, or other choices like IBM Cloudant or Microsoft DocumentDB.

Its scalable ACID transactions are unusual in the world of NoSQL. Vendors like NuoDB offer ACID transactionality, and Cassandra offers tunable consistency. But DynamoDB, MongoDB, and most other NoSQL choices offer an eventual consistency-based transactional model.

The CAP theorem, of course, states that no database management system can offer immediate consistency, partition tolerance, and high availability at the same time. In practice this theorem means that if the nodes in a distributed database can’t communicate with each other, they can’t commit a transaction until such communication resumes.

With Cassandra’s tunable consistency, the admin can configure whether to allow commits when only some of the nodes are in communication with each other. A common setting is to allow commits when a majority of the nodes (more than half) participate, known as quorum-based consistency.
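The arithmetic behind a majority quorum is simple enough to sketch. The quorumSize and canCommit helpers below are hypothetical names chosen for this example, not part of any Cassandra or Clusterpoint API. They only show the rule: a commit proceeds when the nodes that can reach each other form a strict majority of the cluster.

```javascript
// Hypothetical helpers illustrating a majority quorum; not a real database API.

// Smallest number of nodes that is strictly more than half of the cluster.
function quorumSize(totalNodes) {
  return Math.floor(totalNodes / 2) + 1;
}

// A commit is allowed only when the reachable nodes form a quorum.
function canCommit(reachableNodes, totalNodes) {
  return reachableNodes >= quorumSize(totalNodes);
}

console.log(quorumSize(5));   // 3
console.log(canCommit(2, 5)); // false: the minority side of a partition must wait
console.log(canCommit(3, 5)); // true: the majority side can keep committing
```

Since two disjoint groups of nodes cannot both hold a majority, at most one side of a network partition can commit, which is how a quorum preserves consistency.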

Clusterpoint takes a quorum-based approach, where most of the nodes must be able to communicate. But since Clusterpoint controls its own deployment environment (running in its own colocated data centers rather than AWS), the chance that such a quorum won’t be achievable is minuscule.

Clusterpoint’s support for XML as well as JSON differentiates it from most of the document databases on the market, and its sophisticated indexing and search capability enables full-text search, relevancy ranking, and 2D and 3D geospatial indexing and querying. MongoDB and Cloudant offer built-in indexing and search, while other options require a separate search add-on. Only Cloudant offers geospatial support as advanced as Clusterpoint’s.

The Intellyx Take

What’s it going to take for current DBaaS and NoSQL users to make the switch to Clusterpoint? The first step, of course, is trying it out. Good thing Clusterpoint gives away the first 10 gigabytes of storage for free (for as long as you want), and the first 30 days are free, regardless of how much you use.

And while DBaaS is clearly the wave of the future, Clusterpoint also offers an on-premises version, with no license fee and a simple install. So what are you waiting for?

In the final analysis, Clusterpoint does have visibility and credibility roadblocks to overcome. Both the DBaaS and NoSQL markets are already crowded, and rapidly maturing. Blog posts like this one and free-to-try levels can only go so far. The real test will be whether early customers fall in love with Clusterpoint and spread the word.

Intellyx advises companies on their digital transformation initiatives and helps vendors communicate their agility stories. Clusterpoint is an Intellyx client. Intellyx retains full editorial control over the content of this article. Sample code source: Clusterpoint.
