DataOps: What, Why, and How?

Buzzword alert: ‘DataOps’ is now a thing.

Billed as ‘DevOps for data,’ DataOps burst on the scene in 2014 with InformationWeek contributing editor Lenny Liebmann’s article ‘3 reasons why DataOps is essential for big data success’ on the IBM Big Data & Analytics Hub.

Liebmann essentially argued that Big Data presented challenges similar to, but distinct from, the application development and deployment pitfalls that gave rise to DevOps.

He broke down three core motivations for identifying DataOps as something new and different: business requirements for ever-increasing speed, the shortcomings of the cloud for addressing Big Data challenges, and the sheer diversity of Big Data workloads.

In the five years since his article, analyst firms and vendors have weighed in on DataOps: what it is, why you need it, and why it’s different from gear you already have.

Let’s take a closer look.

Clearing Up DataOps Confusion

Wikipedia defines DataOps first and foremost as a methodology:

DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics.

Another frequently quoted definition is from Jack Vaughan at TechTarget:

DataOps (data operations) is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.

Vaughan’s definition, which treats DataOps as a full-lifecycle ‘approach,’ is therefore broader and less formal than Wikipedia’s, and centers on the use of tooling.

We’re not done yet. Gartner’s Nick Heudecker defines DataOps as:

A collaborative data management practice, really focused on improving communication, integration, and automation of data flow between managers and consumers of data within an organization.

Matt Aslett at The 451 chimes in as well:

DataOps is the alignment of people, process and technology to enable more agile and automated approaches to data management. It aims to provide easier access to data to meet the demands of various stakeholders who are part of the data supply chain (developers, data scientists, business analysts, DevOps professionals, etc.) in support of a broad range of use cases.

In contrast to a methodology or tool-centric approach, therefore, Gartner considers DataOps to be a ‘management practice,’ while The 451 settles for an ‘alignment of people, process and technology’ – which we might consider either a management practice or a methodology.

The Right Questions to Ask

It’s not the place of this article to pick a particular definition, or come up with yet another one. Instead, let’s discuss three core questions about DataOps that will clarify how you should think about it, and help you make appropriate decisions about DataOps in your organization.

Question One: Do we need DataOps to be something separate and different from existing operational and management practices and methodologies?

Data have always been central to the purpose and operation of IT, of course – so an obvious question is whether we really need something different from how we’ve been handling data in the pre-DataOps world.

For some organizations with modest data needs, DataOps would certainly be overkill. Most enterprises, however, are pursuing Big Data initiatives with data sets large and diverse enough to warrant specialized approaches and tooling.

That being said, creating a siloed DataOps organizational structure may be counterproductive. After all, one of the fundamental goals of DevOps is to break down organizational silos – so we wouldn’t want a DataOps silo to torpedo such an effort (I’ll be exploring the relationship between DataOps and DevOps in a future BrainBlog post).

Question Two: What is the salient difference between data-centric enterprise challenges and other enterprise IT challenges, and do different data-centric challenges require different approaches to DataOps?

Most IT challenges are by their nature technology-centric. Even application development, for better or worse, has long centered on the creation, testing, and deployment of code.

However, with Big Data, the individuals responsible for driving the technical efforts – data scientists and data analysts in particular – are essentially subject matter experts. True, these professionals may spend some or most of their time coding, as their roles tend to be quite technical, but the coding isn’t the point of what they do. How to extract value from data to meet business needs is the data professional’s remit.

DataOps must therefore support the organization’s specific data needs – even though such requirements may vary dramatically from one organization to another. For example, seismology data experts in the oil and gas industry have quite different concerns from AI experts who support modern ecommerce.

Certainly, there are operational commonalities across such use cases – but DataOps must rise to meet a diverse set of business drivers, over and above focusing on what those drivers have in common.

For example, does the organization require batch data processing, or are its needs centered on real-time streaming data? How much do specific regulatory constraints like GDPR impact the operational parameters for DataOps? Are data experts essentially concerned with math problems, or are they more concerned with data usage challenges, like data drift, quality, and privacy?

Question Three: Does DataOps consist of human expertise, a body of knowledge, or tooling – or some combination?

If we circle back to the varied definitions of DataOps in light of the two questions above, we can at least put our finger on the source of confusion around the topic.

On one hand, DataOps is something people do, and thus, something people get good at. In other words, DataOps boils down to human expertise – a specialized skillset that differs from DevOps skills or traditional operations skills.

DataOps is also clearly a set of best practices – a body of knowledge that organizations will build for themselves over time. Whether or not a particular company builds a distinct DataOps team, it should at the least codify the policies, rules, and procedures for DataOps that will address the data challenges it faces.

There’s no question, furthermore, that tooling is a part of the DataOps story as well. Just as tooling supports the automation central to DevOps, so too do DataOps tools support the overall data mission in organizations that implement it.

However, as with DevOps, tooling will never be the whole story. DataOps is at its core a human story – one of expertise and best practice, something people do, not something they buy.

Stay tuned for the next BrainBlog in this series on the parallels and differences between DataOps and DevOps.

Copyright © Intellyx LLC. Unravel is an Intellyx client. None of the other organizations mentioned in this article are Intellyx clients. Intellyx retains final editorial control of this article. Image credit: JD Hancock.
