As Agentic AI Matures, Can It Address the Data Quality Challenge?
Traditional data lake architecture
It almost goes without saying by now, yet many people still say it, perhaps because it remains a challenge for so many organizations:
The quality of the results from using AI applications depends almost entirely on the quality of the data available to them.
Organizations typically build and fill a data lake to address data quality issues, and in doing so “clean” and standardize data from multiple sources.
A data lake combines and normalizes data across the enterprise for consistent access by analytics, business intelligence, and AI applications, and is therefore a common solution to address the data quality challenge.
The quality issue arises in large part for historical reasons: enterprise applications are often developed independently, using different technology stacks and a variety of data models and formats. But the quality issue also arises for cloud-based SaaS applications, since they do not share a common data standard either.
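As a rough illustration of what "cleaning" and standardizing means in practice, the sketch below maps records from two independently built applications onto one common schema. The field names, sample records, and alias table are all hypothetical, and a real pipeline would need far more rules than this.

```python
from datetime import date, datetime

# Hypothetical raw records from two independently built applications.
crm_record = {"CustomerName": "Acme Corp ", "Created": "03/15/2024", "Country": "usa"}
billing_record = {"cust_nm": "ACME CORP", "created_at": "2024-03-15", "country_code": "US"}

# Assumed alias table; a real one would be much larger.
COUNTRY_ALIASES = {"usa": "US", "us": "US", "united states": "US"}


def normalize_name(raw: str) -> str:
    # Collapse whitespace and apply a single casing convention.
    return " ".join(raw.split()).title()


def normalize_country(raw: str) -> str:
    # Map known aliases to a two-letter code; otherwise just upper-case.
    cleaned = raw.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned.upper())


def from_crm(rec: dict) -> dict:
    # The CRM uses US-style dates and CamelCase field names.
    return {
        "customer_name": normalize_name(rec["CustomerName"]),
        "created": datetime.strptime(rec["Created"], "%m/%d/%Y").date(),
        "country": normalize_country(rec["Country"]),
    }


def from_billing(rec: dict) -> dict:
    # The billing system uses ISO dates and abbreviated field names.
    return {
        "customer_name": normalize_name(rec["cust_nm"]),
        "created": date.fromisoformat(rec["created_at"]),
        "country": normalize_country(rec["country_code"]),
    }
```

Once both records pass through their mappers they compare as equal, which is the property a data lake relies on to give analytics and AI applications a consistent view.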
The Challenge with Data Standards
Organization-wide data standards are difficult to define and even more difficult to enforce. Ensuring consistent data quality at the source (i.e., for each application’s data) is the best way to achieve standardization, but in practice its return on investment can be much lower than that of the perpetually delayed strategic application improvements that drive revenue and therefore take priority.
More recently, the various data formats and cloud technologies used by SaaS applications such as Salesforce, SAP, HubSpot, NetSuite, Slack, Google Workspace, Office 365, Zoho, and others have compounded the data quality challenge. These formats are of course impossible for any one organization to standardize, since they are outside of its control.
While each individual SaaS application may address data quality on its own, using multiple SaaS applications (which is very common) exacerbates the overall challenge for an enterprise.
An entire software and database ecosystem exists for creating and managing data lakes, including cloud platforms such as Snowflake, Databricks, Microsoft Fabric, Google BigQuery, and others. These typically support analytics use cases rather than AI-specific use cases, but they are increasingly being used as AI data sources as well.
Data modernization is also on the table, because there are new storage formats, including multi-mode formats designed specifically to support AI agents. Some data lakes also offer the capability to serve as a system of record, processing transactions and storing the results directly in the lake.
Software vendors such as Next Pathway offer out-of-the-box solutions for data migration and modernization for data lakes supporting analytics and AI workloads.
AI Agents Enter the Picture
As many have also said by now, the emergence of generative AI (and the LLMs that power it) is the most disruptive technology innovation since the emergence of the World Wide Web.
As with the Web, organizations face the challenge of adopting and implementing gen AI for their business applications, either to stay ahead of the competition or to keep up with it. In many industries, such as financial services and retail, adopting gen AI has already become a competitive issue.
Getting the best results from gen AI requires rethinking and often re-engineering existing applications so they are ready for AI and support broader AI strategies. For example, gen AI is non-deterministic while previous IT solutions are deterministic, so careful evaluation is required to identify the best use cases for it. At the least, organizations will need to figure out what this fundamental change implies for existing applications and business processes.
Within the past year or so, the rapid pace of advancement in gen AI technologies has brought with it the ability for AI agents to take over significant manual effort and extend business process automation into non-deterministic areas such as research, customer service, and decision support analytics. And even to some extent now into deterministic use cases such as processing payments and generating restocking orders.
It seems like the impact of AI agents on enterprise IT systems is only just beginning, though, and the failure rate of agentic projects remains high. Could AI agents impact or cause a change in direction for data quality projects as well, and thus improve AI agent adoption rates?
Real Time or Batch?
MCP Servers for Online Access to ERP System Data
AI agents are online processes, not batch processes, and typically request access to the data they need as they work. AI agents leverage MCP (Model Context Protocol) servers to obtain the latest information available to complete their tasks.
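To make the "request data as they work" idea concrete, here is a minimal sketch of the message an agent's MCP client sends when it invokes a server-side tool. MCP messages are JSON-RPC 2.0 and tool invocations use the `tools/call` method; the tool name `get_inventory_level` and its arguments are hypothetical, and the transport (stdio or HTTP) is left out entirely.

```python
import json
from itertools import count

# Simple generator of unique JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments; a real server advertises its own tools.
request = build_tool_call(
    "get_inventory_level",
    {"sku": "SKU-1001", "warehouse": "EAST"},
)

# Serialized form that would be handed to whichever transport is in use.
payload = json.dumps(request)
```

The point of the sketch is that the agent asks for one specific piece of current data at the moment it needs it, rather than reading from a copy that was curated in an earlier batch run.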
Maybe it’s better for AI agents to use data in motion and extract data from the sources as they need it, as opposed to trying to achieve the difficult goal of curating and standardizing data in advance of its use for AI applications.
What if the world is just a messy place and we just learn to live with that? And that learning to adapt to the world as a messy place in real time is just naturally part of what advanced AI agents can do?
AI agents, for example, assist sales efforts by pulling emails, generating sales call summaries, and conducting customer research for personalized outreach that achieves higher response rates.
Agents also create restocking schedules by combining volume forecasting data from an ERP, a warehouse management system (WMS), and staffing logs to recommend daily priorities and order staging calendars.
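The restocking example can be sketched in a few lines, assuming the agent has already fetched a forecast from the ERP, on-hand quantities from the WMS, and available hours from the staffing logs. Every number and name below is made up for illustration, and the ranking rule (fewest days of cover first, limited by staff capacity) is just one plausible heuristic.

```python
# All figures are made-up illustrative data, not from any real system.
erp_forecast = {"SKU-A": 120, "SKU-B": 40, "SKU-C": 75}   # forecast units per day (ERP)
wms_on_hand = {"SKU-A": 100, "SKU-B": 200, "SKU-C": 30}   # units in the warehouse (WMS)
staff_hours_available = 6.0                               # hours today (staffing logs)
HOURS_PER_RESTOCK = 2.5                                   # assumed effort per SKU


def days_of_cover(sku: str) -> float:
    # How many days current stock lasts at the forecast rate.
    return wms_on_hand[sku] / erp_forecast[sku]


def daily_priorities() -> list[str]:
    # Rank SKUs by urgency, then keep only as many as staff can handle.
    ranked = sorted(erp_forecast, key=days_of_cover)
    capacity = int(staff_hours_available // HOURS_PER_RESTOCK)
    return ranked[:capacity]


print(daily_priorities())  # prints ['SKU-C', 'SKU-A']
```

Here SKU-C has the least cover (0.4 days), SKU-A is next, and staff capacity allows two restocks, so SKU-B waits.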
Connecting AI agents to SaaS applications was one of the big topics at Boomi World recently. (If you missed it, you can catch up on the sessions, which are posted on Boomi’s website.)
Boomi, for example, is one of the integration vendors that offer MCP servers for ERP systems as well as for a wide variety of other data sources.
Summing Up the Debate
Maybe the data quality issue won’t get solved by data lakes. Maybe it’s just too hard and too expensive to get this right, and too challenging to extend coverage to the entire enterprise. On the other hand, maybe it will, with new tools, new data formats, and AI automation.
If your AI agent has real-time access to the data it needs through MCP servers, such as those Boomi offers, is it really necessary to populate a data lake with the same data?
You may have a data lake for other reasons already, but if not, a direct MCP connector to the data sources your agent needs may be a better way to go.
Of course, the downside is that an organization that takes this path is not modernizing its data and could be paying a high price for the storage consumed by existing systems, and in some (perhaps many) cases paying for redundant storage as well.
The Intellyx Take
Of course it’s not as simple as choosing the data lake approach or the MCP server approach, because many factors are at play, including the fact that many organizations also have requirements to migrate and modernize their data. And storage costs, especially cloud storage costs, are another significant factor. And in fact a combination of the two might be the right solution for different use cases.
You might want a data lake for analytics and MCP servers for online inquiries.
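One way to picture that split is a simple routing rule that sends each data request down one path or the other. The sketch below is deliberately simplistic: the two criteria and the path names are assumptions chosen for illustration, and a real decision would also weigh cost, governance, and latency.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    description: str
    needs_current_state: bool  # must reflect the source system right now
    scans_history: bool        # aggregates over large historical ranges


def choose_path(req: DataRequest) -> str:
    """Return 'mcp_server' or 'data_lake' using a deliberately simple rule."""
    # Point lookups against live state suit an online MCP call.
    if req.needs_current_state and not req.scans_history:
        return "mcp_server"
    # Everything else, including heavy historical scans, goes to the lake.
    return "data_lake"


lookup = DataRequest("order status lookup", True, False)
trend = DataRequest("quarterly revenue trend", False, True)
```

With these inputs, `choose_path(lookup)` selects the MCP server and `choose_path(trend)` selects the data lake.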
Newer cloud databases also offer better support for AI workloads that ingest multiple data modalities, such as audio, text, video, images, vectors, and unstructured data.
Data formats such as Parquet and Avro, and newer ones such as Vortex and Lance, may support some AI workloads better than real-time access to existing data sources via MCP. In some cases MCP servers are developed essentially as API wrappers, which may inhibit or restrict some of the more powerful and flexible aspects of gen AI capabilities.
This is probably a case of “one size does not fit all,” but using MCP servers to access data in real time for AI agents sounds like a pretty good solution for a lot of enterprise use cases, including new use cases that AI agents enable.
Copyright © Intellyx BV. Intellyx is the change agent industry analysis and advisory firm focused on enterprise transformation. Covering every angle of enterprise IT from mainframes to artificial intelligence, our broad focus across technologies empowers business executives, IT professionals, and software vendors to leverage disruptive trends to succeed in a dynamic business environment. Next Pathway, Boomi, and Zoho are Intellyx clients. Microsoft is a former Intellyx client. No AI was used to write this article. Image credit: Google Gemini.




