Using AI for legacy data modernization? Don’t fall into this trap

Blog post for Next Pathway by Eric Newcomer

If the future of applications is in genAI, then the future of data for AI is in the cloud. 

The effectiveness of genAI applications depends almost entirely on the quality of the data those applications work with. The goals of any legacy data modernization project now include laying the foundation for AI applications.

For this reason, legacy data conversion and migration tools are exploding on the market and gaining adoption rapidly as organizations move their data into such cloud repositories as Snowflake, Databricks, BigQuery, and Fabric for training AI models, processing inference requests, and feeding AI agents. 

And for many organizations, providing quality, curated data for genAI requires successfully extracting, converting, and migrating data for modern applications from legacy sources such as relational databases, data warehouses, files, and big data systems. 

The good news is that most of these systems support SQL. But the bad news is that SQL isn’t enough to ensure successful data modernization. 

Given the plethora of AI-powered SQL generation tools, however, it’s easy enough to fall into the trap of thinking that it is. 

Understanding the Trap 

If you want good results from genAI, you have to put some effort into it. You have to review the results, provide good input data, and iterate and tweak prompts until the results are correct. 

Generating SQL from human language prompts is a tempting and simple solution to legacy data migration, but it won’t achieve the high-quality results you need to get your data ready for modern genAI applications. At least not without a lot of manual effort to make sure the AI gets it right. 
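As a hypothetical illustration of where syntax-only translation goes wrong (the function names and dialect strings below are invented for this sketch): Oracle treats the empty string as NULL, so a predicate like `name = ''` can never match a row there, while the same predicate ported verbatim to a platform such as PostgreSQL or Snowflake would match empty strings, silently changing the query’s meaning.

```python
# Hypothetical sketch: a one-line translation can be syntactically valid
# yet semantically wrong without context about the source platform.
LEGACY_PREDICATE = "name = ''"  # Oracle source: this predicate is never true

def naive_translate(predicate: str) -> str:
    """Syntax-only port: copies the predicate unchanged (looks fine, isn't)."""
    return predicate

def context_aware_translate(predicate: str, source_dialect: str) -> str:
    """Uses platform context to preserve the original semantics."""
    if source_dialect == "oracle" and predicate == "name = ''":
        # In Oracle, '' is NULL, so the faithful translation of this
        # predicate is a condition that is always false.
        return "FALSE"
    return predicate

print(naive_translate(LEGACY_PREDICATE))                    # name = ''
print(context_aware_translate(LEGACY_PREDICATE, "oracle"))  # FALSE
```

The naive version produces valid SQL on the target; only the context-aware version produces SQL that behaves the same way the legacy system did.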

In human conversation, this simplistic trap is avoided intuitively. Whenever we talk with another person, we keep an ear open for any statement that may not be true or might need to be validated and corrected. 

As humans, we spend a considerable amount of time in our conversations just trying to be sure we understand each other correctly. And to do that, we use conversation context.  

Metadata is Essential, but Context is King

Establishing the right context for an LLM conversation up front makes it easier and faster to achieve the results you want – just as it does for the human conversations genAI is engineered to emulate. 

In a genAI data modernization project, providing the right context is all about providing the business rules, requirements, and policies that apply to corporate data – in other words, the context that gives corporate data its shape and meaning and through which it’s understood and valued by the business.  

Understanding how legacy data is created, processed, and interpreted is even more important to a successful data modernization project than the schema for transforming the data types and storage formats, because data is intended to be collectively interpreted as business information. 

How to Establish and Use Context

The right context needs to cover the entire data lifecycle: understanding legacy environments, analyzing complexities and dependencies, translating stored procedures and ETL pipelines, and testing and validating results – in addition to basic SQL translation. 
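The lifecycle stages above can be sketched as a pipeline in which each stage consumes the context accumulated by the previous ones, making the point that SQL translation is one step rather than the whole job (a hypothetical outline; stage names and data shapes are illustrative):

```python
# Hypothetical sketch of a context-driven modernization pipeline.
def analyze(ctx: dict) -> dict:
    """Discover dependencies, stored procedures, and ETL pipelines."""
    return {**ctx, "dependencies": ["orders -> customers"]}

def translate(ctx: dict) -> dict:
    """Translate schemas, procedures, and pipelines using accumulated context."""
    return {**ctx, "migration_scripts": ["CREATE TABLE customers (...)"]}

def validate(ctx: dict) -> dict:
    """Test and validate results against the source system."""
    return {**ctx, "validated": True}

pipeline = [analyze, translate, validate]
context = {"source": "legacy_warehouse"}
for stage in pipeline:
    context = stage(context)

print(context["validated"])  # True
```

The design point is that later stages cannot do their jobs without what earlier stages learned – validation, for example, needs both the dependencies and the generated scripts.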

Context-aware AI platforms for data modernization require a comprehensive understanding of both the source and target environments, rather than just looking at isolated code snippets in independent prompts.

Setting the right context means understanding not only the structural elements of legacy data – schemas, data types, and data relationships – but also the impact of business requirements such as common query patterns and data sharing across application functions, and of operational constraints such as uptime, reliability, and recovery requirements. 

Effective context includes structural, operational, and behavioral information, which can often be delivered using specification-driven approaches. 

Key context elements that help AI agents modernize data include: 

  • Metadata – including schemas, data types, data lineage, entity-relationship diagrams, and relationships such as primary and foreign keys
  • Business Rules – including what’s contained in application functions and stored procedures, calculation logic, analytic rules and definitions, constraints, triggers, integration points, and performance, reliability, and integrity goals
  • Migration Specifications – including clear definitions of migration goals, objectives, and constraints, data volume parameters, downtime parameters, rollback rules, and validation rules
  • Application Context – including how the database interacts with applications and message queues across functions, transaction boundaries, workload analysis, query patterns, and common join paths and query optimizations.
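One way to picture these four element groups is as a single specification object handed to the AI agent alongside each prompt. The sketch below is a hypothetical shape for such a specification – the field names and sample values are illustrative, not a real product schema:

```python
# Hypothetical migration context specification bundling the four element
# groups above; names and values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class MigrationContext:
    metadata: dict = field(default_factory=dict)        # schemas, types, lineage, keys
    business_rules: list = field(default_factory=list)  # calculation logic, constraints
    migration_spec: dict = field(default_factory=dict)  # goals, downtime, rollback rules
    app_context: dict = field(default_factory=dict)     # query patterns, transactions

ctx = MigrationContext(
    metadata={"orders": {"order_id": "DECIMAL(18,0)", "pk": ["order_id"]}},
    business_rules=["revenue = SUM(line_total), excluding cancelled orders"],
    migration_spec={"max_downtime_minutes": 30, "rollback": "snapshot"},
    app_context={"hot_queries": ["orders JOIN customers ON customer_id"]},
)

print(ctx.migration_spec["max_downtime_minutes"])  # 30
```

Packaging the context this way means every translation request carries the same business rules and constraints, instead of each prompt starting from an isolated code snippet.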

How Next Pathway Helps 

Next Pathway provides tools for understanding legacy data environments – including code as well as metadata and policies – that gather and prepare the context required for successful AI-based data modernization initiatives.

Next Pathway includes business rules accumulated from more than 160 migrations involving robust, complex stored procedures and sophisticated ETL pipelines. 

Next Pathway starts a cloud modernization project with an enterprise-wide scan to discover and understand the business and technical environment surrounding legacy data sources, converting what it finds into actionable context so an LLM can accurately create migration scripts that automatically move the data to cloud databases for use in modern AI applications. 

Next Pathway’s proprietary AI agents are then grounded in the precise business rules and data modernization context of each client migration. If you use AI without such grounding, you risk incorrect results and hallucinations that may require extensive manual rework – or worse, starting your migration with misleading and false information. 

General-purpose AI tools fall short because they are not specialists in data modernization and reach for the easy answer – SQL script generation – which can be a time-wasting trap. 

The Intellyx Take 

In human language, as in business and AI, establishing the right context is key to understanding and success. 

Simplistic AI data migration tools rely on prompts to generate SQL scripts that typically don’t incorporate the overall context essential to modernization project success, such as business requirements, application relationships, and operational constraints. 

Tools such as those offered by Next Pathway, which gather and prepare the context essential for technical and business data modernization initiatives, will produce better results more quickly and automate much more of the modernization process than simplistic SQL syntax generators. 

After all, the future of IT is AI, and an organization’s data has to be ready for that future as quickly and accurately as possible. 

Copyright © Intellyx BV. Next Pathway is an Intellyx customer. Intellyx retains final editorial control of this article. No AI was used to write this article. Image source: Google Gemini. 
