AI Code Gen: Time to Shift Left Again?

A major challenge with creating application code using generative AI is the quality of the code it produces. 

Generative AI hallucinates and creates code that returns inaccurate results. For computer software, this means bugs, inefficient code, and functionality that doesn’t meet requirements.

Generative AI coding is extremely productive compared to manual coding. Creating applications using human language is simpler and easier than using a computer language. 

But unlike computer languages, human language is ambiguous and open to interpretation. Generative AI uses statistical matching to resolve those ambiguities, which makes it non-deterministic: the results are not guaranteed to match the intent exactly, and often don't.

The rate of hallucinations and incorrect results (i.e. bugs and functionality that doesn’t match requirements) increases as the amount of code increases. Unconstrained gen AI code generators can easily create large volumes of code that include large numbers of issues.

If someone uses a simple prompt such as "create a website for my model train hobby business" or "create a mobile app for my restaurant," the very polite and friendly gen AI models will happily oblige and spit out thousands of tokens (at a per-token price, by the way). But the quantity of code says nothing about its quality.

It’s hard to test and debug manually produced code. It’s even harder to test and debug thousands and thousands of lines of automatically generated code. 

The Opposite of “Shift Left”

AI code gen is effectively taking us in the opposite direction of the "shift left" trend, which aims to eliminate problems and issues in the code as early in the development lifecycle as possible – well before it goes into production. Before it goes into testing, even.

It’s axiomatic that the cost of fixing a bug or remediating an incorrect implementation of a function increases steadily the further along in the lifecycle it’s discovered. In the worst case, a problem is discovered in production and the entire development lifecycle has to be restarted from the beginning at a high cost.

With the emergence of "vibe coding," it seems we have to learn how to shift left all over again. Without a way to avoid the hallucinations and inaccuracies of auto-generated code, fixing such problems after the code is created will be very costly and time consuming.

The challenge is figuring out how to “shift left” again while preserving the productivity benefits – or actually even increasing the productivity benefits – of AI generated code overall. 

But how can we make sure the code has no bugs or functionality issues when by definition generative AI is non-deterministic and subject to inaccuracies and incorrect results?

As generative AI creates more and more code, it produces more and more problems. How can the industry “shift left” again to avoid vibe coding issues earlier in the development process?

Putting a “human in the loop” to review code is one way. But this can take a long time with a large volume of complex code. 

Breaking the problem up using agents that focus on specific tasks is another. A third is creating an "agent swarm," a multi-agent process in which one agent checks the work of another.

Multi-agent coding systems not only break the problem up into smaller pieces that are easier to debug, they also allow different agents to take on different roles in the development lifecycle.

Some agents can work on the planning part of the process, others on the coding of specific modules, and still others on the testing and validation of the code.

Generative AI is fundamentally a “trial and error” process. Incorrect results have to be caught and the task retried until it produces correct results.  

Better to split the problem up into modules that can be evaluated and corrected more easily, and put in place evaluation agents that can use predefined success criteria to judge whether model generated code is correct. If it’s not, try, try again.  
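As a minimal sketch of that evaluate-and-retry loop (the `generate` and `evaluate` callables here are hypothetical stand-ins for a model call and a predefined success check, not any particular vendor's API):

```python
from typing import Callable

def generate_with_retry(
    generate: Callable[[str], str],   # stand-in for a code-generation model call
    evaluate: Callable[[str], bool],  # predefined success criteria (e.g., unit tests)
    prompt: str,
    max_attempts: int = 3,
) -> str:
    """Regenerate until the evaluation agent accepts the output."""
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if evaluate(candidate):
            return candidate
    raise RuntimeError(f"no acceptable result after {max_attempts} attempts")

# Demo: a fake generator that gets it wrong once, then right on the retry.
attempts = iter([
    "def add(a, b): return a - b",   # buggy first draft
    "def add(a, b): return a + b",   # corrected retry
])
def fake_generate(prompt: str) -> str:
    return next(attempts)

def passes_tests(src: str) -> bool:
    ns: dict = {}
    exec(src, ns)                # load the candidate module
    return ns["add"](2, 3) == 5  # the predefined success criterion

code = generate_with_retry(fake_generate, passes_tests, "write add(a, b)")
```

The point is that the evaluator, not the generator, decides when the work is done – the loop keeps retrying until the output meets the criteria or the attempt budget runs out.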

Domains Are Your Friend, Again

Just as with microservices-based designs, breaking up the code generation task into business domains and subdomains is a great way to identify separate modules for AI to work on. 

Designing microservices breaks an application up into independently deployable modules separated by "bounded contexts" – boundaries typically enforced through managed APIs that the microservices use to interact with each other.

In the classic banking example, a business domain would be core banking services, which encompass deposit, withdrawal, transfer, payments, loans, and savings management. If you are generating a new core banking application, you would split up the task into modules that implement each of these functions, and are tied together using APIs. A transfer operation would use the withdrawal and deposit APIs within a transaction that either succeeds or fails to ensure the integrity, consistency, and accuracy of the operation. 
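A minimal sketch of that transfer logic (the account model and the compensation step are illustrative only, not a real core banking API):

```python
class InsufficientFunds(Exception):
    pass

class Account:
    def __init__(self, balance: int):
        self.balance = balance

def withdraw(acct: Account, amount: int) -> None:
    if acct.balance < amount:
        raise InsufficientFunds(amount)
    acct.balance -= amount

def deposit(acct: Account, amount: int) -> None:
    acct.balance += amount

def transfer(src: Account, dst: Account, amount: int) -> None:
    """Either both the withdrawal and the deposit succeed, or neither does."""
    withdraw(src, amount)        # step 1: debit the source account
    try:
        deposit(dst, amount)     # step 2: credit the destination
    except Exception:
        deposit(src, amount)     # compensate: undo the withdrawal
        raise

checking, savings = Account(100), Account(0)
transfer(checking, savings, 40)
print(checking.balance, savings.balance)  # 60 40
```

Because each module sits behind a small, well-defined interface like this, a generated implementation of `transfer` can be validated against the withdrawal and deposit APIs without touching the rest of the application.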

Each of the modules might be assigned to a separate development team, which in the classic microservices pattern includes business representation, multiple skills for database and system software use, and automated deployment via CI/CD pipeline. 

Splitting the problem up helps deploy software more quickly and allows individual modules to be patched and updated independently. 

Governed APIs such as those following the microservices pattern are also often used to link mobile and web apps, as well as third-party and partner integrations, to such core business functions.

If you are using code-generating agents, following domain-driven design principles and microservices development and deployment patterns allows you to break the problem up similarly and achieve more accurate results from vibe coding. Breaking the problem up allows the AI to more easily identify and correct errors and hallucinations.

Agent Swarms Are Your Friends

Break the problem up into domains and microservices as before – but instead of assigning each microservice to a specific dev team, assign it to an AI agent.

AI agents can plan a project and identify tasks for other agents to execute. They can orchestrate the activities of multiple agents, which is sometimes called an “agent swarm.”

An agent swarm is a multi-agent system in which multiple agents collaborate to complete a shared objective, such as developing a new core banking app. 

Each agent operates independently and is assigned a specific function within the project, contributing to the overall task through distributed execution.
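A toy sketch of that pattern (every agent here is a stub standing in for a model-backed agent; the module list and function names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Task:
    module: str
    spec: str
    code: str = ""
    approved: bool = False

def planner_agent(objective: str) -> list:
    # Stand-in for a planning agent that decomposes the objective into modules.
    modules = ["deposits", "withdrawals", "transfers"]
    return [Task(m, f"{objective}: implement {m}") for m in modules]

def coder_agent(task: Task) -> None:
    # Stand-in for a coding agent that works on one module at a time.
    task.code = f"# generated code for {task.module}"

def reviewer_agent(task: Task) -> None:
    # Stand-in for an evaluation agent applying predefined success criteria.
    task.approved = bool(task.code)

def run_swarm(objective: str) -> list:
    tasks = planner_agent(objective)   # one agent plans...
    for task in tasks:
        coder_agent(task)              # ...others code each module...
        reviewer_agent(task)           # ...and another checks the work
    return tasks

results = run_swarm("core banking app")
```

Real swarms run the worker agents concurrently and feed reviewer rejections back to the coders for another attempt, but the division of roles is the same.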

Another way to describe the agent swarm idea is to take a step back and consider that coding agents are really bad at application architecture. If you serve as the architect or project manager and split the coding task into separate modules (as in the microservices pattern, for example), you will have much better luck completing the overall task than if you simply asked the coding agent to code the entire application all at once.

Agent swarms basically automate this process. If you can define the task in sufficient detail and provide sufficient context, an agent swarm can take over the responsibility from you for executing the individual components of the application and assembling them.

Along the way, the agents will also test and validate the generated code, reducing the potential for bugs, hallucinations, and incorrect results in the finished product. As long as they are set up and managed correctly, that is. A lot of discussion is underway on exactly how to achieve this. 

“Agents are fast and capable, but they need structure, guardrails, and intent in order to behave like a coherent engineering team rather than a cluster of eager new-hires. Purpose — not intelligence — is what keeps them aligned,” wrote Adrian Cockcroft in a recent blog post.

The Intellyx Take

Vibe coding entire applications at once is tempting and amazing, but prone to a large volume of errors. And the more code, the more errors.

Breaking the problem up into smaller pieces reduces the volume of errors, reduces the unanticipated impact of errors on other parts of the application, and improves the chances of catching and fixing the errors. 

This is basically the new “shift left.” Break the task into smaller, more manageable pieces and iteratively test and improve each piece independently until it passes overall acceptance criteria. 

Because gen AI is non-deterministic, fixing the errors it produces requires a “trial and error” approach. Unlike deterministic systems, where the output can be accurately predicted from the input, the errors, hallucinations, and inaccurate results of gen AI systems cannot be predicted.

In other words, a non-deterministic system is going to produce errors. You can’t fix that without eliminating the benefits of such a system, such as rapid development and comprehensive knowledge. 

You have to accept it and “shift left” by defining the criteria against which to judge success, and holding the AI code generators to it by requiring them to try, try again when they generate errors.

Copyright ©2025 Intellyx BV. Intellyx is an industry analysis and advisory firm focused on enterprise digital transformation. Covering every angle of enterprise IT from mainframes to artificial intelligence, our broad focus across technologies allows business executives and IT professionals to connect the dots among disruptive trends. None of the organizations mentioned in this article is an Intellyx customer. No AI was used to write this article. Image credit: Google Gemini
