Article for SD Times by Eric Newcomer
The answer, of course, is that it depends. But the Cloud Native Computing Foundation (CNCF) announced some major steps forward toward this goal at the KubeCon Europe conference last week in Amsterdam.
This was the biggest KubeCon ever with about 13,500 attendees, representing an 8% growth over last year, reflecting CNCF’s extraordinary success in establishing Kubernetes as the standard for container orchestration.
During the conference keynote, CNCF announced that NVIDIA has joined CNCF as a Platinum Member, contributed software to key AI-related open source projects, and committed $4 million in funding for AI workload testing and certification.
CNCF categorizes AI workloads by whether they involve training large language models (LLMs), using LLMs (i.e., inference processing), or running AI agents.
CNCF expects the majority of inference workloads to benefit from Kubernetes hosting, much as cloud native container workloads did 10 years ago.
Standardization of agentic workloads is now being undertaken by the Agentic AI Foundation, another Linux Foundation sub-foundation.
Kubernetes for Inference Processing
“OpenAI and ChatGPT is probably one of the fastest growing services of all time,” said Chris Aniszczyk, CNCF CTO. “And they were able to scale that using Kubernetes for a lot of the inference-based workloads.”
In fact, OpenAI has published two case studies on the CNCF website publicly discussing its use of Kubernetes for key workloads.
“A lot of classic LLM training is done on customized bare metal, Slurm, and PyTorch,” Aniszczyk continued. “This is the classic HPC ecosystem. But a lot of people are using Kubernetes more and more for inference, which I think it’s extremely well suited for.”
To support inference processing standardization, Red Hat is contributing llm-d to CNCF; the inference engine was developed by Neural Magic, a company Red Hat acquired last year.
“AI model training was developed largely by data scientists building their own specialized infrastructure,” said Brian Stevens, Red Hat SVP and CTO of AI. “The in-production scaling and operation of inference, however, is now becoming a CIO problem, and the language CIOs speak is Kubernetes.”
“Standard Kubernetes orchestration wasn’t designed for the highly stateful and dynamic demands of LLM inference,” Stevens continued. “The llm-d project provides the architectural layer needed to treat LLMs like any other scalable microservice.”
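To illustrate the "LLMs as microservices" idea, here is a minimal sketch of running an inference server as an ordinary Kubernetes Deployment. This is not taken from the llm-d project itself; the model name and resource settings are illustrative assumptions, and a production setup would add probes, autoscaling, and model caching:

```yaml
# Hypothetical manifest: model, replica count, and resources are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 3                          # scale inference like any stateless service
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest          # vLLM's OpenAI-compatible server
        args: ["--model", "facebook/opt-125m"]  # small demo model (assumption)
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1   # one GPU per replica, via the NVIDIA device plugin
```

Under this framing, scaling inference is a matter of adjusting `replicas` or attaching an autoscaler; the stateful parts Stevens refers to (KV caches, prefix reuse, GPU-aware routing) are what llm-d layers on top of plain Deployments.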