An Intellyx Brain Candy Brief
As AI adoption increases, so do token charges. The message from each new model is essentially “AI does more, but it costs more.”
Token costs therefore seem likely to keep rising, and may even constrain AI adoption and use. The good news is that the generative AI industry is starting to focus on reducing token costs and processing AI workloads more efficiently.
Gimlet Labs, a startup spun out of Stanford, markets an abstraction layer that reduces token cost and improves performance for AI inference workloads.
Their Gimlet Cloud is a multi-chip neocloud (a cloud designed for high-performance AI workloads) with an orchestration layer that evaluates an inference workload, decomposes it into subsets, and routes each subset to the type of silicon best suited for processing it.
Gimlet Cloud takes advantage of the fact that different chips are better at different things. For example, some are more optimized for memory processing, while others are more optimized for I/O processing. Matching workload subsets to the right silicon results in better overall token efficiency and performance.
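To make the routing idea concrete, here is a minimal sketch in Python. It is not Gimlet's API or algorithm. The chip classes, the stage names, and the pairing of each chip class with a kind of work are all assumptions chosen for illustration. It shows only the general pattern of labeling each subset of a workload by its bottleneck and sending it to the silicon assumed to handle that bottleneck best.

```python
from dataclasses import dataclass

# Hypothetical chip classes and the kind of work each is ASSUMED to favor.
# These pairings are illustrative only, not Gimlet's actual routing table.
CHIP_STRENGTHS = {
    "gpu": "compute",
    "sram_accelerator": "memory",
    "cpu": "io",
}


@dataclass(frozen=True)
class Stage:
    """One subset of an inference workload (hypothetical structure)."""
    name: str
    bottleneck: str  # one of "compute", "memory", "io"


def route(stages):
    """Map each stage to the chip class whose assumed strength matches its bottleneck."""
    by_strength = {strength: chip for chip, strength in CHIP_STRENGTHS.items()}
    return {stage.name: by_strength[stage.bottleneck] for stage in stages}


# A made-up three-stage workload; the bottleneck labels are assumptions.
workload = [
    Stage("tokenize", "io"),
    Stage("prefill", "compute"),
    Stage("decode", "memory"),
]
plan = route(workload)
print(plan)
```

A real orchestration layer would presumably weigh measured cost, latency, and chip availability rather than a fixed lookup table. The table simply keeps the matching step easy to see.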
Supported chips include traditional GPUs, CPUs, and SRAM-based architectures. Gimlet Labs currently supports silicon from NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix, and plans to add others going forward.
The silicon-abstracting solution deploys on Kubernetes, which serves as the substrate for managing the inference workloads.
Gimlet Labs recently joined MLCommons, sponsor of the MLPerf benchmark, a standard measure of AI performance. They already contribute to the MLPerf Inference Benchmark, which helps improve inference workload efficiency across the software and hardware stacks. Going forward, they will sponsor and contribute to new benchmarks to help users evaluate inference processing alternatives.
Gimlet Cloud is available as a hosted service and can also be deployed in a data center.
Copyright © Intellyx BV. Intellyx is the change agent industry analysis and advisory firm focused on enterprise transformation. Covering every angle of enterprise IT from mainframes to artificial intelligence, our broad focus across technologies empowers business executives, IT professionals, and software vendors to leverage disruptive trends to succeed in a dynamic business environment. No AI was used to produce this article. To be considered for a Brain Candy article, email us at pr@intellyx.com.


