During its Google Cloud Next 25 event Wednesday, the search giant unveiled the latest version of its Tensor Processing Unit (TPU), the custom chip built to run artificial intelligence -- with a twist.
Also: Why Google Code Assist may finally be the programming power tool you need
For the first time, Google is positioning the chip for inference, the making of predictions for live requests from millions or even billions of users, as opposed to training, the development of neural networks carried out by teams of AI specialists and data scientists.
The Ironwood TPU, as the new chip is called, arrives at an economic inflection point in AI. The industry clearly expects AI moving forward to be less about science projects and more about the actual use of AI models by companies.
And the rise of DeepSeek AI has focused Wall Street more than ever on the enormous cost of building AI for Google and its competitors.
The rise of "reasoning" AI models, such as Google's Gemini, which dramatically increase the number of statements a large language model generates, has created a sudden surge in the total computing needed to make predictions. As Google put it in describing Ironwood, "reasoning and multi-step inference is shifting the incremental demand for compute -- and therefore cost -- from training to inference time (test-time scaling)."
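As a back-of-the-envelope illustration of that shift (the model size, output lengths, and traffic below are assumed for the sake of the arithmetic, not figures from Google), longer reasoning outputs multiply inference compute roughly in proportion to the extra tokens generated:

```python
# Rough sketch, not Google's numbers: why longer "reasoning" outputs shift
# compute cost toward inference. Uses the common approximation that a dense
# transformer spends roughly 2 * parameters FLOPs per generated token in the
# forward pass. All figures below are illustrative assumptions.

PARAMS = 100e9                 # assumed model size: 100B parameters
FLOPS_PER_TOKEN = 2 * PARAMS

short_answer_tokens = 200      # a direct answer
reasoning_tokens = 4000        # the same query with multi-step "reasoning"
queries_per_day = 1e9          # assumed traffic at consumer scale

def daily_inference_flops(tokens_per_query: float) -> float:
    """Total forward-pass FLOPs per day for the assumed traffic."""
    return FLOPS_PER_TOKEN * tokens_per_query * queries_per_day

baseline = daily_inference_flops(short_answer_tokens)
reasoning = daily_inference_flops(reasoning_tokens)

print(f"baseline:  {baseline:.2e} FLOPs/day")
print(f"reasoning: {reasoning:.2e} FLOPs/day ({reasoning / baseline:.0f}x more)")
```

Under those assumptions, a twenty-fold increase in generated tokens means a twenty-fold increase in daily inference compute, with no change to the model itself.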
Also: Think DeepSeek has cut AI spending? Think again
Thus, Ironwood is a statement by Google that its focus on performance and efficiency is shifting to reflect the rising cost of inference versus the research domain of training.
Google has been developing its TPU family of chips for over a decade, through six prior generations. However, training is generally considered a much lower-volume chip market than inference. That is because training demand rises only as each new, gigantic GenAI research project is inaugurated, which is generally once a year or so.
In contrast, inference is expected to meet the needs of thousands or millions of customers who want day-to-day predictions from the trained neural network. Inference is considered a high-volume market in the chip world.
Google had previously made the case that the sixth-generation Trillium TPU, introduced last year, which became generally available in December, could serve as both a training and an inference chip in one part, emphasizing its ability to speed up the serving of predictions.
In fact, as far back as the TPU version two, in 2017, Google had talked of a combined ability for training and inference.
Also: Google reveals new Kubernetes and GKE enhancements for AI innovation
The positioning of Ironwood as an inference chip, first and foremost, is a departure.
It's a shift that may also mark a change in Google's willingness to rely on Intel, Advanced Micro Devices, and Nvidia as the workhorses of its AI computing fleet. In the past, Google had described the TPU as a necessary investment to achieve cutting-edge research results, but not as an alternative to its vendors.
In Google's cloud computing operations, based on the number of "instances" run by customers, Intel, AMD, and Nvidia chips make up a combined 99% of processors used, versus less than a percent for the TPU, according to research by KeyBanc Capital Markets.
That reliance on three dominant vendors has economic implications for Google and the other giants, Microsoft and Amazon.
Also: 10 key reasons AI went mainstream overnight - and what happens next
Wall Street stock analysts, who compile measures of Google's individual lines of business, have, from time to time, calculated the economic value of the TPU. For example, in January, stock analyst Gil Luria of the boutique research firm DA Davidson wrote that "Google would have generated as much as $24 billion of revenue last year if it was selling TPUs as hardware to NVDA [Nvidia] customers," meaning in competition with Nvidia.
Conversely, at a time when the cost of AI escalates into multi-hundred-billion-dollar projects such as Stargate, Wall Street analysts believe that Google's TPU could offer the company a way to save money on the cost of AI infrastructure.
Also: DeepSeek's new open-source AI model can outperform o1 for a fraction of the cost
Although Google has paid chip-maker Broadcom to help it take each new TPU into commercial production, Google might still save money by using more TPUs rather than paying Intel, AMD, and Nvidia for the ever-larger fleets of chips it needs for inference.
To make the case for Ironwood, Google on Wednesday emphasized the chip's technical advances over Trillium.
Google said Ironwood delivers twice the "performance per watt" of Trillium, measured at 29.3 trillion floating-point math operations per second per watt.
The Ironwood part has 192GB of high-bandwidth memory, or HBM, six times as much as Trillium. Its memory bandwidth, at 7.2 terabytes per second, is 4.5 times as much.
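Taken together, those ratios imply rough figures for Trillium. The short derivation below uses only the numbers stated above; the implied Trillium values are inferred from the ratios, not quoted by Google:

```python
# Quick sanity check of the stated ratios. The Ironwood figures come from the
# announcement; the Trillium figures are derived from the stated multiples.

ironwood_tflops_per_watt = 29.3   # performance per watt, per the announcement
ironwood_hbm_gb = 192             # high-bandwidth memory per chip
ironwood_bw_tbps = 7.2            # memory bandwidth, terabytes per second

perf_per_watt_ratio = 2.0         # "twice the performance per watt" of Trillium
hbm_ratio = 6.0                   # "six times as much" memory
bw_ratio = 4.5                    # "4.5 times as much" bandwidth

print(f"implied Trillium perf/watt: {ironwood_tflops_per_watt / perf_per_watt_ratio:.1f} TFLOPs/W")
print(f"implied Trillium HBM:       {ironwood_hbm_gb / hbm_ratio:.0f} GB")
print(f"implied Trillium bandwidth: {ironwood_bw_tbps / bw_ratio:.1f} TB/s")
```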
Also: Nvidia dominates in gen AI benchmarks, clobbering 2 rival AI chips
Google said those enhancements are supposed to support much greater movement of data in and out of the chip and between systems.
"Ironwood is designed to minimize data movement and latency on chip while carrying out massive tensor manipulations," said Google.
The memory and bandwidth advances are all part of Google's emphasis on "scaling" its AI infrastructure.
Scaling, in this sense, means being able to fully use each chip when hundreds or thousands of chips are grouped to work on a problem in parallel. More chips dedicated to the same problem should lead to a proportional speed-up in performance.
Again, scaling has an economic component. By effectively grouping chips, the TPUs can achieve greater "utilization," the amount of a given resource that is actually used versus the amount being left idle. Successful scaling means higher utilization of chips, which is good because it means less waste of an expensive resource.
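A minimal sketch of that utilization arithmetic, with made-up chip counts and idle fractions purely for illustration:

```python
# Illustrative only: how idle time eats into the speed-up from adding chips.

def utilization(busy_chip_seconds: float, total_chip_seconds: float) -> float:
    """Fraction of provisioned chip time actually doing useful work."""
    return busy_chip_seconds / total_chip_seconds

def effective_speedup(chips: int, idle_fraction: float) -> float:
    """Speed-up from adding chips when each sits idle part of the time,
    waiting on data movement, stragglers, and so on."""
    return chips * (1.0 - idle_fraction)

# Perfect scaling: 1,000 chips, no idle time -> 1,000x speed-up.
print(effective_speedup(1_000, 0.0))

# Poor scaling: the same 1,000 chips, each waiting 40% of the time for data
# from its neighbors -> only a 600x speed-up, and 40% of an expensive
# resource is wasted.
print(effective_speedup(1_000, 0.4))
print(utilization(busy_chip_seconds=600, total_chip_seconds=1_000))
```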
Also: 5 reasons why Google's Trillium could transform AI and cloud computing - and 2 obstacles
That's why, in the past, Google has emphasized Trillium's ability to "scale to hundreds of thousands of chips" in a collection of machines.
While Google didn't give explicit details on Ironwood's scaling performance on inference tasks, it once again emphasized on Wednesday the ability of "hundreds of thousands of Ironwood chips to be composed together to rapidly advance the frontiers of GenAI computation."
Also: Intel's new CEO vows to run chipmaker 'as a startup, on day one'
Google paired the chip with a significant software announcement as well: Pathways on Cloud. The Pathways software distributes parts of the AI computing work across different computers. It had been used internally by Google and is now being made available to the public.
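Google did not publish code in the announcement, so the following is only a generic sketch of the idea behind such a distribution layer, not Pathways' actual API: split a batch of requests into shards and hand each shard to a different worker, with local processes standing in for separate machines:

```python
# Generic illustration of distributing inference work across workers.
# This is NOT Pathways code; it only mimics the splitting-and-dispatching idea
# using local processes in place of separate machines.

from concurrent.futures import ProcessPoolExecutor

def run_shard(shard: list[str]) -> list[str]:
    """Stand-in for running a model's forward pass on one worker."""
    return [f"prediction for {request!r}" for request in shard]

def split(requests: list[str], num_workers: int) -> list[list[str]]:
    """Divide requests round-robin across workers."""
    return [requests[i::num_workers] for i in range(num_workers)]

if __name__ == "__main__":
    requests = [f"query-{i}" for i in range(8)]
    shards = split(requests, num_workers=4)
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_shard, shards))
    for shard_result in results:
        print(shard_result)
```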