During its Google Cloud Next 25 event Wednesday, the search giant unveiled the latest version of its Tensor Processing Unit (TPU), the custom chip built to run artificial intelligence -- with a twist.
Also: Why Google Code Assist may finally be the programming power tool you need
For the first time, Google is positioning the chip for inference, the making of predictions for live requests from millions or even billions of users, as opposed to training, the development of neural networks carried out by teams of AI specialists and data scientists.
The Ironwood TPU, as the new chip is called, arrives at an economic inflection point in AI. The industry clearly expects AI moving forward to be less about science projects and more about the actual use of AI models by companies.
And the rise of DeepSeek AI has focused Wall Street more than ever on the enormous cost of building AI for Google and its competitors.
The rise of "reasoning" AI models, such as Google's Gemini, which dramatically increase the number of statements a large language model generates, has created a sudden surge in the total computing needed to make predictions. As Google put it in describing Ironwood, "reasoning and multi-step inference is shifting the incremental demand for compute -- and therefore cost -- from training to inference time (test-time scaling)."
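As a back-of-the-envelope illustration of that shift (the model size, output lengths, and traffic below are assumed for the sake of the arithmetic, not figures from Google), longer reasoning outputs multiply inference compute roughly in proportion to the extra tokens generated:

```python
# Rough sketch, not Google's numbers: why longer "reasoning" outputs shift
# compute cost toward inference. Uses the common approximation that a dense
# transformer spends roughly 2 * parameters FLOPs per generated token in the
# forward pass. All figures below are illustrative assumptions.

PARAMS = 100e9                 # assumed model size: 100B parameters
FLOPS_PER_TOKEN = 2 * PARAMS

short_answer_tokens = 200      # a direct answer
reasoning_tokens = 4000        # the same query with multi-step "reasoning"
queries_per_day = 1e9          # assumed traffic at consumer scale

def daily_inference_flops(tokens_per_query: float) -> float:
    """Total forward-pass FLOPs per day for the assumed traffic."""
    return FLOPS_PER_TOKEN * tokens_per_query * queries_per_day

baseline = daily_inference_flops(short_answer_tokens)
reasoning = daily_inference_flops(reasoning_tokens)

print(f"baseline:  {baseline:.2e} FLOPs/day")
print(f"reasoning: {reasoning:.2e} FLOPs/day ({reasoning / baseline:.0f}x more)")
```

Under those assumptions, a twenty-fold increase in generated tokens means a twenty-fold increase in daily inference compute, with no change to the model itself.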
Also: Think DeepSeek has cut AI spending? Think again
Thus, Ironwood is a statement by Google that its focus on performance and efficiency is shifting to reflect the rising cost of inference versus the research domain of training.
Google has been developing its TPU family of chips for over a decade, through six prior generations. However, training is generally considered a much lower-volume chip market than inference. That is because training demand rises only as each new, gigantic GenAI research project is inaugurated, which is generally once a year or so.
In contrast, inference is expected to meet the needs of thousands or millions of customers who want day-to-day predictions from the trained neural network. Inference is considered a high-volume market in the chip world.
Google had previously made the case that the sixth-generation Trillium TPU, introduced last year, which became generally available in December, could serve as both a training and an inference chip in one part, emphasizing its ability to speed up the serving of predictions.
In fact, as far back as the TPU version two, in 2017, Google had talked of a combined ability for training and inference.
Also: Google reveals new Kubernetes and GKE enhancements for AI innovation
The positioning of Ironwood as an inference chip, first and foremost, is a departure.
It's a shift that may also mark a change in Google's willingness to rely on Intel, Advanced Micro Devices, and Nvidia as the workhorses of its AI computing fleet. In the past, Google had described the TPU as a necessary investment to achieve cutting-edge research results, but not as an alternative to its vendors.
In Google's cloud computing operations, based on the number of "instances" run by customers, Intel, AMD, and Nvidia chips make up a combined 99% of processors used, versus less than a percent for the TPU, according to research by KeyBanc Capital Markets.
That reliance on three dominant vendors has economic implications for Google and the other giants, Microsoft and Amazon.
Also: 10 key reasons AI went mainstream overnight - and what happens next
Wall Street stock analysts, who compile measures of Google's individual lines of business, have, from time to time, calculated the economic value of the TPU. For example, in January, stock analyst Gil Luria of the boutique research firm DA Davidson wrote that "Google would have generated as much as $24 billion of revenue last year if it was selling TPUs as hardware to NVDA [Nvidia] customers," meaning in competition with Nvidia.
Conversely, at a time when the cost of AI escalates into multi-hundred-billion-dollar projects such as Stargate, Wall Street analysts believe that Google's TPU could offer the company a way to save money on the cost of AI infrastructure.
Also: DeepSeek's new open-source AI model can outperform o1 for a fraction of the cost
Although Google has paid chip-maker Broadcom to help it take each new TPU into commercial production, Google might still save money by using more TPUs rather than paying Intel, AMD, and Nvidia for the ever-larger fleets of chips it needs for inference.
To make the case for Ironwood, Google on Wednesday emphasized the chip's technical advances over Trillium.
Google said Ironwood delivers twice the "performance per watt" of Trillium, measured at 29.3 trillion floating-point math operations per second per watt.
The Ironwood part has 192GB of high-bandwidth memory, or HBM, six times as much as Trillium. Its memory bandwidth, at 7.2 terabytes per second, is 4.5 times as much.
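Taken together, those ratios imply rough figures for Trillium. The short derivation below uses only the numbers stated above; the implied Trillium values are inferred from the ratios, not quoted by Google:

```python
# Quick sanity check of the stated ratios. The Ironwood figures come from the
# announcement; the Trillium figures are derived from the stated multiples.

ironwood_tflops_per_watt = 29.3   # performance per watt, per the announcement
ironwood_hbm_gb = 192             # high-bandwidth memory per chip
ironwood_bw_tbps = 7.2            # memory bandwidth, terabytes per second

perf_per_watt_ratio = 2.0         # "twice the performance per watt" of Trillium
hbm_ratio = 6.0                   # "six times as much" memory
bw_ratio = 4.5                    # "4.5 times as much" bandwidth

print(f"implied Trillium perf/watt: {ironwood_tflops_per_watt / perf_per_watt_ratio:.1f} TFLOPs/W")
print(f"implied Trillium HBM:       {ironwood_hbm_gb / hbm_ratio:.0f} GB")
print(f"implied Trillium bandwidth: {ironwood_bw_tbps / bw_ratio:.1f} TB/s")
```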
Also: Nvidia dominates in gen AI benchmarks, clobbering 2 rival AI chips
Google said those enhancements are supposed to support much greater movement of data in and out of the chip and between systems.
"Ironwood is designed to minimize data movement and latency on chip while carrying out massive tensor manipulations," said Google.
The memory and bandwidth advances are all part of Google's emphasis on "scaling" its AI infrastructure.
Scaling, in this sense, means being able to fully use each chip when hundreds or thousands of chips are grouped to work on a problem in parallel. More chips dedicated to the same problem should lead to a proportional speed-up in performance.
Again, scaling has an economic component. By effectively grouping chips, the TPUs can achieve greater "utilization," the amount of a given resource that is actually used versus the amount being left idle. Successful scaling means higher utilization of chips, which is good because it means less waste of an expensive resource.
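A minimal sketch of that utilization arithmetic, with made-up chip counts and idle fractions purely for illustration:

```python
# Illustrative only: how idle time eats into the speed-up from adding chips.

def utilization(busy_chip_seconds: float, total_chip_seconds: float) -> float:
    """Fraction of provisioned chip time actually doing useful work."""
    return busy_chip_seconds / total_chip_seconds

def effective_speedup(chips: int, idle_fraction: float) -> float:
    """Speed-up from adding chips when each sits idle part of the time,
    waiting on data movement, stragglers, and so on."""
    return chips * (1.0 - idle_fraction)

# Perfect scaling: 1,000 chips, no idle time -> 1,000x speed-up.
print(effective_speedup(1_000, 0.0))

# Poor scaling: the same 1,000 chips, each waiting 40% of the time for data
# from its neighbors -> only a 600x speed-up, and 40% of an expensive
# resource is wasted.
print(effective_speedup(1_000, 0.4))
print(utilization(busy_chip_seconds=600, total_chip_seconds=1_000))
```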
Also: 5 reasons why Google's Trillium could transform AI and cloud computing - and 2 obstacles
That's why, in the past, Google has emphasized Trillium's ability to "scale to hundreds of thousands of chips" in a collection of machines.
While Google didn't give explicit details on Ironwood's scaling performance on inference tasks, it once again emphasized on Wednesday the ability of "hundreds of thousands of Ironwood chips to be composed together to rapidly advance the frontiers of GenAI computation."
Also: Intel's new CEO vows to run chipmaker 'as a startup, on day one'
Google paired the chip with a significant software announcement as well: Pathways on Cloud. The Pathways software distributes parts of the AI computing work across different computers. It had been used internally by Google and is now being made available to the public.
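Google did not publish code in the announcement, so the following is only a generic sketch of the idea behind such a distribution layer, not Pathways' actual API: split a batch of requests into shards and hand each shard to a different worker, with local processes standing in for separate machines:

```python
# Generic illustration of distributing inference work across workers.
# This is NOT Pathways code; it only mimics the splitting-and-dispatching idea
# using local processes in place of separate machines.

from concurrent.futures import ProcessPoolExecutor

def run_shard(shard: list[str]) -> list[str]:
    """Stand-in for running a model's forward pass on one worker."""
    return [f"prediction for {request!r}" for request in shard]

def split(requests: list[str], num_workers: int) -> list[list[str]]:
    """Divide requests round-robin across workers."""
    return [requests[i::num_workers] for i in range(num_workers)]

if __name__ == "__main__":
    requests = [f"query-{i}" for i in range(8)]
    shards = split(requests, num_workers=4)
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_shard, shards))
    for shard_result in results:
        print(shard_result)
```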