NVIDIA and Google Cloud are advancing AI infrastructure to dramatically reduce the cost of AI inference at scale. At the Google Cloud Next conference, both companies introduced a new generation of systems designed to deliver up to 10 times lower inference cost per token while achieving 10 times higher throughput per megawatt. This development signals a major shift in how enterprises can deploy AI workloads efficiently in production environments.
The innovation focuses on A5X bare-metal instances powered by NVIDIA's Vera Rubin NVL72 architecture. These systems combine hardware and software co-design to optimise performance and energy efficiency. Using advanced networking technologies such as NVIDIA ConnectX-9 SuperNICs and Google Virgo networking, the platform can interconnect many thousands of GPUs into a single cluster, enabling large AI workloads to run at speed.
This scale introduces new operational complexity, especially in managing workloads across distributed processors. To address this, NVIDIA and Google Cloud are also introducing managed training and orchestration tools that automate cluster sizing, failure recovery, and job execution. These capabilities let enterprises focus on model performance rather than infrastructure management.
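Automated failure recovery at this scale typically rests on checkpointing and resumable execution. As a rough illustration of that underlying pattern only (not the actual NVIDIA or Google Cloud API, and with a hypothetical checkpoint path), a training loop that resumes from its last checkpoint after a failure might look like:

```python
import os
import pickle

CHECKPOINT = "checkpoint.pkl"  # hypothetical path, for illustration only


def save_checkpoint(state):
    """Persist training state so a restarted job can resume."""
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint():
    """Return the last saved state, or a fresh one if none exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0}


def train(total_steps=100, max_retries=3):
    """Run training steps, resuming from the last checkpoint on failure."""
    for _attempt in range(max_retries):
        state = load_checkpoint()  # resume from the last saved step
        try:
            for step in range(state["step"], total_steps):
                state["step"] = step + 1      # simulate one training step
                if step % 10 == 0:
                    save_checkpoint(state)    # periodic checkpoint
            return state["step"]
        except RuntimeError:
            continue  # a failure falls back to the last checkpoint
    raise RuntimeError("training failed after retries")
```

Managed orchestration layers wrap this kind of loop so that detection, restart, and resume happen without operator intervention.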
- AI inference costs reduced by up to 10x
- Throughput per megawatt increased by 10x
- Infrastructure scales to hundreds of thousands of GPUs
- Managed systems reduce operational overhead
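The headline figures above translate directly into unit economics. A minimal sketch of that arithmetic, using hypothetical baseline numbers (only the 10x factors come from the announcement):

```python
# Illustrative arithmetic only: the 10x factors come from the announcement;
# the baseline values are hypothetical placeholders, not published specs.

baseline_cost_per_million_tokens = 5.00   # hypothetical baseline, USD
baseline_tokens_per_sec_per_mw = 2.0e6    # hypothetical baseline throughput

cost_factor = 10        # "up to 10x lower inference cost per token"
throughput_factor = 10  # "10x higher throughput per megawatt"

new_cost = baseline_cost_per_million_tokens / cost_factor
new_throughput = baseline_tokens_per_sec_per_mw * throughput_factor

print(f"Cost per 1M tokens: ${baseline_cost_per_million_tokens:.2f} -> ${new_cost:.2f}")
print(f"Tokens/sec per MW:  {baseline_tokens_per_sec_per_mw:.1e} -> {new_throughput:.1e}")
```

At any assumed baseline, a 10x cost reduction combined with 10x throughput per megawatt compounds into roughly a hundredfold improvement in tokens served per dollar of power.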
Beyond performance, data governance and security remain critical. The infrastructure integrates confidential computing technologies to ensure that sensitive data used in training and inference remains encrypted, even within cloud environments. This is particularly important for regulated industries such as healthcare and finance.
Ultimately, NVIDIA and Google Cloud are redefining the economics of AI deployment. By combining flexible infrastructure, improved networking, and built-in security, they are helping businesses move from AI experimentation to large-scale production use that is affordable, secure, and ready for real-world workloads.
Source: