NVIDIA Blackwell swept the new SemiAnalysis InferenceMAX™ v1 benchmarks, achieving the highest AI inference performance and best overall efficiency. Blackwell also enables the highest AI factory revenue: a $5M investment in GB200 NVL72 can generate $75 million in token revenue, a 15x return on investment.
This is the result of extreme co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-up; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™-LLM for speed and flexibility, as well as development with community frameworks such as SGLang and vLLM.
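As a concrete illustration, here is a minimal sketch of serving a model with one of those community frameworks, vLLM; the model name and sampling settings are illustrative assumptions, not part of the benchmark setup:

    from vllm import LLM, SamplingParams

    # Load a model for offline batched inference; vLLM handles GPU batching
    # and KV-cache management internally. The model name is an assumption.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain low-precision inference in one sentence."], params)

    for out in outputs:
        print(out.outputs[0].text)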
$5M GB200 NVL72 Investment Can Generate $75M Token Revenue
Results for DeepSeek-R1 with 8K-token input and 1K-token output sequences show a 15x performance benefit, and a corresponding revenue opportunity, for NVIDIA Blackwell GB200 NVL72 over Hopper H200.
Get unmatched AI performance with NVIDIA AI inference software optimized for NVIDIA-accelerated infrastructure. The NVIDIA Blackwell Ultra, H200 GPU, NVIDIA RTX PRO™ 6000 Blackwell Server Edition, and NVIDIA RTX™ technologies deliver exceptional speed and efficiency for AI inference workloads across data centers, clouds, and workstations.
NVIDIA GB300 NVL72
AI inference demand is surging, and NVIDIA Blackwell Ultra is built to meet that moment. Delivering 1.4 exaFLOPS in a single rack, the NVIDIA GB300 NVL72 unifies 72 NVIDIA Blackwell Ultra GPUs with NVIDIA NVLink™ and NVFP4 to power massive models with extreme efficiency, achieving 50x higher AI factory output than Hopper-based systems while reducing token costs and accelerating real-time reasoning at scale.
The NVIDIA H200 GPU, part of the NVIDIA Hopper platform, supercharges generative AI and high-performance computing (HPC) workloads with game-changing performance and memory capabilities. As the first GPU with HBM3e, the H200 uses its larger, faster memory to fuel the acceleration of generative AI and large language models (LLMs) while advancing scientific computing for HPC workloads.
The RTX PRO 6000 Blackwell Server Edition GPU delivers supercharged inference performance across a broad range of AI models, achieving up to 5x higher performance for enterprise-scale agentic and generative AI applications than the previous-generation NVIDIA L40S. NVIDIA RTX PRO™ Servers, available from global system partners, bring the performance and efficiency of the Blackwell architecture to every enterprise data center.
The RTX PRO 6000 Blackwell Workstation Edition is the first desktop GPU to offer 96 GB of GPU memory. The power of the Blackwell GPU architecture, combined with large GPU memory and the NVIDIA AI software stack, enables RTX PRO-powered workstations to deliver incredible acceleration for generative AI and LLM inference directly on the desktop.
Deploying Generative AI in Production With NVIDIA NIM
Unlock the potential of generative AI with NVIDIA NIM. This video dives into how NVIDIA NIM microservices can transform your AI deployment into a production-ready powerhouse.
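For readers who want to try it themselves, the sketch below shows a minimal client call; NIM microservices expose OpenAI-compatible endpoints, and the base URL, port, and model name here are assumptions for a NIM container running locally:

    from openai import OpenAI

    # NIM microservices serve an OpenAI-compatible API. The base URL, port,
    # and model name are assumptions for a locally running NIM container.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

    response = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)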
Triton Inference Server simplifies the deployment of AI models at scale in production. This open-source inference-serving software lets teams deploy trained AI models from any framework, whether stored locally or on a cloud platform, on any GPU- or CPU-based infrastructure.
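As a minimal sketch of what serving through Triton looks like from the client side, the Python snippet below sends one inference request over HTTP; the server address, model name, and tensor names are illustrative assumptions that must match the deployed model's configuration:

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a running Triton server. The host, model name, and tensor
    # names below are assumptions; they must match your model's config.pbtxt.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
    inp.set_data_from_numpy(np.random.rand(1, 4).astype(np.float32))

    result = client.infer(model_name="my_model", inputs=[inp])
    print(result.as_numpy("OUTPUT0"))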
Ever wondered what NVIDIA’s NIM technology is capable of? Delve into the world of mind-blowing digital humans and robots to see what NIMs make possible.