Introducing NVIDIA Dynamo: A Distributed Inference Serving Framework for Reasoning Models

, Principal Product Manager, NVIDIA
, Principal Software Architect, NVIDIA
, NVIDIA
, Senior System Software Engineer, NVIDIA
This session will introduce NVIDIA Dynamo, a new inference serving framework designed to deploy reasoning large language models (LLMs) in multi-node environments. We’ll explore the key components and architecture of the new framework, highlighting how they enable seamless scaling within data centers and drive advanced inference optimization. We’ll also cover cutting-edge inference serving techniques, including disaggregated serving, which optimizes request handling by separating the prefill and decode phases, increasing the number of inference requests served. Attendees will also learn how to quickly deploy this new serving framework using NVIDIA NIM.
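To illustrate the idea behind disaggregated serving, the minimal sketch below separates the compute-bound prefill phase from the memory-bandwidth-bound decode phase into two workers, with the KV cache handed off between them. All names here (PrefillWorker, DecodeWorker, KVCache) are hypothetical and do not reflect the NVIDIA Dynamo API; this is a conceptual illustration only.

```python
# Conceptual sketch of disaggregated serving: prefill and decode run on
# separate workers, with the KV cache transferred between them.
# Hypothetical names; not the NVIDIA Dynamo API.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Placeholder for the key/value attention cache produced during prefill."""
    tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Processes the full prompt once and produces the KV cache."""

    def prefill(self, prompt_tokens: list[int]) -> tuple[KVCache, int]:
        # In a real system this is a single large, compute-bound forward pass.
        cache = KVCache(tokens=list(prompt_tokens))
        first_token = prompt_tokens[-1] + 1  # stand-in for the first sampled token
        return cache, first_token


class DecodeWorker:
    """Generates tokens one at a time, reusing the transferred KV cache."""

    def decode(self, cache: KVCache, first_token: int, max_new_tokens: int) -> list[int]:
        generated = [first_token]
        for _ in range(max_new_tokens - 1):
            # In a real system each step is a small, memory-bandwidth-bound pass.
            next_token = generated[-1] + 1  # stand-in for sampling
            cache.tokens.append(next_token)
            generated.append(next_token)
        return generated


if __name__ == "__main__":
    prompt = [101, 2023, 2003, 1037, 3231]        # toy token IDs
    cache, first = PrefillWorker().prefill(prompt)  # would run on prefill GPUs
    output = DecodeWorker().decode(cache, first, 8)  # would run on decode GPUs
    print(output)
```

Separating the two phases lets each pool of GPUs be sized and scheduled independently, which is why disaggregation can increase the number of requests served for a given hardware budget.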
Event: GTC 25
Date: March 2025
Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
Industry: Cloud Services
Level: Technical - Advanced
NVIDIA Technologies: TensorRT, DALI, NVLink / NVSwitch, Triton
Language: English
Location: