Introducing NVIDIA Dynamo: A Distributed Inference Serving Framework for Reasoning Models

, Principal Product Manager, NVIDIA
, Principal Software Architect, NVIDIA
, NVIDIA
, Senior System Software Engineer, NVIDIA
This session will introduce NVIDIA Dynamo, a new inference serving framework designed to deploy reasoning large language models (LLMs) in multi-node environments. We’ll explore the key components and architecture of the new framework, highlighting how they enable seamless scaling within data centers and drive advanced inference optimization. We’ll also cover cutting-edge inference serving techniques, including disaggregated serving, which optimizes request handling by separating the prefill and decode phases, increasing the number of inference requests served. Attendees will also learn how to quickly deploy this new serving framework using NVIDIA NIM.
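To illustrate the idea behind disaggregated serving, the minimal sketch below separates the compute-bound prefill phase from the memory-bandwidth-bound decode phase into two workers, with the KV cache handed off between them. All names here (PrefillWorker, DecodeWorker, KVCache) are hypothetical and do not reflect the NVIDIA Dynamo API; this is a conceptual illustration only.

```python
# Conceptual sketch of disaggregated serving: prefill and decode run on
# separate workers, with the KV cache transferred between them.
# Hypothetical names; not the NVIDIA Dynamo API.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Placeholder for the key/value attention cache produced during prefill."""
    tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Processes the full prompt once and produces the KV cache."""

    def prefill(self, prompt_tokens: list[int]) -> tuple[KVCache, int]:
        # In a real system this is a single large, compute-bound forward pass.
        cache = KVCache(tokens=list(prompt_tokens))
        first_token = prompt_tokens[-1] + 1  # stand-in for the first sampled token
        return cache, first_token


class DecodeWorker:
    """Generates tokens one at a time, reusing the transferred KV cache."""

    def decode(self, cache: KVCache, first_token: int, max_new_tokens: int) -> list[int]:
        generated = [first_token]
        for _ in range(max_new_tokens - 1):
            # In a real system each step is a small, memory-bandwidth-bound pass.
            next_token = generated[-1] + 1  # stand-in for sampling
            cache.tokens.append(next_token)
            generated.append(next_token)
        return generated


if __name__ == "__main__":
    prompt = [101, 2023, 2003, 1037, 3231]        # toy token IDs
    cache, first = PrefillWorker().prefill(prompt)  # would run on prefill GPUs
    output = DecodeWorker().decode(cache, first, 8)  # would run on decode GPUs
    print(output)
```

Separating the two phases lets each pool of GPUs be sized and scheduled independently, which is why disaggregation can increase the number of requests served for a given hardware budget.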
Event: GTC 25
Date: March 2025
Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
Industry: Cloud Services
Level: Technical - Advanced
NVIDIA Technologies: TensorRT, DALI, NVLink / NVSwitch, Triton
Language: English
Location: