The Speed of Thought: Navigate LLM Inference Autoscaling for a Gen AI Application Toward Production
, Solutions Architect, NVIDIA
, Solutions Architect, NVIDIA
, Solutions Architect, NVIDIA
Learn how to choose the autoscaling hyperparameters for your LLM applications by understanding the key metrics during inference. Gain essential tools to optimize latency and throughput by running and dissecting LLM inference benchmarks. See how NVIDIA's benchmarking software can be leveraged to make an informed decision about the Kubernetes deployment of your Gen AI application. You'll acquire best practices and tips that allow you to bring low latency at unmatched cost-effectiveness to your NIM applications in production.

Prerequisite(s):
Comfortable reading Python and Pandas code. Basic understanding of Kubernetes. Basic understanding of LLMs.
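To make the benchmark-dissection step concrete, here is a minimal Pandas sketch of the kind of analysis involved, assuming a hypothetical per-request results file; the file name and column names (latency_ms, output_tokens, start_time_s, end_time_s) are illustrative assumptions, not the output schema of NVIDIA's benchmarking software.

import pandas as pd

# Hypothetical benchmark export: one row per request, with latency and
# token counts. Column names are assumptions, not a specific tool's schema.
df = pd.read_csv("benchmark_results.csv")

# Latency percentiles: p50 reflects the typical user experience, while
# p99 captures the tail that usually drives autoscaling thresholds.
p50 = df["latency_ms"].quantile(0.50)
p99 = df["latency_ms"].quantile(0.99)

# Throughput: total generated tokens divided by the wall-clock span of the run.
duration_s = df["end_time_s"].max() - df["start_time_s"].min()
tokens_per_s = df["output_tokens"].sum() / duration_s

print(f"p50 latency: {p50:.1f} ms, p99 latency: {p99:.1f} ms")
print(f"throughput: {tokens_per_s:.1f} tokens/s")

The tail latency ceiling and the sustained tokens-per-second of a single replica measured this way are the kinds of numbers that inform Kubernetes autoscaling thresholds for an LLM deployment.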
Event: GTC 25
Date: March 2025
Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
Industry: All Industries
NVIDIA Technology: DGX, HGX, TensorRT, NCCL, NVLink / NVSwitch, Triton, NVIDIA NIM, NVIDIA AI Enterprise