      The Speed of Thought: Navigate LLM Inference Autoscaling for a Gen AI Application Toward Production

      , Solutions Architect, NVIDIA
      , Solutions Architect, NVIDIA
      , Solutions Architect, NVIDIA
      Learn how to choose the autoscaling hyperparameters for your LLM applications by understanding the key metrics during inference. Gain essential tools to optimize latency and throughput by running and dissecting LLM inference benchmarks. See how NVIDIA's benchmarking software can be leveraged to make an informed decision about the Kubernetes deployment of your Gen AI application. You'll acquire best practices and tips for bringing low latency and unmatched cost-effectiveness to your NIM applications in production.
      Prerequisite(s):

      Comfortable reading Python and Pandas code.
      Basic understanding of Kubernetes.
      Basic understanding of LLMs.
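      To give a flavor of the benchmark-dissection workflow the session covers, here is a minimal sketch in Python and Pandas. It assumes a benchmark sweep in the shape of typical LLM inference benchmark output (e.g., a GenAI-Perf-style concurrency sweep); the column names, numbers, and SLO values below are hypothetical and purely illustrative:

```python
# Minimal sketch (not the session's actual tooling): derive an autoscaling
# threshold from an LLM inference benchmark sweep. All data here is made up
# for illustration, in the shape of a typical per-concurrency benchmark.
import pandas as pd

# Hypothetical benchmark sweep: one row per tested concurrency level.
bench = pd.DataFrame({
    "concurrency":      [1, 2, 4, 8, 16, 32],
    "throughput_tok_s": [55, 105, 190, 330, 520, 610],   # output tokens/s
    "p99_ttft_ms":      [90, 110, 160, 240, 480, 1300],  # time to first token
    "p99_itl_ms":       [18, 20, 24, 31, 45, 95],        # inter-token latency
})

# Example latency SLOs; tune these to your application's latency budget.
TTFT_SLO_MS = 500
ITL_SLO_MS = 50

# Keep only the operating points that satisfy both SLOs, then take the one
# with the highest throughput: the best concurrency a single replica sustains.
ok = bench[(bench["p99_ttft_ms"] <= TTFT_SLO_MS) & (bench["p99_itl_ms"] <= ITL_SLO_MS)]
best = ok.loc[ok["throughput_tok_s"].idxmax()]

print(f"Max concurrency within SLO per replica: {best.concurrency:.0f}")
print(f"Throughput at that point: {best.throughput_tok_s:.0f} tok/s")

# Feed this into your Kubernetes autoscaler: scale out when in-flight
# requests per replica approach the SLO-respecting concurrency.
# Here, 80% of the best concurrency is used as headroom (an assumption).
hpa_target = int(0.8 * best["concurrency"])
print(f"Suggested per-replica in-flight request target: {hpa_target}")
```

      With the made-up numbers above, the sweep lands on a per-replica concurrency of 16 and a scale-out target of about 12 in-flight requests; the point is that a latency SLO is what turns a raw benchmark sweep into a concrete autoscaling threshold.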
      Event: GTC 25
      Date: March 2025
      Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
      Industry: All Industries
      NVIDIA Technology: DGX, HGX, TensorRT, NCCL, NVLink / NVSwitch, Triton, NVIDIA NIM, NVIDIA AI Enterprise
      Level: General
      Language: English
      Location: