      The Speed of Thought: Navigate LLM Inference Autoscaling for a Gen AI Application Toward Production

      , Solutions Architect, NVIDIA
      , Solutions Architect, NVIDIA
      , Solutions Architect, NVIDIA
      Learn how to choose the autoscaling hyperparameters for your LLM applications by understanding the key metrics during inference. Gain essential tools to optimize latency and throughput by running and dissecting LLM inference benchmarks. See how NVIDIA's benchmarking software can be leveraged to make an informed decision about the Kubernetes deployment of your Gen AI application. You'll acquire best practices and tips for bringing low latency and unmatched cost-effectiveness to your NIM applications in production.
      Prerequisite(s):

      Comfortable reading Python and Pandas code.
      Basic understanding of Kubernetes.
      Basic understanding of LLMs.
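      To give a flavor of the benchmark-dissection workflow the session covers, here is a minimal sketch in Python and Pandas. It assumes a benchmark sweep in the shape of typical LLM inference benchmark output (e.g., a GenAI-Perf-style concurrency sweep); the column names, numbers, and SLO values below are hypothetical and purely illustrative:

```python
# Minimal sketch (not the session's actual tooling): derive an autoscaling
# threshold from an LLM inference benchmark sweep. All data here is made up
# for illustration, in the shape of a typical per-concurrency benchmark.
import pandas as pd

# Hypothetical benchmark sweep: one row per tested concurrency level.
bench = pd.DataFrame({
    "concurrency":      [1, 2, 4, 8, 16, 32],
    "throughput_tok_s": [55, 105, 190, 330, 520, 610],   # output tokens/s
    "p99_ttft_ms":      [90, 110, 160, 240, 480, 1300],  # time to first token
    "p99_itl_ms":       [18, 20, 24, 31, 45, 95],        # inter-token latency
})

# Example latency SLOs; tune these to your application's latency budget.
TTFT_SLO_MS = 500
ITL_SLO_MS = 50

# Keep only the operating points that satisfy both SLOs, then take the one
# with the highest throughput: the best concurrency a single replica sustains.
ok = bench[(bench["p99_ttft_ms"] <= TTFT_SLO_MS) & (bench["p99_itl_ms"] <= ITL_SLO_MS)]
best = ok.loc[ok["throughput_tok_s"].idxmax()]

print(f"Max concurrency within SLO per replica: {best.concurrency:.0f}")
print(f"Throughput at that point: {best.throughput_tok_s:.0f} tok/s")

# Feed this into your Kubernetes autoscaler: scale out when in-flight
# requests per replica approach the SLO-respecting concurrency.
# Here, 80% of the best concurrency is used as headroom (an assumption).
hpa_target = int(0.8 * best["concurrency"])
print(f"Suggested per-replica in-flight request target: {hpa_target}")
```

      With the made-up numbers above, the sweep lands on a per-replica concurrency of 16 and a scale-out target of about 12 in-flight requests; the point is that a latency SLO is what turns a raw benchmark sweep into a concrete autoscaling threshold.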
      Event: GTC 25
      Date: March 2025
      Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
      Industry: All Industries
      NVIDIA Technology: DGX, HGX, TensorRT, NCCL, NVLink / NVSwitch, Triton, NVIDIA NIM, NVIDIA AI Enterprise
      Level: General
      Language: English
      Location: