      Scaling Inference Using NIM Through a Serverless NCP SaaS Platform

      , Solutions Architect, NVIDIA
      , Sr. Solutions Architect, NVIDIA
      We'll train you to scale your generative AI workloads and build a serverless software-as-a-service (SaaS) platform. We'll deploy an open-source LLM as an NVIDIA NIM microservice, scale it using open-source technologies such as Kubernetes, Ray, and KServe, and demonstrate the use of NVIDIA Cloud Functions (NVCF). We'll show you how to collect GPU utilization metrics with Grafana and Prometheus, autoscale compute resources based on in-flight demand, and apply best practices for efficiently using the underlying abstracted GPU infrastructure based on the NCP reference architecture (RA).
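NIM microservices expose an OpenAI-compatible HTTP API, so client code can target them like any OpenAI-style endpoint. A minimal sketch of building such a request follows; the service URL and model name are placeholder assumptions, not values from this session.

```python
import json

# NIM serves an OpenAI-compatible /v1/chat/completions route.
# The host below is a placeholder for a cluster-internal service URL.
NIM_URL = "http://nim-llm.example.svc.cluster.local/v1/chat/completions"

def build_chat_request(prompt: str,
                       model: str = "meta/llama3-8b-instruct",
                       max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# This body would be POSTed to NIM_URL with a JSON content type.
payload = json.dumps(build_chat_request("What is serverless inference?"))
```

Because the payload is plain OpenAI-style JSON, the same client works whether the model is routed through KServe, NVCF, or a direct service endpoint.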
      Prerequisite(s):

      Basics of model inference, Kubernetes, and KServe.
      Event: GTC 25
      Date: March 2025
      Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
      NVIDIA Technology: Cloud / Data Center GPU, Hopper, Base Command, NVIDIA NIM, NVIDIA AI Enterprise
      Industry: Cloud Services
      Level: Technical - Advanced
      Language: English
      Location: