    Scaling Inference Using NIM Through a Serverless NCP SaaS Platform

    , Solutions Architect, NVIDIA
    , Sr. Solutions Architect, NVIDIA
    We'll train you to scale your generative AI workloads and build a serverless software-as-a-service (SaaS) platform. We'll deploy an open-source LLM as an NVIDIA NIM microservice, scale it with open-source technologies such as Kubernetes, Ray, and KServe, and demonstrate the use of NVCF. We'll show you how to collect GPU utilization metrics with Grafana and Prometheus, autoscale compute resources based on in-flight demand, and define best practices for efficiently using the underlying abstracted GPU infrastructure based on the NCP reference architecture.
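    As a minimal sketch of the KServe-based autoscaling described above (the service name, image tag, and target values are illustrative placeholders, not the session's actual configuration), an InferenceService can scale a NIM container on in-flight request concurrency:

    ```yaml
    # Illustrative KServe InferenceService: autoscale a NIM container on
    # in-flight request concurrency. Names and image are placeholders.
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: llm-nim            # placeholder name
    spec:
      predictor:
        minReplicas: 1         # scale-to-zero is possible with minReplicas: 0
        maxReplicas: 8
        scaleMetric: concurrency
        scaleTarget: 4         # target in-flight requests per replica
        containers:
          - name: kserve-container
            image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest  # placeholder tag
            resources:
              limits:
                nvidia.com/gpu: "1"
    ```

    With this shape of manifest, KServe adds replicas as concurrent requests per replica exceed the target and removes them as demand falls, which is the serverless behavior the session describes.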
    Prerequisite(s):

    Basics of model inference, Kubernetes, and KServe.
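    The GPU utilization metrics mentioned above are commonly exposed to Prometheus by NVIDIA's DCGM exporter; a hedged sketch of querying them (the Prometheus endpoint is a placeholder):

    ```shell
    # Average GPU utilization (%) across all GPUs scraped by dcgm-exporter.
    # http://prometheus.example:9090 is a placeholder endpoint.
    curl -s "http://prometheus.example:9090/api/v1/query" \
      --data-urlencode 'query=avg(DCGM_FI_DEV_GPU_UTIL)'
    ```

    The same PromQL expression can back a Grafana panel or feed an autoscaling signal.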
    Event: GTC 25
    Date: March 2025
    Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
    NVIDIA Technology: Cloud / Data Center GPU, Hopper, Base Command, NVIDIA NIM, NVIDIA AI Enterprise
    Industry: Cloud Services
    Level: Technical - Advanced
    Language: English
    Location: