      Scaling Inference Using NIM Through a Serverless NCP SaaS Platform

      , Solutions Architect, NVIDIA
      , Sr. Solutions Architect, NVIDIA
      We'll train you to scale your generative AI workloads and build a serverless software-as-a-service (SaaS) platform. We'll deploy an open-source LLM as an NVIDIA NIM microservice, scale it using open-source technologies such as Kubernetes, Ray, and KServe, and demonstrate the use of NVIDIA Cloud Functions (NVCF). We'll show you how to collect GPU utilization metrics with Grafana and Prometheus, autoscale compute resources based on in-flight demand, and apply best practices for efficiently using the underlying abstracted GPU infrastructure based on the NCP reference architecture (RA).
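NIM microservices expose an OpenAI-compatible HTTP API, so client code can target them like any OpenAI-style endpoint. A minimal sketch of building such a request follows; the service URL and model name are placeholder assumptions, not values from this session.

```python
import json

# NIM serves an OpenAI-compatible /v1/chat/completions route.
# The host below is a placeholder for a cluster-internal service URL.
NIM_URL = "http://nim-llm.example.svc.cluster.local/v1/chat/completions"

def build_chat_request(prompt: str,
                       model: str = "meta/llama3-8b-instruct",
                       max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# This body would be POSTed to NIM_URL with a JSON content type.
payload = json.dumps(build_chat_request("What is serverless inference?"))
```

Because the payload is plain OpenAI-style JSON, the same client works whether the model is routed through KServe, NVCF, or a direct service endpoint.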
      Prerequisite(s):

      Basics of model inference, Kubernetes, and KServe.
      Event: GTC 25
      Date: March 2025
      Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
      NVIDIA Technology: Cloud / Data Center GPU, Hopper, Base Command, NVIDIA NIM, NVIDIA AI Enterprise
      Industry: Cloud Services
      Level: Technical - Advanced
      Language: English
      Location: