Details
Scaling Inference Using NIM Through a Serverless NCP SaaS Platform
, Solutions Architect, NVIDIA
, Sr. Solutions Architect, NVIDIA
We'll train you to scale your generative AI workload and build a serverless software-as-a-service (SaaS) platform. We'll deploy NVIDIA NIM microservices for an open-source LLM, scale them using open-source technologies like Kubernetes, Ray, and KServe, and demonstrate the use of NVCF. We'll show you how to collect GPU utilization metrics with Grafana and Prometheus, autoscale compute resources based on in-flight demand, and apply best practices for efficiently using the underlying abstracted GPU infrastructure based on the NCP reference architecture.

Prerequisite(s): Basics of model inference, Kubernetes, and KServe.
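To make the described workflow concrete, here is a minimal Python sketch of the client side, assuming a NIM already deployed behind a KServe InferenceService and a Prometheus server scraping NVIDIA's DCGM exporter. It sends a chat request to the NIM's OpenAI-compatible endpoint, then reads average GPU utilization. The URLs, model name, and metric query are illustrative placeholders, not values from the session.

    import os

    import requests
    from openai import OpenAI

    # Placeholder endpoints -- substitute whatever your KServe
    # InferenceService and monitoring stack actually expose.
    NIM_BASE_URL = os.environ.get("NIM_BASE_URL", "http://localhost:8000/v1")
    PROMETHEUS_URL = os.environ.get("PROMETHEUS_URL", "http://localhost:9090")

    # NIM containers serve an OpenAI-compatible API, so the standard
    # OpenAI client works by pointing base_url at the NIM endpoint.
    client = OpenAI(base_url=NIM_BASE_URL,
                    api_key=os.environ.get("NIM_API_KEY", "none"))
    response = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # placeholder model name
        messages=[{"role": "user",
                   "content": "In one sentence, what does KServe do?"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)

    # DCGM_FI_DEV_GPU_UTIL is the per-GPU utilization gauge (0-100)
    # published by NVIDIA's DCGM exporter; avg() aggregates across GPUs.
    prom = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": "avg(DCGM_FI_DEV_GPU_UTIL)"},
        timeout=10,
    )
    prom.raise_for_status()
    for sample in prom.json()["data"]["result"]:
        print("avg GPU utilization (%):", sample["value"][1])

In a KServe deployment, the same request traffic drives the concurrency-based autoscaler, and the DCGM metrics feed the Grafana dashboards referenced in the abstract.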
Event: GTC 25
Date: March 2025
Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
NVIDIA technology: Cloud / Data Center GPU, Hopper, Base Command, NVIDIA NIM, NVIDIA AI Enterprise