The emergence of diffusion models enables new creative workflows for artists, but their resource-intensive nature poses significant deployment challenges. With state-of-the-art image diffusion already requiring tens of seconds per image, the computational cost of video diffusion is even more substantial. We'll showcase the path from research to production-ready TensorRT deployment, leveraging the latest FP8 tensor cores on H100. Not only does this reduce inference cost, but it also increases the number of users that can be served per GPU. Using a production model from Adobe as an example, we'll provide in-depth insights into the performance and quality trade-offs of deploying a quantized diffuser. This presentation is ideal for AI researchers and software engineers looking to optimize diffusion model deployment on any NVIDIA GPU.
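To build intuition for what FP8 quantization involves, here is a simplified, hypothetical sketch of per-tensor calibration: pick a scale so that the observed activation range maps onto the representable FP8 (E4M3) range, then quantize and dequantize through that scale. This is an illustration only — integer rounding stands in for FP8's non-uniform floating-point grid, and a real deployment would use TensorRT or NVIDIA's quantization tooling rather than code like this.

```python
# Hypothetical, simplified sketch of per-tensor FP8 (E4M3) calibration.
# Not production code: real FP8 rounds onto a non-uniform floating-point
# grid in hardware; here uniform rounding stands in for that step.

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def compute_scale(tensor):
    """Per-tensor scale mapping the observed amax onto the FP8 range."""
    amax = max(abs(v) for v in tensor)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def fake_quantize(tensor):
    """Quantize-dequantize: scale into FP8 range, round, scale back."""
    scale = compute_scale(tensor)
    return [round(v / scale) * scale for v in tensor]


if __name__ == "__main__":
    activations = [0.5, -2.0, 3.5, -1.25]
    scale = compute_scale(activations)
    print("scale:", scale)
    print("quant-dequant:", fake_quantize(activations))
```

The key quality lever hinted at here is the calibration step: the scale is derived from representative activations, and a poorly chosen range is one source of the quality regressions the talk discusses.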
Event: GTC 25
Date: March 2025
Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
NVIDIA technologies: Cloud / Data Center GPU, CUDA, TensorRT, Hopper, Nsight Systems