The emergence of diffusion models enables new creative workflows for artists, but their resource-intensive nature poses significant deployment challenges. With state-of-the-art image diffusion already requiring tens of seconds per image, the computational cost of video diffusion is even more substantial. We'll showcase the path from research to production-ready TensorRT deployment, leveraging the latest FP8 tensor cores on H100. Not only does this reduce inference cost, but it also increases the number of users that can be served per GPU. Using a production model from Adobe as an example, we'll provide in-depth insights into the performance and quality trade-offs of deploying a quantized diffuser. This presentation is ideal for AI researchers and software engineers looking to optimize diffusion model deployment on any NVIDIA GPU.
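To build intuition for what FP8 quantization involves, here is a simplified, hypothetical sketch of per-tensor calibration: pick a scale so that the observed activation range maps onto the representable FP8 (E4M3) range, then quantize and dequantize through that scale. This is an illustration only — integer rounding stands in for FP8's non-uniform floating-point grid, and a real deployment would use TensorRT or NVIDIA's quantization tooling rather than code like this.

```python
# Hypothetical, simplified sketch of per-tensor FP8 (E4M3) calibration.
# Not production code: real FP8 rounds onto a non-uniform floating-point
# grid in hardware; here uniform rounding stands in for that step.

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def compute_scale(tensor):
    """Per-tensor scale mapping the observed amax onto the FP8 range."""
    amax = max(abs(v) for v in tensor)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def fake_quantize(tensor):
    """Quantize-dequantize: scale into FP8 range, round, scale back."""
    scale = compute_scale(tensor)
    return [round(v / scale) * scale for v in tensor]


if __name__ == "__main__":
    activations = [0.5, -2.0, 3.5, -1.25]
    scale = compute_scale(activations)
    print("scale:", scale)
    print("quant-dequant:", fake_quantize(activations))
```

The key quality lever hinted at here is the calibration step: the scale is derived from representative activations, and a poorly chosen range is one source of the quality regressions the talk discusses.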
Event: GTC 25
Date: March 2025
Topic: AI Platforms / Deployment - AI Inference / Inference Microservices
NVIDIA technologies: Cloud / Data Center GPU, CUDA, TensorRT, Hopper, Nsight Systems