CUDA Graph in Alibaba Local Life Recommender System
, Senior Development Expert, Alibaba Group
, AI Developer Technology Engineer, NVIDIA
, Engineer, Tech SW, NVIDIA
CUDA Graph provides a mechanism for launching multiple GPU kernels through a single CPU operation, which can greatly reduce kernel launch overhead. That overhead is a problem in many deep learning inference services, where a large number of CPU threads process user requests and each thread launches its own kernels on the GPU. The resulting launch overhead slows down the system: while the CPU is busy launching kernels, GPU utilization can be very low. The problem can be worse in systems built on TensorFlow, which uses only one compute stream per physical device. We'll describe how to integrate CUDA Graph into TensorFlow and use it to optimize deep learning recommender models, achieving over 2x throughput.
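To make the launch-overhead argument concrete, here is a rough back-of-envelope model in Python. It is only a sketch: it is not the CUDA or TensorFlow API, the names `Workload`, `step_time_us`, and `compare` are made up for illustration, and every number is a hypothetical placeholder rather than a measurement from the talk. The model assumes that kernel launches are asynchronous, so one request takes at least as long as the slower of CPU issue time and GPU execution time, and it treats a graph launch as a single fixed CPU cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # All fields are hypothetical values chosen for illustration only.
    num_kernels: int        # kernels launched per inference request
    per_launch_us: float    # CPU cost of one individual kernel launch (microseconds)
    graph_launch_us: float  # CPU cost of launching a whole graph once (microseconds)
    gpu_busy_us: float      # total GPU execution time of all kernels (microseconds)


def step_time_us(cpu_issue_us: float, gpu_busy_us: float) -> float:
    """Lower bound on request time: the slower of CPU issue and GPU execution."""
    return max(cpu_issue_us, gpu_busy_us)


def compare(w: Workload) -> dict:
    # Kernel-by-kernel launching: one CPU operation per kernel.
    eager_cpu_us = w.num_kernels * w.per_launch_us
    # Graph launching: one CPU operation for the whole set of kernels.
    graph_cpu_us = w.graph_launch_us

    eager_us = step_time_us(eager_cpu_us, w.gpu_busy_us)
    graph_us = step_time_us(graph_cpu_us, w.gpu_busy_us)
    return {
        "eager_us": eager_us,
        "graph_us": graph_us,
        "eager_gpu_utilization": w.gpu_busy_us / eager_us,
        "graph_gpu_utilization": w.gpu_busy_us / graph_us,
        "speedup": eager_us / graph_us,
    }


# Hypothetical launch-bound workload: many short kernels per request.
example = Workload(num_kernels=200, per_launch_us=5.0,
                   graph_launch_us=10.0, gpu_busy_us=250.0)
result = compare(example)
print(result)
```

In this toy setting the request is launch-bound when kernels are issued one at a time, so the GPU sits idle for most of the request. Once the launches collapse into one CPU operation, GPU execution time becomes the limiting factor. Real gains depend on kernel count, kernel duration, and thread contention, none of which this model captures.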
Event: GTC Digital Spring
Date: March 2022
Topic: Accelerated Computing & Dev Tools - Performance Optimization