      CUDA Graph in Alibaba Local Life Recommender System

      , Senior Development Expert, Alibaba Group
      , AI Developer Technology Engineer, NVIDIA
      , Engineer, Tech SW, NVIDIA
      CUDA Graph provides a mechanism to launch multiple GPU kernels through a single CPU operation. This can greatly reduce kernel launch overhead, which is an issue in many deep learning inference services because they have a large number of CPU threads processing user requests. Each thread tries to launch kernels to the GPU, which leads to significant launch overhead and slows down the system — while the CPU is busy launching kernels, GPU utilization can be very low. The problem can be worse in systems using TensorFlow, which uses only one compute stream for each physical device. We'll describe how to integrate CUDA Graph into TensorFlow and use CUDA Graph to optimize the deep learning recommender models to achieve over 2x throughput.
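
The launch pattern the abstract describes can be illustrated with a minimal sketch using stream capture in the CUDA runtime API. This is not the speakers' TensorFlow integration: the `scale` kernel, the buffer size, and the counts `kKernelsPerGraph` and `kReplays` are hypothetical placeholders chosen for illustration. The sketch assumes CUDA 11.4 or later, where `cudaGraphInstantiateWithFlags` is available.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",          \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in for one small op in an inference model.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 16;
    const int kKernelsPerGraph = 20;  // assumed number of small kernels per request
    const int kReplays = 100;         // assumed number of requests served

    float* d_x = nullptr;
    CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CHECK(cudaMemset(d_x, 0, n * sizeof(float)));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Record the kernel sequence once. During capture the kernels are
    // added to the graph instead of being executed.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int k = 0; k < kKernelsPerGraph; ++k) {
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d_x, 1.0f, n);
    }
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Turn the recorded graph into an executable form.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Each replay submits all captured kernels with one CPU-side call,
    // rather than kKernelsPerGraph separate launches.
    for (int r = 0; r < kReplays; ++r) {
        CHECK(cudaGraphLaunch(exec, stream));
    }
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_x));
    std::printf("replayed graph %d times\n", kReplays);
    return 0;
}
```

Capturing once and replaying many times is what amortizes the launch cost: the per-kernel CPU work is paid during capture and instantiation, and each later request pays for a single `cudaGraphLaunch`.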
      Event: GTC Digital Spring
      Date: March 2022
      Topic: Accelerated Computing & Dev Tools - Performance Optimization
      Level: Beginner Technical
      Industry: Consumer Internet
      Language: English, Simplified Chinese
      Location: