
      CUTLASS: A Performant, Flexible, and Portable Way to Target Hopper Tensor Cores

      , Senior Architect, NVIDIA
      , Sr. Architect, NVIDIA
      NVIDIA’s H100 introduced fourth-generation Tensor Cores to GPU computing, with over twice the peak performance of the previous generation. This session builds on our GTC’23 session. We'll describe how the latest version of CUTLASS leverages Hopper features for peak performance, covering major new features since its release last year, including convolutions, fused epilogue visitors, a Python interface, and more. The discussion is aimed at those who wish to implement custom kernels for machine learning and HPC applications that achieve peak performance.
      Event: GTC 24
      Date: March 2024
      Topic: Accelerated Computing Libraries
      Level: Advanced Technical
      Industry: All Industries
      NVIDIA Technology: CUDA, Hopper
      Language: English
      Location: