Name: CUTLASS: A Performant, Flexible, and Portable Way to Target Hopper Tensor Cores S61198 | GTC 2024 | NVIDIA On-Demand
Uploaded: 2024-03-19T15:00:00Z
Duration: 2976 s
Description: NVIDIA’s H100 introduced fourth-generation Tensor Cores to GPU computing, with over twice the peak performance of the previous generation

Details

NVIDIA’s H100 introduced fourth-generation Tensor Cores to GPU computing, with over twice the peak performance of the previous generation. This session will build on our GTC’23 session. We’ll describe how the latest version of CUTLASS leverages Hopper features for peak performance, covering major new features since its release last year, including convolutions, fused epilogue visitors, a Python interface, and more. Our discussion is aimed at those who wish to implement custom kernels for machine learning and HPC applications that achieve peak performance.

Event: GTC 24

Date: March 2024

Topic: Accelerated Computing Libraries

Level: Advanced Technical

Industry: All Industries

NVIDIA Technology: CUDA, Hopper

Language: English

Location: