Name: CUTLASS: Python API, Enhancements, and NVIDIA Hopper A41131 | GTC Digital September 2022 | NVIDIA On-Demand
Uploaded: 2022-09-20T13:00:00Z
Duration: 2474 s
Description: The latest release of CUTLASS delivers a new Python API for designing, JIT compiling, and launching optimized matrix computations from a Python environment.

Details

The latest release of CUTLASS delivers a new Python API for designing, JIT compiling, and launching optimized matrix computations from a Python environment. The functionality of CUTLASS has also been extended to include grouped and depthwise separable convolution, fused kernels for layernorm and multihead attention, and optimizations to grouped GEMM. Additionally, CUTLASS 2.11 takes advantage of new features on NVIDIA's Hopper architecture, including 2x faster FP64 Tensor Cores and FP8 numerical conversion. We'll describe implementation details of these computations and optimization techniques for achieving peak performance. We'll also provide a preview of CUTLASS 3.0, which offers an enhanced programming model for implementing tensor computations using CUDA.

Event: GTC Digital September

Date: September 2022

Topic: Accelerated Computing & Dev Tools - Performance Optimization

Level: Advanced Technical

Industry: All Industries

Language: English

Topic: Accelerated Computing & Dev Tools - Programming Languages / Compilers
