Name: The Performance of CUDA with the Flexibility of PyTorch S71946 | GTC 2025 | NVIDIA On-Demand
Uploaded: 2025-03-19T09:00:00Z
Duration: 2525 s
Description: This talk explores how PyTorch users are also becoming CUDA developers. We'll start with motivating examples from eager, the launch of torch

This talk explores how PyTorch users are also becoming CUDA developers. We'll start with motivating examples from eager mode, the launch of torch.compile, and the more recent trend of kernel zoos. We will share details on how we integrated low-bit matmuls in torchao and the torch.compile CUTLASS backend. We'll also discuss how you can define, build, and package your own custom ops in PyTorch so you get the raw performance of CUDA while maintaining the flexibility of PyTorch.

Event: GTC 25

Date: March 2025

Industry: All Industries

Topic: Development and Optimization - Performance Optimization

NVIDIA Technology: RTX GPU, CUDA, Hopper, cuBLAS, Nsight Compute, Nsight Systems

Level: Technical – Beginner

Language: English

Location: