The Performance of CUDA with the Flexibility of PyTorch
Software Engineer, Meta Platforms
This talk explores how PyTorch users are also becoming CUDA developers. We'll start with motivating examples from eager mode, the launch of torch.compile, and the more recent trend of kernel zoos. We'll share how we integrated low-bit matmuls in torchao and the torch.compile CUTLASS backend. We'll also cover how you can define, build, and package your own custom ops in PyTorch, so you get the raw performance of CUDA while keeping the flexibility of PyTorch.
Event: GTC 25
Date: March 2025
Industry: All Industries
Topic: Development and Optimization - Performance Optimization
NVIDIA Technology: RTX GPU, CUDA, Hopper, cuBLAS, Nsight Compute, Nsight Systems