      The Performance of CUDA with the Flexibility of PyTorch

      , Software Engineer, Meta Platforms
      This talk explores how PyTorch users are also becoming CUDA developers. We'll start with motivating examples from eager mode, the launch of torch.compile, and the more recent trend of kernel zoos. We will share details on how we went about integrating low-bit matmuls in torchao and the torch.compile CUTLASS backend. We'll also discuss how you can define, build, and package your own custom ops in PyTorch so that you get the raw performance of CUDA while keeping the flexibility of PyTorch.
      Event: GTC 25
      Date: March 2025
      Industry: All Industries
      Topic: Development and Optimization - Performance Optimization
      NVIDIA Technology: RTX GPU, CUDA, Hopper, cuBLAS, Nsight Compute, Nsight Systems
      Level: Technical – Beginner
      Language: English
      Location: