Beyond CUDA: The Case for Block-based GPU Programming

, OpenAI
Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting for machine learning researchers who need fast custom kernels. We'll shed light on alternative programming models that improve GPU programmability without sacrificing much expressivity. Some such models have recently emerged (e.g., TVM, MLIR Affine), but they are rarely applicable beyond dense tensor algebra, making them a poor fit for workloads that require, for example, custom data structures. We'll describe the design and implementation of Triton, a mid-level programming language that uses _block-based_ abstractions to simplify kernel development and fusion for researchers without GPU programming expertise.
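To give a flavor of the block-based model described in the abstract (this sketch is not part of the original listing), here is a minimal vector-addition kernel written against Triton's public Python API; the tile size `BLOCK_SIZE` and the helper `add` wrapper are illustrative choices, not prescribed by the talk.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance operates on a whole block of data at once,
    # rather than on a single scalar element as in SIMT/CUDA.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements              # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Illustrative launch wrapper: one program per 1024-element block.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Because the kernel is expressed in terms of blocks, the compiler (rather than the programmer) handles thread-level details such as coalescing and synchronization within each block.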
Event: GTC Digital November
Date: November 2021
Topic: Accelerated Computing & Dev Tools - Programming Languages / Compilers
Industry: All Industries
Level: Intermediate Technical
Language: English
Topic: Deep Learning - Frameworks
Location: