Name: CUDA Techniques to Maximize Memory Bandwidth and Hide Latency S72683 | GTC 2025 | NVIDIA On-Demand
Uploaded: 2025-03-17T13:00:00Z
Duration: 5270 s
Description: Do you want to write speed-of-light CUDA kernels? When a hand-written kernel is the right approach for the job, you want to use the best possible practices

Details

Do you want to write speed-of-light CUDA kernels? When a hand-written kernel is the right approach for the job, you want to use the best possible practices to maximize performance on the GPU. In this tutorial, we will showcase techniques to maximize memory bandwidth and hide memory latencies in our CUDA kernels, including how to efficiently use shared memory, distributed shared memory and asynchronous data copies.

Event: GTC 25

Date: March 2025

Industry: All Industries

NVIDIA Technology: CUDA, CUDA-X, Nsight Compute, Nsight Systems

Topic: Development and Optimization - Performance Optimization

Level: Technical - Advanced

Language: English

Location: