Optimizing Communication with Nsight Systems Network Profiling
Distinguished Engineer, Software Development Tools, NVIDIA
Senior System Software Engineer, NVIDIA
In large-scale computation, whether in the cloud or on HPC cluster nodes, knowing the performance of CPUs and GPUs isn't enough to characterize the system’s performance. You also need to consider inter-node and intra-node network behavior. Applications running across multiple nodes often make extensive use of network resources through libraries such as MPI and UCX, which rely on communication protocols such as RDMA over Converged Ethernet (RoCE), InfiniBand, and SHARP. Learn how to use Nsight Systems' network profiling capabilities and see how real-world applications utilize GPUs, CPUs, and networking hardware. We'll demonstrate how to profile and optimize an application's network usage to improve its overall performance. This includes identifying network congestion, locating bubbles where the NIC/HCA is idle, determining the average packet size used by the application, and understanding latencies caused by the application’s network usage.