GPU Accelerated Libraries
22 items
March 2024
NVIDIA’s GPU-accelerated Math Libraries, which are part of the CUDA Toolkit and the HPC SDK, are constantly expanding, providing industry-leading performance and coverage of common compute workflows across AI, ML, and HPC. We'll do a deep dive into some of the latest advancements in the …
March 2024
NVIDIA has developed a new large-scale solver, cuDSS, for sparse linear systems that uses GPU computations for the matrix factorization and solution. This solver was integrated into the UniSim EO platform using the UniSim AXB sparse linear algebra interface, which enables sparse linear algebra …
March 2024
NVIDIA’s H100 introduced fourth-generation Tensor Cores to GPU computing, with over twice the peak performance of the previous generation. This session will build on our GTC’23 session. We'll describe how the latest version of CUTLASS leverages Hopper features for peak performance, covering major …
March 2024
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs and multiple nodes. We'll explain how to use the different available multi-GPU programming models and describe their individual advantages. All programming models, including …
Training Deep Learning Models at Scale: How NCCL Enables Best Performance on AI Data Center Networks
Discover how NCCL uses every capability of all DGX and HGX platforms to accelerate inter-GPU communication and allow deep learning training to scale further. See how Grace Hopper platforms can leverage multi-node NVLink to compute in parallel at unprecedented speeds. Compare different …
March 2024
Take a deep dive into the latest developments in NVIDIA software for high performance computing applications, including a comprehensive look at what’s new in programming models, compilers, libraries, and tools. We'll cover topics of interest to HPC developers, targeting traditional HPC …
March 2023
Do you want to write modern C++ on your GPU? Are you curious about C++ Standard Parallelism? Join NVIDIA's C++ library and standards team for a Q&A session on: C++ Standard Parallelism and NVC++, Thrust (CUDA C++'s high-productivity general-purpose library and parallel algorithms …
March 2024
Graph neural networks (GNNs) are an increasingly popular class of artificial neural networks designed to process data that can be represented as graphs. The two prominent GNN frameworks are the Deep Graph Library (DGL) and PyTorch Geometric (PyG). The RAPIDS cuGraph effort has been working on …
March 2024
Pandas is flexible, but often slow when processing gigabytes of data. Many frameworks promise higher performance, but they often support only a subset of the Pandas API, require significant code change, and struggle to interact with or accelerate third-party code that you can’t change. RAPIDS cuDF …
March 2024
Recommendation systems are integral to many online platforms, enabling personalized content and product recommendations. The transformer paradigm in particular has been leveraged for building state-of-the-art sequential recommender systems. In this session, we'll expand upon previous work …
March 2023
CV-CUDA is an open source library that enables developers to build highly efficient, GPU-accelerated pre- and post-processing pipelines in cloud-scale Artificial Intelligence (AI) imaging and computer vision (CV) workloads in mapping, generative AI, three-dimensional worlds, image understanding, …
March 2023
Both the federal community and the commercial marketplace have critical mission needs to rapidly geolocate imagery that has no associated geospatial information for a wide variety of computer vision applications, such as search and rescue, natural hazards detection, and environmental monitoring. …
March 2023
Learn about the latest optimizations in NVIDIA's image/signal processing libraries like CV-CUDA, NPP, nvJPEG, and DALI — a fast, flexible data loading and augmentation library. We'll discuss how to use various data processing solutions spanning low-level image and signal processing primitives in NPP, …
April 2021
NVIDIA’s GPU-accelerated Math Libraries, which are part of the CUDA Toolkit and the HPC SDK, are constantly expanding, providing industry-leading performance and coverage of common compute workflows across AI, ML, and HPC. We'll review the latest developments in the Math Libraries with a …
April 2021
This talk compares Thrust with the C++ Standard algorithms and highlights some of the things only possible in Thrust. Both Thrust and the C++ Standard offer an extensive selection of algorithms. Many algorithms exist in both Thrust and the C++ Standard, but there are …
April 2021
CUDA C++ is an extension of the ISO C++ language that allows you to use familiar C++ tools to write parallel programs that run on GPUs. However, one essential C++ tool has been missing from device-side CUDA C++: the C++ standard library. But not any longer! Introduced in the CUDA 10.2…
April 2021
Come join NVIDIA’s CUDA C++ Core Libraries team for a Q&A session on: • Thrust: the C++ parallel algorithms library. https://github.com/NVIDIA/thrust • CUB: cooperative primitives for CUDA C++ kernel authors. https://github.com/NVIDIA/cub • libcu++: the C++ Standard Library for your entire …
April 2021
Are you wondering how to easily access tensor cores through NVIDIA Math Libraries, such as sparse tensor cores introduced with the NVIDIA Ampere Architecture GPUs? Or have you already used our libraries and have questions or feedback? Meet the engineers who create tensor core accelerated …
April 2021
CUTLASS provides building blocks, in the form of C++ templates, to CUDA programmers who are eager to write their own CUDA kernels to perform deep learning computations. We'll focus on implementing 2-D and 3-D convolution kernels for NVIDIA's CUDA cores and Tensor Cores. We'll describe the Implicit GEMM …
April 2021
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs and multiple nodes. We'll explain how to use the different available multi-GPU programming models and describe their individual advantages. All programming models, including …
April 2021
Wondering how to scale your code to multiple GPUs in a node or cluster? Need to discuss NCCL or CUDA-aware MPI details? This is the right session for you to ask your beginner or expert questions on multi-GPU programming with CUDA, GPUDirect, NCCL, NVSHMEM, and MPI. Connect with the Experts …
April 2021
NVIDIA Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology improves the performance of MPI and machine learning collective operations by offloading them from the CPU or GPU to the network and eliminating the need to send data …