GPU Accelerated Libraries

22 sessions
March 2024
Senior Product Manager, NVIDIA
Director of Engineering, Math Libraries, NVIDIA
NVIDIA’s GPU-accelerated Math Libraries, which are part of the CUDA Toolkit and the HPC SDK, are constantly expanding, providing industry-leading performance and coverage of common compute workflows across AI, ML, and HPC. We'll do a deep dive into some of the latest advancements in the
March 2024
Senior System Software Engineer, NVIDIA
Senior Fellow, Honeywell Connected Enterprise
NVIDIA has developed a new large-scale solver, cuDSS, for sparse linear systems that uses GPU computations for the matrix factorization and solution. This solver was integrated into the UniSim EO platform using the UniSim AXB sparse linear algebra interface, which enables sparse linear algebra
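As a small illustration of the factorize-once, solve-many workflow that a direct solver like cuDSS exposes, here is a pure-Python toy on a tiny dense system. The real library is a C API that factors sparse matrices in GPU memory; everything below is a CPU-side sketch with illustrative names, not cuDSS code.

```python
# Pure-Python sketch of the two-phase "factorize, then solve" workflow of a
# direct solver (illustrative only; cuDSS operates on sparse CSR matrices on
# the GPU through a C API).

def lu_factor(a):
    """In-place LU factorization without pivoting (fine for this toy matrix)."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]          # multiplier L[i][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a

def lu_solve(lu, b):
    """Forward substitution with L (unit diagonal), then back substitution with U."""
    n = len(lu)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

lu = lu_factor([row[:] for row in A])   # factorization phase (done once)
x = lu_solve(lu, b)                     # solve phase (reusable for many right-hand sides)
```

The split matters because in workflows like UniSim EO the expensive factorization can be amortized over many solves with different right-hand sides.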
March 2024
Senior Architect, NVIDIA
Sr. Architect, NVIDIA
NVIDIA’s H100 introduced fourth-generation Tensor Cores to GPU computing, with over twice the peak performance of the previous generation. This session will build on our GTC’23 session. We'll describe how the latest version of CUTLASS leverages Hopper features for peak performance, covering major
March 2024
Principal Developer Technology, NVIDIA
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs and multiple nodes. We'll explain how to use the different available multi-GPU programming models and describe their individual advantages. All programming models, including
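One pattern that recurs across these programming models is domain decomposition with halo exchange: each GPU owns a slab of the grid plus a one-cell halo of its neighbours' boundary data. The sketch below simulates this with plain Python lists standing in for per-GPU buffers (no GPU API involved; it assumes the grid size divides evenly among the parts) and checks that the decomposed stencil step matches the single-domain one.

```python
# Toy illustration of 1-D domain decomposition with halo exchange, the pattern
# behind most multi-GPU stencil codes. Plain Python lists stand in for per-GPU
# buffers; no real GPU or communication API is used.

def step(u):
    """One Jacobi-style averaging step with fixed boundary values."""
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]

def step_decomposed(u, parts=2):
    """Same step, computed on `parts` chunks with one-cell halos (len(u) % parts == 0)."""
    n = len(u)
    cut = n // parts
    chunks = [u[max(0, s - 1):min(n, s + cut + 1)]   # each chunk plus its halo cells
              for s in range(0, n, cut)]
    out = []
    for k, c in enumerate(chunks):
        r = step(c)
        lo = 0 if k == 0 else 1                      # drop the halo cells again
        hi = len(r) if k == len(chunks) - 1 else len(r) - 1
        out.extend(r[lo:hi])
    return out

u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
```

Because each chunk carries its neighbours' boundary values, the decomposed result is bitwise identical to the single-domain step, which is exactly the invariant a multi-GPU port must preserve.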
March 2024
Distinguished Engineer, NVIDIA
Discover how NCCL uses every capability of all DGX and HGX platforms to accelerate inter-GPU communication and allow deep learning training to scale further. See how Grace Hopper platforms can leverage multi-node NVLink to compute in parallel at unprecedented speeds. Compare different
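The workhorse collective here is all-reduce. A ring all-reduce, one of the algorithms NCCL implements, can be simulated in a few lines of plain Python; this is a schematic model with lists standing in for per-GPU buffers, not NCCL's actual C API.

```python
# Schematic ring all-reduce over p simulated ranks: a reduce-scatter phase
# followed by an all-gather phase. Pure-Python model; no NCCL calls.

def ring_allreduce(bufs):
    """Sum-allreduce in place over a ring of len(bufs) ranks (n % p == 0)."""
    p, n = len(bufs), len(bufs[0])
    chunk = n // p                       # each buffer is split into p equal chunks
    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % p to
    # rank (r + 1) % p, which adds it into its own copy.
    for s in range(p - 1):
        sends = []
        for r in range(p):
            c = (r - s) % p
            sends.append((r, c, bufs[r][c * chunk:(c + 1) * chunk]))  # snapshot first
        for r, c, data in sends:
            dst = (r + 1) % p
            for i, v in enumerate(data):
                bufs[dst][c * chunk + i] += v
    # Rank r now holds the complete sum in chunk (r + 1) % p.
    # Phase 2: all-gather. Completed chunks circulate once around the ring.
    for s in range(p - 1):
        sends = []
        for r in range(p):
            c = (r + 1 - s) % p
            sends.append((r, c, bufs[r][c * chunk:(c + 1) * chunk]))
        for r, c, data in sends:
            bufs[(r + 1) % p][c * chunk:(c + 1) * chunk] = data
    return bufs

grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]  # one buffer per "GPU"
ring_allreduce(grads)   # afterwards every rank holds [11.0, 22.0, 33.0, 44.0]
```

The reduce-scatter plus all-gather structure keeps every link busy with a 1/p-sized chunk per step, which is what makes the ring bandwidth-optimal on large buffers.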
March 2024
Director HPC Architecture, NVIDIA
Take a deep dive into the latest developments in NVIDIA software for high performance computing applications, including a comprehensive look at what’s new in programming models, compilers, libraries, and tools. We'll cover topics of interest to HPC developers, targeting traditional HPC
March 2023
Principal Architect, NVIDIA
GPU Architect, NVIDIA
Sr. Software Engineer, NVIDIA
HPC C++ Compiler Engineer, NVIDIA
Senior Software Engineer and Author of Standard C++ Ranges and Senders/Receivers, NVIDIA
CUDA Software Developer, NVIDIA
Director of Research, NVIDIA
Senior C++ Library Engineer, NVIDIA
C++ Library Engineer (libcu++ Lead), NVIDIA
Do you want to write modern C++ on your GPU? Are you curious about C++ Standard Parallelism? Join NVIDIA's C++ library and standards team for a Q&A session on: C++ Standard Parallelism and NVC++, Thrust (CUDA C++'s high-productivity general-purpose library and parallel algorithms
March 2024
Software Engineer, NVIDIA
Senior Data Scientist, NVIDIA
Graph neural networks (GNNs) are an increasingly popular class of artificial neural networks designed to process data that can be represented as graphs. Two of the most prominent GNN frameworks are the Deep Graph Library (DGL) and PyTorch Geometric (PyG). The RAPIDS cuGraph effort has been working on
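The core operation both frameworks implement is message passing: each node aggregates its neighbours' features. One round of mean aggregation, written out in plain Python on a toy graph (real frameworks batch this into sparse tensor operations on the GPU):

```python
# One round of mean-neighbour message passing, the basic GNN building block.
# Tiny pure-Python sketch on a 4-node undirected graph; illustrative only.

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]    # undirected edge list
feat = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}     # one scalar feature per node

# Build adjacency lists (both directions, since the graph is undirected).
nbrs = {v: [] for v in feat}
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)

# Each node's new feature is the mean of its neighbours' current features.
new_feat = {v: sum(feat[u] for u in nbrs[v]) / len(nbrs[v]) for v in feat}
```

A real GNN layer would follow the aggregation with a learned linear transform and nonlinearity; stacking such layers lets information propagate across multiple hops of the graph.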
March 2024
Senior Software Engineer, NVIDIA
Pandas is flexible, but often slow when processing gigabytes of data. Many frameworks promise higher performance, but they often support only a subset of the Pandas API, require significant code change, and struggle to interact with or accelerate third-party code that you can’t change. RAPIDS cuDF
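The kind of operation at stake is, for example, a group-by aggregation. In pandas or cuDF this would be written as `df.groupby("key")["val"].mean()`; the plain-Python version below just spells out what that computes on a toy table (an illustration only, no cuDF API involved).

```python
# What a group-by mean computes, spelled out with the standard library.
# With cuDF the equivalent pandas code runs unchanged on the GPU (e.g. via the
# cudf.pandas accelerator), which is the "no code change" point above.
from collections import defaultdict

rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]

sums = defaultdict(lambda: [0.0, 0])      # key -> [running sum, count]
for key, val in rows:
    sums[key][0] += val
    sums[key][1] += 1

means = {k: s / n for k, (s, n) in sums.items()}
```

On gigabyte-scale tables this per-row Python loop is exactly the kind of work that a columnar, GPU-backed implementation replaces with a single parallel kernel.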
March 2024
Data Science Manager, Capital One
Senior Manager Data Science, Capital One
Manager, Data Science, Capital One
Recommendation systems are integral to many online platforms, enabling personalized content and product recommendations. The transformer paradigm in particular has been leveraged for building state-of-the-art sequential recommender systems. In this session, we'll expand upon previous work
March 2023
CV-CUDA Senior Engineer, NVIDIA
CV-CUDA is an open-source library that enables developers to build highly efficient, GPU-accelerated pre- and post-processing pipelines for cloud-scale artificial intelligence (AI) imaging and computer vision (CV) workloads in mapping, generative AI, three-dimensional worlds, image understanding,
March 2023
Senior Solutions Architect, NVIDIA
Both the federal community and the commercial marketplace have critical mission needs to rapidly geolocate imagery that has no associated geospatial information for a wide variety of computer vision applications, such as search and rescue, natural hazards detection, and environmental monitoring.
March 2023
Senior Software Engineer, NVIDIA
Senior Software Development Engineer, NVIDIA
Senior Software Development Engineer, NVIDIA
Deep Learning Manager, NVIDIA
Sr. CUDA Math Library Engineer and Team Lead, NVIDIA
Senior Deep Learning Software Engineer, NVIDIA
Senior Software Development Engineer, NVIDIA
Senior Director, System SW, NVIDIA
CV-CUDA development lead, NVIDIA
CV-CUDA Senior Engineer, NVIDIA
Manager NPP/nvJPEG, NVIDIA
Sr. Software Engineer, NVIDIA
Learn about the latest optimizations in NVIDIA's image/signal processing libraries like CV-CUDA, NPP, nvJPEG, and DALI — a fast, flexible data loading and augmentation library. We'll discuss how to use various data processing solutions spanning low-level image and signal processing primitives in NPP,
April 2021
NVIDIA
NVIDIA’s GPU-accelerated Math Libraries, which are part of the CUDA Toolkit and the HPC SDK, are constantly expanding, providing industry-leading performance and coverage of common compute workflows across AI, ML, and HPC. We'll review the latest developments in the Math Libraries with a
April 2021
NVIDIA
This talk compares Thrust with the C++ Standard algorithms and highlights some of the things that are only possible in Thrust. Both Thrust and the C++ Standard have an amazing selection of algorithms. There are many algorithms that exist in both Thrust and the C++ Standard, but there are
April 2021
NVIDIA
CUDA C++ is an extension of the ISO C++ language that allows you to use familiar C++ tools to write parallel programs that run on GPUs. However, one essential C++ tool has been missing from device-side CUDA C++: the C++ standard library. But not any longer! Introduced in the CUDA 10.2
April 2021
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
Come join NVIDIA’s CUDA C++ Core Libraries team for a Q&A session on: • Thrust — the C++ parallel algorithms library (https://github.com/NVIDIA/thrust) • CUB — cooperative primitives for CUDA C++ kernel authors (https://github.com/NVIDIA/cub) • libcu++ — the C++ Standard Library for your entire
April 2021
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
Are you wondering how to easily access Tensor Cores through NVIDIA Math Libraries, such as the sparse Tensor Cores introduced with the NVIDIA Ampere architecture GPUs? Or have you already used our libraries and have questions or feedback? Meet the engineers who create Tensor Core-accelerated
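For context, the sparse tensor cores mentioned above rely on 2:4 structured sparsity: in every group of four consecutive weights, at most two are nonzero. A magnitude-based pruner that produces this pattern can be sketched in a few lines of plain Python; this illustrates the layout constraint only and is not any library's pruning API.

```python
# Sketch of the 2:4 structured-sparsity pattern used by Ampere sparse tensor
# cores: per group of 4 weights, keep the 2 largest-magnitude values and zero
# the rest. Pure-Python illustration (len(weights) % 4 == 0 assumed).

def prune_2_4(weights):
    out = []
    for g in range(0, len(weights), 4):
        group = weights[g:g + 4]
        keep = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out

w = [0.1, -0.9, 0.4, 0.05, 0.7, 0.2, -0.3, 0.6]
pruned = prune_2_4(w)
```

Because the pattern is fixed (two of four), the hardware can store the surviving values densely plus small metadata indices, which is where the sparse-tensor-core speedup comes from.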
April 2021
NVIDIA
CUTLASS provides building blocks in the form of C++ templates to CUDA programmers who are eager to write their own CUDA kernels to perform deep learning computations. We'll focus on implementing 2-D and 3-D convolution kernels for NVIDIA's CUDA cores and Tensor Cores. We'll describe the Implicit GEMM
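The Implicit GEMM idea in miniature: a convolution can be computed as a matrix multiply over input patches. CUTLASS forms those patches on the fly inside the GEMM loop; the pure-Python sketch below materializes them explicitly (the classic im2col view) for a single-channel 2-D case and checks the result against direct convolution.

```python
# Convolution as GEMM: each output position becomes one row of patch values,
# and the convolution reduces to a matrix-vector product with the flattened
# kernel. Pure-Python sketch, single channel, stride 1, no padding.

def conv2d_direct(img, ker):
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * ker[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

def conv2d_im2col(img, ker):
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    # im2col: one row of kh*kw patch values per output position ...
    patches = [[img[i + di][j + dj] for di in range(kh) for dj in range(kw)]
               for i in range(oh) for j in range(ow)]
    kvec = [ker[di][dj] for di in range(kh) for dj in range(kw)]
    # ... so the convolution is an (oh*ow) x (kh*kw) times (kh*kw) x 1 GEMM.
    flat = [sum(p * k for p, k in zip(row, kvec)) for row in patches]
    return [flat[i * ow:(i + 1) * ow] for i in range(oh)]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ker = [[1, 0], [0, -1]]
```

The "implicit" part is that a real kernel never stores the patch matrix: it computes patch coordinates inside the GEMM main loop, trading index arithmetic for the memory blow-up an explicit im2col would cost.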
April 2021
NVIDIA
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs and multiple nodes. We'll explain how to use the different available multi-GPU programming models and describe their individual advantages. All programming models, including
April 2021
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
NVIDIA
Wondering how to scale your code to multiple GPUs in a node or cluster? Need to discuss NCCL or CUDA-aware MPI details? This is the right session for you to ask your beginner or expert questions on multi-GPU programming with CUDA, GPUDirect, NCCL, NVSHMEM, and MPI. Connect with the Experts
April 2021
NVIDIA
NVIDIA
NVIDIA Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology improves the performance of MPI and machine learning collective operations by offloading them from the CPU or GPU to the network, eliminating the need to send data
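The core idea can be modelled in a few lines: instead of every endpoint shipping its full buffer to one root that does all the arithmetic, switches in the reduction tree combine partial results as data flows upward. The plain-Python sketch below models such a tree with a configurable switch fan-in; it is a conceptual model only, not the SHARP API.

```python
# Conceptual model of in-network aggregation: values are reduced level by
# level, as aggregating switches in a tree would combine them, so each uplink
# carries only one already-reduced message.

def tree_reduce(leaf_values, fanout=2):
    """Sum leaf contributions through a reduction tree with the given fan-in."""
    level = leaf_values[:]
    while len(level) > 1:
        # Each "switch" sums up to `fanout` incoming messages into one output.
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]

total = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])   # 8 "endpoints", fan-in 2
```

In the real protocol the per-switch reduction happens in the InfiniBand fabric itself, so neither the CPUs nor the GPUs spend cycles on the collective and far less data crosses each link.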