Accelerate Spark With RAPIDS For Cost Savings

Senior Director of Engineering, NVIDIA
Highly Rated
In Apache Spark deployments, GPUs are typically used to speed up machine learning (ML) model training and inference, while data preparation stages have traditionally run on CPUs. The RAPIDS Accelerator for Apache Spark is a plugin JAR that takes advantage of Apache Spark 3.x’s ability to schedule tasks on GPUs. For DataFrame operations, it replaces CPU expressions in the physical plan with GPU equivalents. No code changes are required, which makes the transition to GPUs seamless.
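
Because the accelerator is enabled through configuration rather than code changes, a minimal sketch of the Spark settings involved may help. The helper names `rapids_spark_conf` and `to_submit_args`, the JAR path, and the default GPU and task counts are illustrative assumptions and do not come from the session. The property keys are standard Spark and RAPIDS Accelerator settings, but the right values depend on the cluster.

```python
def rapids_spark_conf(plugin_jar, gpus_per_executor=1, tasks_per_gpu=4):
    """Build Spark properties that turn on the RAPIDS Accelerator plugin.

    plugin_jar is a hypothetical path to the accelerator JAR; the GPU and
    task counts are placeholder defaults, not tuning recommendations.
    """
    return {
        # Put the plugin JAR on the driver and executor classpaths.
        "spark.jars": plugin_jar,
        # Register the accelerator as a Spark 3.x plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Let the plugin replace supported CPU operators with GPU ones.
        "spark.rapids.sql.enabled": "true",
        # Spark 3.x GPU scheduling: GPUs per executor, and the fraction
        # of a GPU that each task claims.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
        # Log which parts of the physical plan stayed on the CPU.
        "spark.rapids.sql.explain": "NOT_ON_GPU",
    }


def to_submit_args(conf):
    """Flatten a properties dict into spark-submit '--conf key=value' args."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical JAR location, used only for illustration.
    conf = rapids_spark_conf("/opt/sparkRapidsPlugin/rapids-4-spark.jar")
    print(" ".join(["spark-submit"] + to_submit_args(conf) + ["etl_job.py"]))
```

The existing DataFrame or SQL job (here the hypothetical `etl_job.py`) is submitted unchanged; only the launch configuration differs.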

We'll give an overview of what the RAPIDS Accelerator is, how it works, and the benefits of using it. We'll describe recent innovations, including integration with table layout formats, improvements to file format support, and operation at scale, and discuss benchmarks showing the performance and cost benefits of using GPUs for Spark ETL processing. We'll also showcase a user tool that helps estimate speedups and cost savings, along with a real-world use case and a user adoption journey.
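
To make the link between speedup and cost savings concrete, here is a back-of-the-envelope sketch. It is not the method used by the tool mentioned above; it only assumes that job cost equals runtime multiplied by the cluster's hourly price. All function names and figures are hypothetical.

```python
def job_cost(runtime_hours, hourly_price):
    """Cost of one job run: runtime multiplied by the cluster's hourly price."""
    return runtime_hours * hourly_price


def gpu_savings(cpu_runtime_hours, cpu_hourly_price, speedup, gpu_hourly_price):
    """Fraction of cost saved by moving a job from a CPU to a GPU cluster.

    speedup is CPU runtime divided by GPU runtime. A negative result means
    the GPU run costs more than the CPU run.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    cpu_cost = job_cost(cpu_runtime_hours, cpu_hourly_price)
    gpu_cost = job_cost(cpu_runtime_hours / speedup, gpu_hourly_price)
    return 1 - gpu_cost / cpu_cost


if __name__ == "__main__":
    # Hypothetical figures: a 4-hour CPU job on a $10/hour cluster that runs
    # 4x faster on a GPU cluster priced at $15/hour.
    savings = gpu_savings(4.0, 10.0, speedup=4.0, gpu_hourly_price=15.0)
    print(f"Estimated savings: {savings:.1%}")
```

Under this simple model the GPU run breaks even when the speedup equals the price ratio between the two clusters (1.5x in the example) and saves money above it.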
Event: GTC Digital Spring
Date: March 2023
Industry: All Industries
Topic: Data Science
Level: Intermediate Technical
Language: English
Topic: Data Science and Machine Learning
Location: