      ONNX Runtime: Accelerated AI Deployment for PC Apps

      , Software Engineer, Microsoft
      , Sr. Developer Technology Software Engineer, NVIDIA
      Recent advances in generative AI have introduced new levels of deployment challenges. Especially with large diffusion models, it's becoming increasingly important to ensure optimal model performance regardless of the hardware the user is running on. Compatibility across operating systems, or even hardware vendors, often comes at a cost in performance due to falling back on generally available hardware. The only way to overcome this is a backend that can leverage specialized hardware to the fullest. ONNX Runtime supports multiple backends to adapt not only to performance requirements, but also to ease shipping and integration. On NVIDIA hardware, a developer has the flexibility to choose from various backends, such as CUDA and TensorRT. This presentation will give an overview of each provider and best practices on how to select the ideal backend. Additionally, we'll highlight recent improvements and additions to ONNX Runtime's CUDA and TensorRT execution providers.
      Event: GTC 24
      Date: March 2024
      Industry: All Industries
      NVIDIA Technology: Cloud / Data Center GPU, cuBLAS, CUDA, cuDNN, DirectX, RTX GPU, RTX Virtual Workstations (vWS)
      Topic: Deep Learning Frameworks
      Level: Intermediate Technical
      Language: English
      Location:
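The backend-selection best practice the abstract describes (prefer the most specialized execution provider, with graceful fallback) can be sketched as a small Python helper. The provider name strings below are ONNX Runtime's real execution-provider identifiers; the `choose_providers` helper itself is a hypothetical illustration, not part of the ONNX Runtime API.

```python
# Sketch of the provider-selection best practice: prefer TensorRT on NVIDIA
# hardware, fall back to CUDA, then to the always-available CPU provider.
PREFERRED_ORDER = [
    "TensorrtExecutionProvider",  # most specialized: TensorRT-optimized engines
    "CUDAExecutionProvider",      # broad NVIDIA GPU support via CUDA/cuDNN
    "CPUExecutionProvider",       # generally available fallback
]

def choose_providers(available):
    """Return the preferred providers that are actually available, keeping
    the full fallback chain so the runtime can degrade gracefully per-op."""
    chosen = [p for p in PREFERRED_ORDER if p in available]
    if not chosen:
        raise RuntimeError("no supported execution provider available")
    return chosen

# In a real application, `available` would come from
# onnxruntime.get_available_providers(), and the result would be passed as
# onnxruntime.InferenceSession(model_path, providers=chosen).
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

Passing the whole ordered list (rather than a single provider) lets ONNX Runtime assign each graph node to the first provider that supports it, which is why keeping the CPU provider at the end of the chain is the usual practice.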