      ONNX Runtime: Accelerated AI Deployment for PC Apps

      , Software Engineer, Microsoft
      , Sr. Developer Technology Software Engineer, NVIDIA
      Recent advances in generative AI have introduced new levels of deployment challenges. Especially with large diffusion models, it's becoming increasingly important to ensure optimal model performance regardless of the hardware the user is running on. Compatibility across operating systems, or even hardware vendors, often comes at a cost in performance due to falling back on generally available hardware. The only way to overcome this is a backend that can leverage specialized hardware to the fullest. ONNX Runtime supports multiple backends to adapt not only to performance requirements, but also to ease shipping and integration. On NVIDIA hardware, a developer has the flexibility to choose from various backends, such as CUDA and TensorRT. This presentation will give an overview of each provider and best practices on how to select the ideal backend. Additionally, we'll highlight recent improvements and additions to ONNX Runtime's CUDA and TensorRT execution providers.
      Event: GTC 24
      Date: March 2024
      Industry: All Industries
      NVIDIA Technology: Cloud / Data Center GPU, cuBLAS, CUDA, cuDNN, DirectX, RTX GPU, RTX Virtual Workstations (vWS)
      Topic: Deep Learning Frameworks
      Level: Intermediate Technical
      Language: English
      Location:
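The backend-selection best practice the abstract describes (prefer the most specialized execution provider, with graceful fallback) can be sketched as a small Python helper. The provider name strings below are ONNX Runtime's real execution-provider identifiers; the `choose_providers` helper itself is a hypothetical illustration, not part of the ONNX Runtime API.

```python
# Sketch of the provider-selection best practice: prefer TensorRT on NVIDIA
# hardware, fall back to CUDA, then to the always-available CPU provider.
PREFERRED_ORDER = [
    "TensorrtExecutionProvider",  # most specialized: TensorRT-optimized engines
    "CUDAExecutionProvider",      # broad NVIDIA GPU support via CUDA/cuDNN
    "CPUExecutionProvider",       # generally available fallback
]

def choose_providers(available):
    """Return the preferred providers that are actually available, keeping
    the full fallback chain so the runtime can degrade gracefully per-op."""
    chosen = [p for p in PREFERRED_ORDER if p in available]
    if not chosen:
        raise RuntimeError("no supported execution provider available")
    return chosen

# In a real application, `available` would come from
# onnxruntime.get_available_providers(), and the result would be passed as
# onnxruntime.InferenceSession(model_path, providers=chosen).
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

Passing the whole ordered list (rather than a single provider) lets ONNX Runtime assign each graph node to the first provider that supports it, which is why keeping the CPU provider at the end of the chain is the usual practice.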