Achieving World-Leading Inference Performance Across Workloads
, Group Product Manager, NVIDIA
, Product Marketing Manager, NVIDIA
Latency-critical applications such as conversational AI place strict requirements on the throughput, latency, and model size expected from deep learning models. We'll cover NVIDIA TensorRT, an SDK for high-performance deep learning inference, used in production to minimize latency and maximize throughput. In this talk, new users can learn the standard workflow to accelerate inference from frameworks including PyTorch, TensorFlow, and ONNX, while experienced users can learn about the latest updates and best practices for optimizing specific use cases.
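For orientation before the session, the sketch below shows one common shape of the standard workflow mentioned above: importing an ONNX model and building an optimized engine with TensorRT's Python API. This is a minimal illustration, not the talk's material; it assumes TensorRT 8.x-era calls, and the `build_engine` helper name and `model.onnx` path are placeholders.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str, workspace_gb: int = 1):
    """Parse an ONNX model and build a serialized TensorRT engine (illustrative sketch)."""
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch networks are the standard mode for ONNX import.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError(f"Failed to parse {onnx_path}")

    config = builder.create_builder_config()
    # Cap the scratch memory TensorRT may use while choosing kernels.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,
                                 workspace_gb << 30)
    # Opt into reduced-precision kernels where the hardware supports them,
    # trading a small amount of accuracy for lower latency and higher throughput.
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    return builder.build_serialized_network(network, config)

if __name__ == "__main__":
    engine_bytes = build_engine("model.onnx")  # hypothetical model file
    with open("model.engine", "wb") as f:
        f.write(engine_bytes)
```

The serialized engine can then be deserialized with a `trt.Runtime` and executed; the PyTorch and TensorFlow paths referenced in the talk typically reach this same ONNX import step via an export from the framework.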