Details
Simplifying Inference for Every Model with Triton and TensorRT
Group Product Manager, NVIDIA
Learn how to easily optimize and deploy every model for high-performance inference with Triton and TensorRT. Deploying deep learning models in production with high-performance inference is challenging: the deployment software needs to support multiple frameworks, such as TensorFlow and PyTorch, and optimize under competing constraints like latency, accuracy, throughput, and memory size. TensorRT provides world-class inference performance for many models, and it is integrated into TensorFlow, PyTorch, and ONNX Runtime, all of which are supported backends in Triton Inference Server. The Triton Model Navigator is a tool that automates exporting a model from its source framework to all possible backends and uses Model Analyzer to find the deployment configuration that achieves the best performance possible within the given constraints.
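A minimal sketch of this workflow in Python, for illustration only: it assumes a ResNet-50 classifier, a Triton server already running at localhost:8000, and a model registered under the name "resnet50_onnx" with tensors "input__0"/"output__0" (these model and tensor names are placeholders, not part of the session). The sketch exports the PyTorch model to ONNX, one of the backends Triton can serve via ONNX Runtime and TensorRT, and then queries the deployed model with the tritonclient HTTP client.

```python
# Sketch: export a PyTorch model to ONNX for Triton, then query the deployed model.
# Assumes: pip install torch torchvision tritonclient[http]; a Triton server started with
#   tritonserver --model-repository=/models
# and a model named "resnet50_onnx" with tensors "input__0"/"output__0" (placeholder names).
import numpy as np
import torch
import torchvision
import tritonclient.http as httpclient

# 1) Export the source-framework model to ONNX, a format Triton can serve
#    through its ONNX Runtime backend (optionally accelerated by TensorRT).
model = torchvision.models.resnet50().eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input__0"],
    output_names=["output__0"],
    dynamic_axes={"input__0": {0: "batch"}},  # allow variable batch size
    opset_version=13,
)

# 2) Send an inference request to the model once it is deployed in Triton.
client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)
requested_output = httpclient.InferRequestedOutput("output__0")

result = client.infer(
    model_name="resnet50_onnx",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(result.as_numpy("output__0").shape)  # e.g., (1, 1000) class scores
```

In the workflow described in the session, steps like the export above would be automated by Triton Model Navigator across all supported backends, with Model Analyzer sweeping deployment configurations to meet the latency, throughput, and memory constraints.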