Details
Simplifying Inference for Every Model with Triton and TensorRT
Group Product Manager, NVIDIA
Learn how to easily optimize and deploy every model for high-performance inference with Triton and TensorRT. Deploying deep learning models in production with high-performance inference is challenging: the deployment software needs to support multiple frameworks, such as TensorFlow and PyTorch, and optimize under competing constraints like latency, accuracy, throughput, and memory size. TensorRT provides world-class inference performance for many models, and it is integrated into TensorFlow, PyTorch, and ONNX Runtime, all of which are supported backends in Triton Inference Server. The Triton Model Navigator is a tool that automates exporting a model from its source framework to all possible backends and uses Model Analyzer to find the deployment configuration that achieves the best performance possible within the given constraints.
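A minimal sketch of this workflow in Python, for illustration only: it assumes a ResNet-50 classifier, a Triton server already running at localhost:8000, and a model registered under the name "resnet50_onnx" with tensors "input__0"/"output__0" (these model and tensor names are placeholders, not part of the session). The sketch exports the PyTorch model to ONNX, one of the backends Triton can serve via ONNX Runtime and TensorRT, and then queries the deployed model with the tritonclient HTTP client.

```python
# Sketch: export a PyTorch model to ONNX for Triton, then query the deployed model.
# Assumes: pip install torch torchvision tritonclient[http]; a Triton server started with
#   tritonserver --model-repository=/models
# and a model named "resnet50_onnx" with tensors "input__0"/"output__0" (placeholder names).
import numpy as np
import torch
import torchvision
import tritonclient.http as httpclient

# 1) Export the source-framework model to ONNX, a format Triton can serve
#    through its ONNX Runtime backend (optionally accelerated by TensorRT).
model = torchvision.models.resnet50().eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input__0"],
    output_names=["output__0"],
    dynamic_axes={"input__0": {0: "batch"}},  # allow variable batch size
    opset_version=13,
)

# 2) Send an inference request to the model once it is deployed in Triton.
client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)
requested_output = httpclient.InferRequestedOutput("output__0")

result = client.infer(
    model_name="resnet50_onnx",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(result.as_numpy("output__0").shape)  # e.g., (1, 1000) class scores
```

In the workflow described in the session, steps like the export above would be automated by Triton Model Navigator across all supported backends, with Model Analyzer sweeping deployment configurations to meet the latency, throughput, and memory constraints.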