      Simplifying Inference for Every Model with Triton and TensorRT

      , Group Product Manager, NVIDIA
Learn how to easily optimize and deploy every model for high-performance inference with Triton and TensorRT. Deploying deep learning models in production with high performance is challenging: the deployment software needs to support multiple frameworks, such as TensorFlow and PyTorch, and optimize under competing constraints like latency, accuracy, throughput, and memory size. TensorRT provides world-class inference performance for many models and is integrated into TensorFlow, PyTorch, and ONNX Runtime, all of which are supported backends in Triton Inference Server. The Triton Model Navigator is a tool that automates exporting a model from its source framework to all possible backends, and uses Model Analyzer to find the deployment configuration that achieves the best performance possible within the given constraints.
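Once a model has been exported and placed in a Triton model repository, a client can query it over HTTP or gRPC. The sketch below is a minimal example using the tritonclient Python package; the model name (resnet50_trt) and tensor names (input__0, output__0) are assumptions for illustration and would need to match the deployed model's configuration.

```python
# Minimal sketch of querying a model served by Triton Inference Server over HTTP.
# The model name and tensor names are hypothetical and must match the model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Prepare a single input tensor (assumed shape and dtype for illustration).
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Request the output tensor and run inference.
requested_output = httpclient.InferRequestedOutput("output__0")
response = client.infer(
    model_name="resnet50_trt",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("output__0").shape)
```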
Event: GTC Digital September
Date: September 2022
Industry: All Industries
Level: Beginner Technical
Topic: Deep Learning - Inference
Language: English
Topic: Deep Learning - Frameworks
Location: