NVIDIA Triton Inference Server on AWS: Customer success stories and AWS deployment methods to optimize inference throughput, reduce latency, and lower GPU or CPU inference costs.

Speakers from Amazon and NVIDIA
NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. This open-source inference-serving software lets teams deploy trained AI models from any framework (TensorFlow, PyTorch, ONNX Runtime, TensorRT, or a custom backend) on Amazon SageMaker, Amazon ECS, and Amazon EKS, on GPUs or CPUs. Customers can now benefit from Triton's performance optimizations, dynamic batching, and multi-framework support on AWS. Learn how customers use Triton to improve their inference performance. We'll discuss how to deploy NVIDIA Triton on AWS, including Amazon SageMaker, EKS, and ECS, for GPU-based inference, and we'll share getting-started resources. The integration of NVIDIA Triton Inference Server with Amazon SageMaker is available in all AWS Regions where Amazon SageMaker is available.
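The dynamic batching mentioned above is configured per model in Triton's model repository. A minimal sketch of a repository layout and a config.pbtxt that enables dynamic batching follows; the model name, backend, batch sizes, and queue delay are illustrative values, not taken from the talk:

```
model_repository/
└── resnet/
    ├── config.pbtxt
    └── 1/
        └── model.plan   # TensorRT engine; other backends use model.onnx, model.pt, etc.
```

```
# config.pbtxt -- illustrative example
name: "resnet"
platform: "tensorrt_plan"
max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100
}
```

On Amazon SageMaker, a Triton endpoint can be created with the standard boto3 SageMaker calls by pointing the model at the sagemaker-tritonserver container image and at a tarball of the model repository in S3. The sketch below is illustrative, not the speakers' exact setup: the bucket, resource names, IAM role ARN, and image URI (which varies by region and release; check the AWS documentation) are placeholder assumptions.

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# SageMaker Triton container image (ECR account ID and tag vary by
# region and release; the URI below is a placeholder example).
triton_image = "785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:21.08-py3"

# Register the model: container image + model repository tarball in S3.
sm.create_model(
    ModelName="triton-resnet",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    PrimaryContainer={
        "Image": triton_image,
        "ModelDataUrl": "s3://my-bucket/triton/model.tar.gz",  # placeholder
        # Tells the container which model in the repository to serve.
        "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "resnet"},
    },
)

# Endpoint configuration on a single-GPU instance.
sm.create_endpoint_config(
    EndpointConfigName="triton-resnet-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "triton-resnet",
        "InstanceType": "ml.g4dn.xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    }],
)

# Create the real-time endpoint.
sm.create_endpoint(
    EndpointName="triton-resnet-endpoint",
    EndpointConfigName="triton-resnet-config",
)
```

Once the endpoint is in service, it accepts inference requests in Triton's KServe-v2-style request format via the SageMaker runtime's invoke_endpoint API.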
Event: GTC Digital November
Date: November 2021
Industry: All Industries
Topic: Deep Learning - Inference
Level: Intermediate Technical
Language: English
Topic: Accelerated Computing & Dev Tools - Performance Optimization