Deploy and Scale AI Deep Learning Models Easily with Triton Inference Server
NVIDIA
Triton Inference Server is model serving software that simplifies the deployment of AI models at scale in production. It allows teams to deploy trained AI models from any framework (TensorFlow, NVIDIA TensorRT, PyTorch, ONNX Runtime, or a custom framework) on any GPU- or CPU-based infrastructure (cloud, data center, or edge). Learn about high-performance inference serving with Triton's concurrent model execution and dynamic batching features, and about deploying in different environments through integrations with tools such as Kubernetes/EKS. A sketch of how these features are configured follows below.
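As a rough illustration of the features named above, the sketch below shows a Triton model configuration (config.pbtxt) that enables both concurrent execution and dynamic batching. The model name, backend, and tensor shapes are hypothetical placeholders chosen for the example, not details from this session.

```
name: "resnet50"             # hypothetical model name
platform: "tensorrt_plan"    # e.g., serving a TensorRT engine
max_batch_size: 8

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Concurrent execution: run two instances of the model per GPU,
# so multiple inference requests can be processed in parallel.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]

# Dynamic batching: combine individual requests into larger batches
# server-side, waiting up to 100 microseconds to fill a batch.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

With a configuration like this in place, clients send individual requests over Triton's HTTP or gRPC endpoints, and the server transparently batches them and distributes the work across the model instances.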