Fast, Scalable, and Standardized AI Inference Deployment for Multiple Frameworks, Diverse Models on CPUs and GPUs with Open-source NVIDIA Triton
, Product Manager, NVIDIA
, Product Marketing Manager, NVIDIA
We'll go over NVIDIA Triton Inference Server software and what's new. Triton is open-source inference-serving software for fast and scalable AI in applications. Learn how Triton helps deploy models from all popular frameworks — including TensorFlow, PyTorch, ONNX, TensorRT, RAPIDS FIL (for XGBoost, Scikit-learn Random Forest, LightGBM), OpenVINO, Python, and even custom C++ backends. Also learn about the features that help optimize inference for multiple query types — real-time, batch, streaming, and model ensembles. We'll cover how to deploy standardized inference in production on both NVIDIA GPUs and x86 and ARM CPUs in the cloud or data center, at the enterprise edge, and even on embedded devices like the NVIDIA Jetson, as well as how to use Triton in virtualized environments (e.g., VMware vSphere), Kubernetes, and machine-learning platforms like Amazon SageMaker, Azure ML, and Google Vertex AI.
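As a minimal sketch of what a real-time query against a deployed model can look like (the model name "resnet50" and the tensor names "input__0" and "output__0" are hypothetical placeholders, not from this session), a Python client using the open-source tritonclient package might send a single HTTP inference request like this:

import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server on its default HTTP endpoint.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request; names, shapes, and dtypes must match the model's config.pbtxt.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input batch
infer_input = httpclient.InferInput("input__0", image.shape, "FP32")
infer_input.set_data_from_numpy(image)
infer_output = httpclient.InferRequestedOutput("output__0")

# Run inference and read the result back as a NumPy array.
response = client.infer(model_name="resnet50", inputs=[infer_input], outputs=[infer_output])
print(response.as_numpy("output__0").shape)

The same request could equally be sent over Triton's gRPC endpoint (tritonclient.grpc) or batched with other clients' requests by the server's dynamic batcher; those settings live in the model's configuration, not in the client code.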