Achieving World-Leading Inference Performance Across Workloads
, Group Product Manager, NVIDIA
, Product Marketing Manager, NVIDIA
Latency-critical applications such as conversational AI place strict requirements on the throughput, latency, and model size expected from deep learning models. We'll cover NVIDIA TensorRT, an SDK for high-performance deep learning inference, used in production to minimize latency and maximize throughput. In this talk, new users can learn the standard workflow to accelerate inference from frameworks including PyTorch, TensorFlow, and ONNX, while experienced users can learn about the latest updates and best practices for optimizing specific use cases.
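For orientation before the session, the sketch below shows one common shape of the standard workflow mentioned above: importing an ONNX model and building an optimized engine with TensorRT's Python API. This is a minimal illustration, not the talk's material; it assumes TensorRT 8.x-era calls, and the `build_engine` helper name and `model.onnx` path are placeholders.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str, workspace_gb: int = 1):
    """Parse an ONNX model and build a serialized TensorRT engine (illustrative sketch)."""
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch networks are the standard mode for ONNX import.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError(f"Failed to parse {onnx_path}")

    config = builder.create_builder_config()
    # Cap the scratch memory TensorRT may use while choosing kernels.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,
                                 workspace_gb << 30)
    # Opt into reduced-precision kernels where the hardware supports them,
    # trading a small amount of accuracy for lower latency and higher throughput.
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    return builder.build_serialized_network(network, config)

if __name__ == "__main__":
    engine_bytes = build_engine("model.onnx")  # hypothetical model file
    with open("model.engine", "wb") as f:
        f.write(engine_bytes)
```

The serialized engine can then be deserialized with a `trt.Runtime` and executed; the PyTorch and TensorFlow paths referenced in the talk typically reach this same ONNX import step via an export from the framework.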