Optimizing Deep Learning Inference using NVIDIA GPUs on AWS Cloud
NVIDIA
Deep learning inference is a compute-intensive workload that directly affects user experience: real-time applications need low latency, while data center efficiency demands high throughput. In this session, we’ll demonstrate how developers can use NVIDIA TensorRT to optimize neural network models trained in all major frameworks, and deploy those optimized models in the cloud or at the edge. We’ll also walk through code samples that demonstrate the workflow with various frameworks, as well as how to calibrate for lower precision while maintaining high accuracy.
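As a taste of the calibration topic, the sketch below illustrates the basic idea behind lower-precision inference: symmetric linear INT8 quantization, where a scale factor derived from observed activation ranges maps floating-point values onto the signed 8-bit range. This is a simplified, self-contained illustration of the general technique, not NVIDIA TensorRT's actual calibrator API (TensorRT uses entropy-based calibration over a representative dataset); all function names here are hypothetical.

```python
def int8_scale(calibration_values):
    """Derive a symmetric quantization scale from a calibration batch.

    Uses the simplest 'max' calibrator: the largest absolute value
    observed maps to the edge of the INT8 range (+/-127).
    """
    amax = max(abs(v) for v in calibration_values)
    return amax / 127.0

def quantize(value, scale):
    """Map a float to a clamped signed 8-bit integer."""
    q = round(value / scale)
    return max(-127, min(127, q))

def dequantize(q, scale):
    """Recover an approximate float from the quantized integer."""
    return q * scale

# Example: calibrate on sample activations, then round-trip a value.
activations = [0.5, -1.27, 0.03]
scale = int8_scale(activations)       # 1.27 / 127 = 0.01
q = quantize(0.5, scale)              # 50
approx = dequantize(q, scale)         # 0.5 (exact here; generally approximate)
```

Real calibration, as covered in the session, chooses the dynamic range more carefully (e.g. minimizing information loss rather than using the raw maximum), which is what lets INT8 inference retain accuracy close to FP32.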