Fast GPU Inference with TensorRT on Amazon SageMaker
NVIDIA
Deep neural network (DNN) model complexity has grown dramatically over the last decade, from AlexNet with 61 million parameters to GPT-3 with 175 billion. Running real-time inference on models of this scale is difficult without an optimization technology that compresses and restructures the pre-trained network graph. NVIDIA TensorRT™-optimized DNN models deliver 2-3X faster inference on a GPU compared to the original models. In this session, you'll learn about NVIDIA TensorRT Lite, a developer-friendly path to the TensorRT inference library that lets you optimize pre-trained DNNs.
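As a rough illustration of the kind of workflow the session covers, here is a minimal sketch of building a TensorRT-optimized engine from a pre-trained model using the standard TensorRT 8.x Python API and its ONNX parser (not the TensorRT Lite path demonstrated in the session); the file names "model.onnx" and "model.plan" are placeholders:

```python
import tensorrt as trt

# Sketch only, assuming TensorRT 8.x: parse a pre-trained ONNX model
# and build an optimized inference engine from it.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a placeholder for your pre-trained model file.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow reduced precision where supported

# Build and serialize the optimized engine for later deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)
```

The serialized engine can then be loaded by a TensorRT runtime on the target GPU (for example, inside a SageMaker inference container) to serve low-latency predictions.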