Optimizing Inference for Neural Machine Translation using Sockeye 2
NVIDIA
Transformer networks have revolutionized the field of machine translation. They've been shown to produce better translations than traditional recurrent neural networks, especially for long input sentences. However, Transformer models keep growing in size, with the latest, GPT-3, weighing in at 175 billion parameters, and training and inference on such large models are computationally intensive. Learn how the NVIDIA A100 GPU is designed to train and deploy such large networks efficiently, and explore a Transformer-based model built with Sockeye, the open-source NMT implementation that powers Amazon Translate. We'll also discuss methods for profiling deep learning workloads with NVIDIA Nsight™ to identify areas for performance improvement, and demonstrate the impact of these optimization techniques with cost-effective inference on an Amazon EC2 G4 instance with NVIDIA T4 GPUs.
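As a minimal sketch of the profiling workflow discussed in the session, a common pattern is to wrap the stages of an inference loop in NVTX ranges so that Nsight Systems can attribute GPU time to each stage on its timeline. The `load_model` and `translate` calls below are illustrative placeholders, not Sockeye's actual API; only the `nvtx` annotations and the `nsys` capture command reflect real tooling.

```python
# Sketch: NVTX-annotated inference loop for Nsight Systems profiling.
# Capture a timeline with:
#   nsys profile --trace=cuda,nvtx -o report python translate.py
import nvtx


def load_model(path):
    # Hypothetical placeholder for loading an NMT model (not Sockeye's API).
    raise NotImplementedError


def run_inference(model, batches):
    for i, batch in enumerate(batches):
        # Each annotate() call appears as a labeled span on the Nsight
        # Systems timeline, making slow batches easy to spot.
        with nvtx.annotate(f"translate_batch_{i}", color="green"):
            model.translate(batch)  # hypothetical decode step
```

Opening the resulting report in the Nsight Systems GUI then shows these labeled ranges alongside CUDA kernel and memory-transfer activity, which is typically where candidate optimizations (larger batches, reduced precision, fewer host-device syncs) become visible.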