Large Models are not Always Expensive: Large Scale Mixture of Expert Models with Efficient Inference Empowers Microsoft Translator with Best Models (Presented by Microsoft Azure)
, Senior AI Developer Technology Engineer, NVIDIA
, Principal Researcher, Microsoft
Giant transformer models with billions of parameters are achieving state-of-the-art results on various natural language processing tasks. Such large models require computation that scales linearly with the number of parameters. To combat this problem, Mixture of Experts (MoE) introduces an architecture in which the computation required is sub-linear in the number of parameters. We'll demonstrate the most efficient implementation of MoE on a single GPU to date, achieving a 10-20x speedup over standard PyTorch libraries. We'll give an overview of the model architecture and training techniques, and describe extensions applied to NVIDIA’s FasterTransformer library to achieve state-of-the-art performance. Lastly, we'll demonstrate how we serve our 5-billion-parameter model using Azure Machine Learning (AML) and Triton Inference Server to perform document translation for several language pairs with multilingual machine translation systems. We believe this work will help unlock the potential of MoE models for production scenarios. Watch this session for a chance to win a special SWAG Box sponsored by Microsoft and NVIDIA.
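The sub-linear compute property comes from conditional routing: each token activates only a small subset of experts, so adding experts grows the parameter count without growing per-token FLOPs. The following is a minimal, illustrative PyTorch sketch of a top-1-gated MoE feed-forward layer; it is not the speakers' implementation, and all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with top-1 gating (illustrative only)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # A gating network scores each token against every expert.
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -- batch and sequence dimensions flattened beforehand.
        scores = F.softmax(self.gate(x), dim=-1)    # (tokens, num_experts)
        top_score, top_idx = scores.max(dim=-1)     # top-1 routing decision per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Only tokens routed to this expert pay its FLOPs:
                # compute stays roughly constant as num_experts grows.
                out[mask] = top_score[mask, None] * expert(x[mask])
        return out

# Hypothetical usage: 16 tokens with d_model=512 routed across 8 experts.
tokens = torch.randn(16, 512)
layer = MoELayer(d_model=512, d_ff=2048, num_experts=8)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

Production implementations such as the FasterTransformer extensions discussed in the session replace the per-expert Python loop with fused, batched GPU kernels; the sketch above only conveys the routing idea.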