Bridging the Gap Between Basic Neural Language Models, Transformers, and Megatron

Senior Deep Learning Scientist, NVIDIA
Director, Architecture, NVIDIA
The Transformer architecture has been instrumental in driving Deep Learning (DL)-based Natural Language Processing (NLP) progress since its introduction in 2017. In particular, it cracked the problem of how to apply transfer learning to NLP. This enabled using vast amounts of publicly available textual data to pretrain models before applying them to domain-specific problems.
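
To make the pretrain-then-fine-tune idea concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name, the two-label task, and the toy examples are illustrative assumptions, not material from the session.

```python
# Minimal pretrain-then-fine-tune sketch (assumes "transformers" and "torch"
# are installed; checkpoint, labels, and data are illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load weights that were pretrained on large amounts of public text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Tiny, made-up domain-specific examples (label 1 = positive, 0 = negative).
texts = ["The GPU upgrade cut training time in half.", "The run crashed again."]
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few fine-tuning steps, just to show the loop
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```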

Over the past few years, models based on the Transformer architecture have scaled to ever larger problem sizes. Examples of such models are BERT, GPT-1/2/3, and NVIDIA's Megatron. Pretrained versions of these models are publicly available and can be used as-is to solve NLP tasks, or they can be further fine-tuned for the end application. Although the Transformer architecture has been around for almost five years, our impression is that it still seems mysterious to many developers. In this session, we will address this by connecting the dots between basic neural language models and the Transformer architecture. We will also describe how NVIDIA's Megatron implementation enables the Transformer to scale to a huge number of GPUs.
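
As one illustration of how such scaling works, the sketch below simulates, in a single process, the tensor-parallel MLP partitioning described in the Megatron-LM paper: the first weight matrix is split by columns across devices, the second by rows, so each shard computes independently and only one reduction is needed. The shapes, the two-way split, and the use of local tensors instead of real GPU collectives are assumptions made for the example.

```python
# Single-process sketch of Megatron-style tensor parallelism for an MLP block.
# Real Megatron shards these tensors across GPUs and uses NCCL all-reduce;
# here a plain sum over local tensors plays that role.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, d_model, d_ff = 4, 8, 32

x = torch.randn(batch, d_model)
A = torch.randn(d_model, d_ff)   # first MLP weight
B = torch.randn(d_ff, d_model)   # second MLP weight

# Reference computation on a single "device".
z_ref = F.gelu(x @ A) @ B

# "GPU 0" and "GPU 1" each hold half of A's columns and half of B's rows.
A0, A1 = A[:, :d_ff // 2], A[:, d_ff // 2:]
B0, B1 = B[:d_ff // 2, :], B[d_ff // 2:, :]

# Each shard computes its partial result with no communication, because
# GeLU is applied elementwise to disjoint column blocks.
y0 = F.gelu(x @ A0)
y1 = F.gelu(x @ A1)

# Summing the partial outputs stands in for the all-reduce.
z_parallel = y0 @ B0 + y1 @ B1

print(torch.allclose(z_ref, z_parallel, atol=1e-6))  # True
```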
Event: GTC Digital Spring
Date: March 2022
Industry: All Industries
Level: Beginner Technical
Topic: Deep Learning - Training
Language: English
Location: