Bridging the Gap Between Basic Neural Language Models, Transformers, and Megatron

Senior Deep Learning Scientist, NVIDIA
Director, Architecture, NVIDIA
The Transformer architecture has been instrumental in driving Deep Learning (DL)-based Natural Language Processing (NLP) progress since its introduction in 2017. In particular, it cracked the problem of how to apply transfer learning to NLP. This enabled using vast amounts of publicly available textual data to pretrain models before applying them to domain-specific problems.
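
To make the pretrain-then-fine-tune idea concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name, the two-label task, and the toy examples are illustrative assumptions, not material from the session.

```python
# Minimal pretrain-then-fine-tune sketch (assumes "transformers" and "torch"
# are installed; checkpoint, labels, and data are illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load weights that were pretrained on large amounts of public text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Tiny, made-up domain-specific examples (label 1 = positive, 0 = negative).
texts = ["The GPU upgrade cut training time in half.", "The run crashed again."]
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few fine-tuning steps, just to show the loop
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```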

Over the past few years, models based on the Transformer architecture have scaled to ever larger problem sizes. Examples of such models are BERT, GPT-1/2/3, and NVIDIA's Megatron. Pretrained versions of these models are publicly available and can be used as-is to solve NLP tasks, or they can be further fine-tuned for the end application. Although the Transformer architecture has been around for almost five years, our impression is that it still seems mysterious to many developers. In this session, we will address this by connecting the dots between basic neural language models and the Transformer architecture. We will also describe how NVIDIA's Megatron implementation enables the Transformer to scale to a huge number of GPUs.
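
As one illustration of how such scaling works, the sketch below simulates, in a single process, the tensor-parallel MLP partitioning described in the Megatron-LM paper: the first weight matrix is split by columns across devices, the second by rows, so each shard computes independently and only one reduction is needed. The shapes, the two-way split, and the use of local tensors instead of real GPU collectives are assumptions made for the example.

```python
# Single-process sketch of Megatron-style tensor parallelism for an MLP block.
# Real Megatron shards these tensors across GPUs and uses NCCL all-reduce;
# here a plain sum over local tensors plays that role.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, d_model, d_ff = 4, 8, 32

x = torch.randn(batch, d_model)
A = torch.randn(d_model, d_ff)   # first MLP weight
B = torch.randn(d_ff, d_model)   # second MLP weight

# Reference computation on a single "device".
z_ref = F.gelu(x @ A) @ B

# "GPU 0" and "GPU 1" each hold half of A's columns and half of B's rows.
A0, A1 = A[:, :d_ff // 2], A[:, d_ff // 2:]
B0, B1 = B[:d_ff // 2, :], B[d_ff // 2:, :]

# Each shard computes its partial result with no communication, because
# GeLU is applied elementwise to disjoint column blocks.
y0 = F.gelu(x @ A0)
y1 = F.gelu(x @ A1)

# Summing the partial outputs stands in for the all-reduce.
z_parallel = y0 @ B0 + y1 @ B1

print(torch.allclose(z_ref, z_parallel, atol=1e-6))  # True
```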
Event: GTC Digital Spring
Date: March 2022
Industry: All Industries
Level: Beginner Technical
Topic: Deep Learning - Training
Language: English
Location: