    Optimizing Large Language Models: An Experimental Approach to Pruning and Fine-Tuning LLama2 7B

    ML Engineer, Weights & Biases
    In the face of the high computational demands of large language models (LLMs), we present an experimental approach to pruning and fine-tuning that overcomes these resource challenges. We walk through our systematic process for turning the 7-billion-parameter LLama2 model into a practical 1.5-billion-parameter variant, covering the iterative sequence of layer removal and realignment, backed by extensive experimentation with truncation techniques. We demonstrate how these compact models can serve as efficient drafters, providing rapid responses while their larger counterparts handle more sophisticated tasks.
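    To make the two ideas in the abstract concrete, here is a minimal sketch of layer truncation for a LLama2-style model, assuming a Hugging Face transformers workflow. The layer indices, the keep/drop split, and the parameter arithmetic are illustrative assumptions, not the speakers' actual recipe:

        import torch
        from transformers import AutoModelForCausalLM

        # Load the 7B model; Llama-2-7B has 32 decoder layers.
        model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
        )

        # Assumption: keep the first and last 3 layers, drop the middle 26.
        # Six ~200M-parameter layers plus ~260M of embedding/head weights
        # land near the 1.5B-parameter target mentioned in the abstract.
        keep = list(range(3)) + list(range(29, 32))
        model.model.layers = torch.nn.ModuleList(
            model.model.layers[i] for i in keep
        )
        model.config.num_hidden_layers = len(keep)

        # The truncated model then needs realignment: a short fine-tuning
        # pass so the remaining layers adapt to their new neighbors.

    And a sketch of the drafter idea using assisted (speculative) generation in transformers, where the pruned model proposes tokens that the full 7B model verifies; the pruned checkpoint path is hypothetical:

        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
        target = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
        drafter = AutoModelForCausalLM.from_pretrained(
            "my-org/llama2-pruned-1.5b"  # hypothetical pruned checkpoint
        )

        prompt = tok("Model pruning works by", return_tensors="pt")
        # assistant_model switches generate() into assisted decoding: the
        # small drafter drafts tokens, the 7B target accepts or rejects them.
        out = target.generate(**prompt, assistant_model=drafter, max_new_tokens=64)
        print(tok.decode(out[0], skip_special_tokens=True))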
    Event: GTC 24
    Date: March 2024
    Level: Advanced Technical
    Industry: All Industries
    NVIDIA Technology: Cloud / Data Center GPU
    Topic: MLOps
    Language: English
    Location: