    Optimizing Large Language Models: An Experimental Approach to Pruning and Fine-Tuning LLama2 7B

    ML Engineer, Weights & Biases
    In the face of the high computational demands of large language models (LLMs), we present an experimental approach to pruning and fine-tuning that overcomes these resource challenges. We walk through our systematic process for turning the 7-billion-parameter LLama2 model into a practical 1.5-billion-parameter variant, covering the iterative sequence of layer removal and realignment, backed by extensive experimentation with truncation techniques. We demonstrate how these compact models can serve as efficient drafters, providing rapid responses while their larger counterparts handle more sophisticated tasks.
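    To make the two ideas in the abstract concrete, here is a minimal sketch of layer truncation for a LLama2-style model, assuming a Hugging Face transformers workflow. The layer indices, the keep/drop split, and the parameter arithmetic are illustrative assumptions, not the speakers' actual recipe:

        import torch
        from transformers import AutoModelForCausalLM

        # Load the 7B model; Llama-2-7B has 32 decoder layers.
        model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
        )

        # Assumption: keep the first and last 3 layers, drop the middle 26.
        # Six ~200M-parameter layers plus ~260M of embedding/head weights
        # land near the 1.5B-parameter target mentioned in the abstract.
        keep = list(range(3)) + list(range(29, 32))
        model.model.layers = torch.nn.ModuleList(
            model.model.layers[i] for i in keep
        )
        model.config.num_hidden_layers = len(keep)

        # The truncated model then needs realignment: a short fine-tuning
        # pass so the remaining layers adapt to their new neighbors.

    And a sketch of the drafter idea using assisted (speculative) generation in transformers, where the pruned model proposes tokens that the full 7B model verifies; the pruned checkpoint path is hypothetical:

        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
        target = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
        drafter = AutoModelForCausalLM.from_pretrained(
            "my-org/llama2-pruned-1.5b"  # hypothetical pruned checkpoint
        )

        prompt = tok("Model pruning works by", return_tensors="pt")
        # assistant_model switches generate() into assisted decoding: the
        # small drafter drafts tokens, the 7B target accepts or rejects them.
        out = target.generate(**prompt, assistant_model=drafter, max_new_tokens=64)
        print(tok.decode(out[0], skip_special_tokens=True))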
    Event: GTC 24
    Date: March 2024
    Level: Advanced Technical
    Industry: All Industries
    NVIDIA Technology: Cloud / Data Center GPU
    Topic: MLOps
    Language: English
    Location: