      Optimizing and Scaling LLMs With TensorRT-LLM for Text Generation

      , Sr. Solution Architect, NVIDIA
      , Machine Learning Engineer, Grammarly
      , Machine Learning Engineer, Grammarly
      The landscape of large language models (LLMs) is evolving quickly. As model sizes and parameter counts grow, optimizing and deploying LLMs for inference becomes increasingly complex. This calls for a framework with strong API support for easy extension, one that frees developers from low-level memory management and direct CUDA calls. Learn how we used NVIDIA’s suite of solutions to optimize LLMs and deploy them in multi-GPU environments.
      Event: GTC 24
      Date: March 2024
      Topic: AI Inference
      Industry: Consumer Internet
      Level: Intermediate Technical
      NVIDIA Technology: TensorRT
      Language: English
      Location: