Name: Navigating Challenges and Technical Debt in LLMs Deployment S71116 | GTC 2025 | NVIDIA On-Demand
Uploaded: 2025-03-20T11:00:00Z
Duration: 1804 s

Deploying large language models (LLMs) in production introduces numerous challenges and technical debt including memory management, parallelization, and model compression, highlighting a trade-off between utility and complexity.

Learn how Mastercard developed a generative AI assistant that augments LLMs with Mastercard-specific data, meets business-required latency and throughput, and runs on-premises using NVIDIA DGX, NeMo, and Triton Inference Server. This initiative is detailed in their recently accepted ACM paper, "Navigating Challenges and Technical Debt in Large Language Models Deployment" (https://dl.acm.org/doi/10.1145/3642970.3655840), co-authored with the University of Edinburgh.

This talk will discuss the paper and the technical debt associated with deploying LLMs in business settings, highlighting the need for sophisticated engineering strategies that go beyond the capabilities of standard machine learning libraries and inference engines.

Event: GTC 25

Date: March 2025

Industry: Financial Services

Level: General

Topic: Generative AI - Text Generation

NVIDIA Technology: NeMo, Triton

Language: English