      Navigating Challenges and Technical Debt in LLMs Deployment

      , Vice President of AI Engineering, Mastercard
      Deploying large language models (LLMs) in production introduces numerous challenges and technical debt, including memory management, parallelization, and model compression. These challenges highlight a trade-off between utility and complexity.

      Learn how Mastercard developed a generative AI assistant that augments LLMs with Mastercard-specific data, meets business-required latency and throughput, and runs on-premises using NVIDIA DGX, NeMo, and Triton Inference Server. This initiative is detailed in the recently accepted ACM paper "Navigating Challenges and Technical Debt in Large Language Models Deployment" (https://dl.acm.org/doi/10.1145/3642970.3655840), co-authored with the University of Edinburgh.

      This talk will discuss the paper and the technical debt associated with deploying LLMs in business settings, highlighting the need for sophisticated engineering solutions that go beyond the capabilities of standard machine learning libraries and inference engines.
      Event: GTC 25
      Date: March 2025
      Industry: Financial Services
      Level: General
      Topic: Generative AI - Text Generation
      NVIDIA Technology: NeMo, Triton
      Language: English
      Location: