Navigating Challenges and Technical Debt in LLMs Deployment
, Vice President of AI Engineering, Mastercard
Deploying large language models (LLMs) in production introduces numerous challenges and significant technical debt, including memory management, parallelization, and model compression, and highlights a trade-off between utility and complexity.
Learn how Mastercard developed a generative AI assistant that augments LLMs with Mastercard-specific data, meets business-required latency and throughput, and operates on-premises using NVIDIA DGX, NeMo, and Triton Inference Server. This work is detailed in their recently accepted ACM paper, "Navigating Challenges and Technical Debt in Large Language Models Deployment" (https://dl.acm.org/doi/10.1145/3642970.3655840), co-authored with the University of Edinburgh.
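For illustration only, the sketch below shows what querying an LLM served behind Triton Inference Server can look like from Python using the tritonclient package. The model name "llm_assistant" and the tensor names "text_input"/"text_output" are hypothetical placeholders, not details of Mastercard's actual deployment or the paper.

```python
# Hypothetical sketch: sending a prompt to an LLM hosted on Triton Inference
# Server over HTTP. Model and tensor names are assumptions for illustration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton exchanges string data as BYTES tensors backed by numpy object arrays.
prompt = np.array(["Summarize the assistant's data-augmentation approach."], dtype=object)
infer_input = httpclient.InferInput("text_input", [1], "BYTES")
infer_input.set_data_from_numpy(prompt)

result = client.infer(
    model_name="llm_assistant",  # hypothetical model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("text_output")],
)
print(result.as_numpy("text_output")[0])
```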
This talk will discuss the paper and the technical debt associated with deploying LLMs for business use, highlighting the need for sophisticated engineering strategies that go beyond the capabilities of standard machine learning libraries and inference engines.