Scaling LLMs to Support 14 Million Users while Optimizing Performance and Accuracy
Head of Data and AI, bunq
Learn how bunq implemented and scaled large language models (LLMs) to support its 14 million users, with a focus on delivering high-speed performance at scale. We encountered several critical technical challenges, including optimizing vectorization and embeddings, reducing latency, improving retrieval, and refining prompt generation. This case study highlights the strategies we used to scale LLMs effectively in large-scale applications, using NVIDIA NeMo Retriever and NIM microservices for better performance and accuracy. We'll finish with a forward-looking view of what's next for Finn, bunq's AI assistant, offering insights into the developments and innovations we plan to integrate to further enhance user experience and operational efficiency.
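To make the pipeline concrete, here is a minimal sketch of the retrieval-augmented generation flow the abstract describes: vectorize documents with an embedding model, retrieve the most similar passages for a query, and ground the LLM's answer in that context. This is not bunq's implementation; the endpoint URLs and model names are placeholder assumptions. It relies only on the fact that NIM microservices expose OpenAI-compatible APIs, so the standard `openai` client works against them.

```python
# Minimal RAG sketch against self-hosted NIM endpoints (placeholder URLs/models).
import numpy as np
from openai import OpenAI

# Hypothetical endpoints for a NeMo Retriever embedding NIM and an LLM NIM.
embed_client = OpenAI(base_url="http://embed-nim:8000/v1", api_key="not-used")
llm_client = OpenAI(base_url="http://llm-nim:8000/v1", api_key="not-used")

def embed(texts: list[str], input_type: str = "passage") -> np.ndarray:
    """Vectorize texts via the embedding NIM's OpenAI-compatible endpoint."""
    resp = embed_client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",  # example retriever model; assumption
        input=texts,
        # Asymmetric retriever models distinguish query vs. passage embeddings.
        extra_body={"input_type": input_type},
    )
    return np.array([d.embedding for d in resp.data])

def retrieve(query: str, docs: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed([query], input_type="query")[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query: str, docs: list[str], doc_vecs: np.ndarray) -> str:
    """Prompt-generation step: ground the LLM's answer in retrieved context."""
    context = "\n\n".join(retrieve(query, docs, doc_vecs))
    resp = llm_client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # example NIM model name; assumption
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

# Usage: embed the corpus once, then answer queries against it.
docs = ["bunq supports instant payments.", "Cards can be frozen in the app."]
doc_vecs = embed(docs)
print(answer("How do I freeze my card?", docs, doc_vecs))
```

At 14-million-user scale the in-memory cosine search above would be replaced by a proper vector index, and embedding calls would be batched; the sketch only illustrates the stages named in the abstract.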
Event: GTC 25
Date: March 2025
NVIDIA Technology: Cloud / Data Center GPU, cuDF, NeMo, NVIDIA NIM
Industry: Financial Services
Topic: Generative AI - Retrieval-Augmented Generation (RAG)