Scaling LLMs to Support 14 Million Users while Optimizing Performance and Accuracy
Head of Data and AI, bunq
Learn how bunq implemented and scaled large language models (LLMs) to support its 14 million users, with a focus on delivering high-speed performance at scale. We encountered several critical technical challenges, including optimizing vectorization and embeddings, reducing latency, improving retrieval, and refining prompt generation. This case study highlights the strategies we used to scale LLMs effectively in large-scale applications, using NVIDIA NeMo Retriever and NIM microservices for better performance and accuracy. We'll finish with a forward-looking view of what's next for Finn, bunq's AI assistant, offering insights into the developments and innovations we plan to integrate to further enhance user experience and operational efficiency.
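To make the pipeline concrete, here is a minimal sketch of the retrieval-augmented generation flow the abstract describes: vectorize documents with an embedding model, retrieve the most similar passages for a query, and ground the LLM's answer in that context. This is not bunq's implementation; the endpoint URLs and model names are placeholder assumptions. It relies only on the fact that NIM microservices expose OpenAI-compatible APIs, so the standard `openai` client works against them.

```python
# Minimal RAG sketch against self-hosted NIM endpoints (placeholder URLs/models).
import numpy as np
from openai import OpenAI

# Hypothetical endpoints for a NeMo Retriever embedding NIM and an LLM NIM.
embed_client = OpenAI(base_url="http://embed-nim:8000/v1", api_key="not-used")
llm_client = OpenAI(base_url="http://llm-nim:8000/v1", api_key="not-used")

def embed(texts: list[str], input_type: str = "passage") -> np.ndarray:
    """Vectorize texts via the embedding NIM's OpenAI-compatible endpoint."""
    resp = embed_client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",  # example retriever model; assumption
        input=texts,
        # Asymmetric retriever models distinguish query vs. passage embeddings.
        extra_body={"input_type": input_type},
    )
    return np.array([d.embedding for d in resp.data])

def retrieve(query: str, docs: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed([query], input_type="query")[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query: str, docs: list[str], doc_vecs: np.ndarray) -> str:
    """Prompt-generation step: ground the LLM's answer in retrieved context."""
    context = "\n\n".join(retrieve(query, docs, doc_vecs))
    resp = llm_client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # example NIM model name; assumption
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

# Usage: embed the corpus once, then answer queries against it.
docs = ["bunq supports instant payments.", "Cards can be frozen in the app."]
doc_vecs = embed(docs)
print(answer("How do I freeze my card?", docs, doc_vecs))
```

At 14-million-user scale the in-memory cosine search above would be replaced by a proper vector index, and embedding calls would be batched; the sketch only illustrates the stages named in the abstract.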
Event: GTC 25
Date: March 2025
NVIDIA Technology: Cloud / Data Center GPU, cuDF, NeMo, NVIDIA NIM
Industry: Financial Services
Topic: Generative AI - Retrieval-Augmented Generation (RAG)