Extending Retrieval-Augmented Generation (RAG) to Multimodal Documents

Sr. Content Developer, NVIDIA
Highly Rated

Semantic retrieval has become a common tool for steering large language models toward data-grounded reasoning. However, naive text-only techniques start to break down when non-textual inputs such as images come into the picture. Instead of ignoring non-textual information, we'll tackle multimodal document ingestion and retrieval.

  • Construct a simple retrieval-augmented generation pipeline for context enrichment.
  • Consider approaches for reasoning with images to help an LLM-powered agent converse about image-dense research papers.
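
As a rough illustration of the two goals above, here is a minimal sketch of a retrieval-augmented pipeline in Python. It is not the session's implementation. Everything in it is an assumption made for illustration: the toy bag-of-words `embed` function stands in for a real embedding model, the `chunks` are invented sample data, and representing an image by a text caption is only one of several possible ways to make images retrievable.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks whose text is most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, hits):
    """Enrich the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"[{h['kind']}] {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical document chunks: the image is indexed through a text caption,
# so it can share one retrieval index with ordinary text passages.
chunks = [
    {"kind": "text", "text": "The encoder uses twelve transformer layers with residual connections."},
    {"kind": "image", "text": "Figure 3 caption: bar chart of validation accuracy for each model size."},
    {"kind": "text", "text": "Training used the AdamW optimizer with a cosine learning rate schedule."},
]

query = "What does the chart of validation accuracy show?"
hits = retrieve(query, chunks, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

In this sketch the image chunk is retrieved because its caption shares the most words with the query. A real system would replace `embed` with a learned text or multimodal embedding model and pass `prompt` to an LLM.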
Event: SIGGRAPH
Date: August 2024
Topic: AI Inference
Industry: All Industries
Level: Technical - Beginner
Language: English
Location: