
      Build Next-Gen Agents With Large Vision Language Models

      Senior Solutions Architect, NVIDIA
      Product Marketing Manager, NVIDIA
      Technical Marketing Engineer, NVIDIA
      Vision-language models (VLMs) are taking computer vision by storm, offering scalability and robust zero-shot solutions for countless industries. In this lab, you will learn how to use VLM NIMs and the Video Search and Summarization (VSS) AI Blueprint. You will get hands-on experience calling VLM NIM APIs and building a video understanding agent. Then you will dive into the inner workings of VSS to learn how VLMs, LLMs, and the latest Graph RAG techniques are used to build a powerful agent capable of summarizing and answering questions over long videos.
      Prerequisite(s):

      Python and Computer Vision.
      Event: GTC 25
      Date: March 2025
      Industry: All Industries
      NVIDIA Technology: Cloud / Data Center GPU, DeepStream, TensorRT, NeMo, Video Storage Toolkit (VST), NVIDIA NIM
      Level: General
      Topic: Models / Libraries / Frameworks - Vision Language Models (VLMs)
      Language: English
      Location: