Harnessing Generative AI and Large Language Model With Vision AI Agents
Director, Software Engineering, NVIDIA
Senior Deep Learning Engineer, NVIDIA
Director of Product Management, NVIDIA
Organizations using computer vision generate petabytes of videos and images every day. Insights from this footage can be used to identify concerns, boost productivity, improve safety, reduce downtime, and predict outcomes before they happen. Historically, operations teams have had to sift through videos and manually search for incidents, a process that is costly, depends on accurate metadata, and is wholly inefficient.
Join us to learn how to use multi-modal models to derive business-critical insights from videos and images almost instantly. Multi-modal models take search prompts from users as input and use AI to immediately return matching video and image results. These models can perform complex reasoning, correlate sequences of events, and understand exactly when an event is triggered and why. This video understanding ability can be used to solve real-world problems across industries, especially in factories, retail, and warehouses, where environments are complex and logistically challenging.