      Scaling and Optimizing Your LLM Pipeline for End-to-End Efficiency

      Product Manager, AI on Google Kubernetes Engine, Google
      Cloud Architect, Google
      Are you having trouble getting large language models (LLMs) to work in your organization? You're not alone. We'll look at how to deploy an open-source LLM on Google Kubernetes Engine (GKE), and show data scientists and machine learning engineers how to use NeMo and TRT LLM from notebooks on GKE. GKE is also uniquely suited to orchestrating AI workloads efficiently and conveniently. We'll demonstrate how to train and tune an LLM using NeMo, and give a live technical demo of how data science teams can run inference on these models on GPUs with TRT LLM and GKE.
      Event: GTC 24
      Date: March 2024
      Topic: AI Inference
      Industry: All Industries
      Level: Technical - Beginner
      NVIDIA Technology: Cloud / Data Center GPU
      Language: English
      Location: