      Optimizing and Scaling LLMs With TensorRT-LLM for Text Generation

      , Sr. Solution Architect, NVIDIA
      , Machine Learning Engineer, Grammarly
      , Machine Learning Engineer, Grammarly
      The landscape of large language models (LLMs) is evolving quickly. As model sizes and parameter counts grow, optimizing and deploying LLMs for inference becomes increasingly complex. This calls for a framework with strong API support for easy extension, one that frees developers from low-level memory management and direct CUDA calls. Learn how we used NVIDIA’s suite of solutions to optimize LLMs and deploy them in multi-GPU environments.
      Event: GTC 24
      Date: March 2024
      Topic: AI Inference
      Industry: Consumer Internet
      Level: Intermediate Technical
      NVIDIA Technology: TensorRT
      Language: English
      Location: