      From Zero to Millions: Scaling Large Language Model Inference With TensorRT-LLM

      Head of AI Inference, Perplexity AI
      We'll give an overview of how we used TensorRT-LLM to deploy large language models at scale and serve millions of users at Perplexity.
      Event: GTC 24
      Date: March 2024
      Level: Advanced Technical
      Industry: Consumer Internet
      NVIDIA Technology: CUDA, cuDNN, Hopper, NVLink / NVSwitch
      Topic: Text Generation
      Language: English
      Location: