Name: From Zero to Millions: Scaling Large Language Model Inference With TensorRT-LLM S63173 | GTC 2024 | NVIDIA On-Demand
Uploaded: 2024-03-20T16:00:00Z
Duration: 1465 s
Description: We'll give an overview of how we successfully utilized TensorRT-LLM to deploy large language models at scale, thereby supporting millions of users at Perpl

Details

We'll give an overview of how we successfully utilized TensorRT-LLM to deploy large language models at scale, thereby supporting millions of users at Perplexity.

Event: GTC 24

Date: March 2024

Level: Advanced Technical

Industry: Consumer Internet

NVIDIA Technology: CUDA, cuDNN, Hopper, NVLink / NVSwitch

Topic: Text Generation

Language: English

Location: