Optimize Generative AI Inference with Quantization in TensorRT-LLM and TensorRT

, Senior Deep Learning Engineer, NVIDIA
, Manager, AI/ML, NVIDIA
Because running inference on AI models at large scale is computationally costly, optimization techniques are crucial for lowering inference cost. Our tutorial presents the TensorRT Model Optimization toolkit, NVIDIA's gateway for algorithmic model optimization. The toolkit provides a set of state-of-the-art quantization methods, including FP8, INT8, INT4, and mixed precisions, as well as hardware-accelerated sparsity, and bridges those methods with the most advanced NVIDIA deployment solutions, such as TensorRT-LLM. The tutorial includes an end-to-end optimization-to-deployment demo for language models with TensorRT-LLM and for Stable Diffusion models with TensorRT. You can download the notebooks here: nvidia_ammo-0.9.0.tar.gz.
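As a rough illustration of the workflow the abstract describes, the sketch below applies post-training FP8 quantization to a Hugging Face language model. It is a minimal sketch, not the tutorial's notebook code: it assumes the PyTorch quantization module shipped in the nvidia_ammo package (ammo.torch.quantization) with a quantize() entry point and an FP8_DEFAULT_CFG preset, and uses a small placeholder model; exact names and signatures may differ across releases.

import torch
import ammo.torch.quantization as atq  # assumed module path from nvidia_ammo-0.9.0
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical small model; the tutorial targets larger LLMs
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name)

def calibrate(m):
    # Forward a few representative prompts so the quantizer can collect
    # activation statistics before the quantization scales are frozen.
    prompts = ["Hello, world!", "Quantization lowers inference cost."]
    with torch.no_grad():
        for p in prompts:
            inputs = tokenizer(p, return_tensors="pt").to("cuda")
            m(**inputs)

# Post-training quantization to FP8; the abstract's INT8/INT4 and
# mixed-precision options would slot into the same call via other presets.
model = atq.quantize(model, atq.FP8_DEFAULT_CFG, forward_loop=calibrate)
# The quantized model can then be exported to a TensorRT-LLM checkpoint
# and built into an engine for deployment, as the demo walks through.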
Event: GTC 24
Date: March 2024
Topic: AI Inference
Industry: All Industries
Level: Intermediate Technical
NVIDIA Technology: TensorRT
Language: English
Location: