Faster Transformer Plus Triton: Experimental Approach on Multi-node for Giant NLP Model Inference
, Senior AI Developer Technology Engineer, NVIDIA
With a model like GPT-3, it is impossible to run inference of the entire model on a single GPU. We must extend serving to multiple GPUs, or even multiple nodes. We'll demonstrate how to integrate FasterTransformer, a highly optimized and flexible transformer library, with the Triton Inference Server to serve GPT-3 (175B) and Megatron-Turing (530B) on multi-GPU, multi-node deployments within one second, a common latency threshold in real applications.