Systematic Deep Learning Optimizations for Accurate and Efficient TensorRT Deployments on Cruise’s Self-Driving Fleet
, Staff Machine Learning Engineer, Cruise
Cruise has invested extensively in a deep learning (DL) optimization platform that applies systematic, composable optimizations such as quantization-aware training (QAT) and structured pruning, delivering fast and accurate inference on the NVIDIA GPUs that power the autonomous vehicle stack. We developed a DL compiler that inserts quantize/dequantize (Q/DQ) nodes in graph mode, targeting the new explicit-precision QAT mode in TensorRT 8. We further extend the search for highly accurate quantized representations with techniques such as trained quantization thresholds. In addition, the DL compiler supports an efficient, automated search for promising pruning configurations using one-shot techniques.
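As a rough illustration of what an inserted Q/DQ pair computes during QAT, and of how a quantization threshold can itself be trained, here is a minimal NumPy sketch. It assumes symmetric, per-tensor int8 quantization with a zero point of 0, and it follows a trained-quantization-thresholds style parameterization: a power-of-two scale derived from a learnable log2 threshold, with straight-through estimates for `round` and `ceil`. This is not Cruise's compiler or TensorRT code. The function names (`pow2_scale`, `qdq`, `qdq_grad_log2_t`, `mse_grad_log2_t`) are hypothetical, and in a real QAT setup the threshold would be learned jointly with the network weights by the training framework's optimizer rather than against a standalone reconstruction loss.

```python
import math

import numpy as np

# Assumption: symmetric per-tensor int8 quantization with zero point 0.
NUM_BITS = 8
QMIN = -(2 ** (NUM_BITS - 1))   # -128
QMAX = 2 ** (NUM_BITS - 1) - 1  # 127


def pow2_scale(log2_t: float) -> float:
    """Power-of-two scale from a log2 threshold: s = 2**ceil(log2_t) / 2**(b - 1)."""
    return 2.0 ** math.ceil(log2_t) / 2 ** (NUM_BITS - 1)


def qdq(x: np.ndarray, log2_t: float) -> np.ndarray:
    """Fake-quantize x: a Q step (scale, round, clip to int8) then a DQ step (rescale)."""
    s = pow2_scale(log2_t)
    q = np.clip(np.round(x / s), QMIN, QMAX)  # np.round rounds halves to even
    return q * s


def qdq_grad_log2_t(x: np.ndarray, log2_t: float) -> np.ndarray:
    """Per-element d qdq(x) / d log2_t using straight-through estimates for round and ceil."""
    s = pow2_scale(log2_t)
    v = x / s
    r = np.round(v)
    inside = (r >= QMIN) & (r <= QMAX)
    # In range the output is s * round(x/s); treating round as identity gives
    # d/ds = round(x/s) - x/s. Clipped outputs are s*QMIN or s*QMAX, so d/ds is QMIN or QMAX.
    dq_ds = np.where(inside, r - v, np.clip(r, QMIN, QMAX))
    # Treating ceil as identity, ds/dlog2_t = s * ln(2).
    return dq_ds * s * math.log(2.0)


def mse_grad_log2_t(x: np.ndarray, log2_t: float) -> float:
    """Gradient of mean((qdq(x) - x)**2) with respect to log2_t."""
    err = qdq(x, log2_t) - x
    return float(np.mean(2.0 * err * qdq_grad_log2_t(x, log2_t)))


if __name__ == "__main__":
    x = np.array([-3.0, -0.3, 0.0, 0.26, 5.0])
    # log2_t = 1.5 rounds up to a threshold of 2**2 = 4, so s = 4 / 128 = 0.03125;
    # the outlier 5.0 is clipped to 127 * s = 3.96875.
    print(qdq(x, 1.5))
    # Negative here: the clipped outlier dominates, so gradient descent would grow the threshold.
    print(mse_grad_log2_t(x, 1.5))
```

The sign of `mse_grad_log2_t` reflects the trade-off a trained threshold balances: clipped outliers push the threshold up, while rounding error inside the representable range pushes it down, so the learned threshold settles where the two effects offset.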