How to Build a Robust Platform for Real-Time Inference: Learnings from Michelangelo

Principal Software Engineer, Uber
Engineering Manager, Uber
Machine learning (ML) models are increasingly used to make business-critical decisions at Uber, ranging from content discovery and recommendations to estimated times of arrival (ETAs). In recent years, we've seen the adoption of deep learning and other innovations to further unlock the value of big data. However, the tight model inference latency requirements for Michelangelo, Uber's centralized end-to-end ML platform, haven't relaxed, even as model size and complexity have increased. NVIDIA's Triton Inference Server is open-source inference serving software that simplifies deployment by supporting multiple deep learning backends on both CPU and GPU. We'll share what we learned from integrating Triton into the Michelangelo serving stack to achieve low latency and high throughput for deep learning use cases.
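To give a concrete flavor of the serving pattern the talk covers, here is a minimal sketch of sending an inference request to a Triton server using NVIDIA's `tritonclient` Python package. The server address, the model name `eta_model`, and the tensor names `input__0` / `output__0` are illustrative assumptions, not details from Michelangelo.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server (assumed to be running at localhost:8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single-request input tensor. "eta_model" and the tensor names
# "input__0" / "output__0" are hypothetical placeholders.
features = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(features.shape), "FP32")
infer_input.set_data_from_numpy(features)

# Send the inference request and read back the output tensor.
response = client.infer(model_name="eta_model", inputs=[infer_input])
prediction = response.as_numpy("output__0")
print(prediction)
```

In production, the same request path is typically exercised over gRPC and with dynamic batching enabled on the server, which is where Triton's throughput gains come from.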
Event: GTC Digital Spring
Date: March 2023
Industry: Automotive / Transportation
Topics: Deep Learning - Inference; Deep Learning
Level: Intermediate Technical
Language: English