How to Design an AI Supercomputer for Fast Distributed Training, and its Use Cases

Senior AI Platform Architect, NEC
NEC began operating its AI supercomputer in March 2023. At 580 PFLOPS of half-precision performance, it is the largest of any Japanese company. We'll discuss how to design an AI supercomputer for fast distributed training and present specific use cases of AI supercomputers in industry. Specifically, we'll explain how to design low-latency, high-bandwidth networks, how to place NICs in a server to use RoCE v2, and how GPU features such as TF32 and BF16 matter from a computing architecture perspective. We'll also introduce the kinds of AI research that run on this supercomputer. NEC has multiple AI research laboratories and hundreds of AI researchers who use AI supercomputers. Their research areas include image and video recognition, language and semantic understanding, data analysis, predictive forecasting, and optimal planning and control, among many others.
Event: GTC Digital Spring
Date: March 2023
Level: Advanced Technical
Topic: HPC - Supercomputing
Industry: HPC / Supercomputing
Language: English
Topic: HPC
Location: