Publication
ISSCC 2024
Short course

Architecture and Design Approaches to ML Hardware Acceleration: High-Performance Compute Environment

Abstract

With the recent explosion in generative AI and large language models, hardware acceleration has become particularly important in high-performance compute environments. In such applications, AI accelerators should address a broad range of AI models and enable workflows spanning model pre-training, fine-tuning, and inference. System-level design and software co-optimization must be considered to balance compute and communication costs, especially with inference workloads driving aggressive latency targets and model size growth driving the use of distributed systems. This talk will discuss these considerations in the context of high-performance system deployments and explore approaches to AI accelerator circuit design as well as research roadmaps to improve both compute efficiency and communication bandwidth.
