Architectures and Circuits for Analog-memory-based Hardware Accelerators for Deep Neural Networks
Abstract
Analog non-volatile memory (NVM)-based accelerators for Deep Neural Networks (DNNs) can achieve high throughput and energy efficiency by computing multiply-accumulate (MAC) operations using Ohm's law and Kirchhoff's current law on arrays of resistive memory devices [1] (a minimal numerical sketch of this operation appears after the references). In recent years, energy-efficient, weight-stationary MAC operations in analog NVM array "Tiles" were demonstrated in hardware with Phase Change Memory (PCM) devices integrated in the backend of 14-nm CMOS [2, 3]. Competitive end-to-end DNN accuracies can be obtained with the help of hardware-aware training, accurate weight programming, and sufficiently linear MAC operations in the analog domain [4]. In this paper, I describe architectural and circuit advances for such analog NVM-based accelerators and specialized digital compute units, designed to accelerate Transformer, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNNs). I present a highly heterogeneous and programmable accelerator architecture that takes advantage of a dense, efficient, circuit-switched 2D mesh to exchange vectors of neuron activations over short distances in a massively parallel fashion [5]. Drawing on results from a 14-nm inference chip consisting of multiple arrays of PCM devices, I discuss the impact of memory materials on the accuracy and performance of these systems.

The author would like to thank colleagues at IBM Research Almaden, Yorktown, Albany, Zurich, and Tokyo for their contributions to this work, and the IBM Research AI HW Center.

References
[1] G. W. Burr et al. "Ohm's Law + Kirchhoff's Current Law = Better AI: Neural-Network Processing Done in Memory with Analog Circuits will Save Energy". In: IEEE Spectrum 58.12 (2021), pp. 44–49.
[2] P. Narayanan et al. "Fully on-chip MAC at 14nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration format". In: Symposium on VLSI Technology. 2021.
[3] M. Le Gallo et al. "A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference". In: arXiv:2212.02872 (2022).
[4] M. J. Rasch et al. "Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators". In: arXiv:2302.08469 (2023).
[5] S. Jain et al. "A Heterogeneous and Programmable Compute-In-Memory Accelerator Architecture for Analog-AI Using Dense 2-D Mesh". In: IEEE Trans. VLSI 31.1 (2023), pp. 114–127.
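Illustrative sketch of the analog MAC operation. The Python below is a minimal numerical sketch, not the hardware implementation: it assumes a differential-conductance weight mapping and illustrative (not measured) values for the conductance range, read voltage, read noise, and ADC resolution. The helpers program_weights and analog_mac are hypothetical names introduced here only to show how Ohm's-law multiplication and Kirchhoff's-law current summation realize a weight-stationary matrix-vector product.

import numpy as np

rng = np.random.default_rng(0)

def program_weights(W, g_max=25e-6):
    # Map signed weights onto differential conductance pairs (in siemens):
    # positive weights on G+, negative weights on G-.
    scale = g_max / np.max(np.abs(W))
    g_pos = np.clip(W, 0.0, None) * scale
    g_neg = np.clip(-W, 0.0, None) * scale
    return g_pos, g_neg, scale

def analog_mac(g_pos, g_neg, scale, x,
               v_read=0.2, noise_sigma=0.5e-6, adc_bits=8):
    # Ohm's law: each device contributes I = G * V for its input voltage.
    v = x * v_read
    # Kirchhoff's current law: device currents sum along each column wire.
    i = (g_pos - g_neg).T @ v
    # Assumed Gaussian read noise on the summed column currents.
    i += rng.normal(0.0, noise_sigma, size=i.shape)
    # Idealized ADC: uniform quantization over the observed current range.
    i_max = float(np.max(np.abs(i))) or 1.0
    levels = 2 ** (adc_bits - 1)
    i_q = np.round(i / i_max * levels) / levels * i_max
    # Rescale currents back into weight units for comparison.
    return i_q / (v_read * scale)

W = rng.normal(size=(64, 32))   # 64 inputs x 32 outputs
x = rng.normal(size=64)
g_pos, g_neg, scale = program_weights(W)
print("analog MAC:", analog_mac(g_pos, g_neg, scale, x)[:4])
print("ideal MAC: ", (W.T @ x)[:4])

In this sketch, increasing noise_sigma or reducing adc_bits degrades the match between the analog and ideal products, mirroring how device noise and converter resolution bound the end-to-end DNN accuracy achievable on such tiles.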