Analog-memory-based 14nm Hardware Accelerator for Dense Deep Neural Networks including Transformers
Abstract
Analog non-volatile memory (NVM)-based accelerators for deep neural networks perform high-throughput, energy-efficient multiply-accumulate (MAC) operations (achieving high TeraOPS/W) by exploiting massively parallel analog MAC operations, implemented with Ohm's law and Kirchhoff's current law on arrays of resistive devices. While the wide-integer and floating-point operations offered by conventional digital CMOS computing are far better suited than analog computing to applications that require high accuracy and true reproducibility, deep neural networks can still deliver competitive end-to-end results even with modest (e.g., 4-bit) precision in synaptic operations. In this paper, we describe a 14-nm inference chip, comprising multiple 512 × 512 arrays of Phase Change Memory (PCM) devices, which can deliver software-equivalent inference accuracy on MNIST handwritten-digit recognition and recurrent LSTM benchmarks by using compensation techniques to finesse analog-memory challenges such as conductance drift and noise. We also project accuracy for Natural Language Processing (NLP) tasks performed with a state-of-the-art large Transformer-based model, BERT, when mapped onto an extended version of this same fundamental chip architecture.
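As a minimal sketch of the in-array MAC described above, and assuming (for illustration only) that input activations are encoded as read voltages V_i applied to the rows and that each weight is stored as a device conductance G_ij, Ohm's law gives the per-device current and Kirchhoff's current law sums these currents on each shared column wire:

% Per-device Ohm's law: I_{ij} = G_{ij} V_i
% Kirchhoff's current law on column j of an N-row array (N = 512 here):
\[
  I_j \;=\; \sum_{i=1}^{N} G_{ij}\, V_i
\]

Each column current I_j thus realizes one dot product of the weight matrix with the input vector, and all columns of the array compute in parallel; how weights and activations are actually encoded on the chip (e.g., differential conductance pairs or pulse-duration inputs) is a design choice not specified by this abstract.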