Analog memory-based techniques for accelerating the training of fully-connected deep neural networks
Abstract
Crossbar arrays of resistive non-volatile memories (NVMs) offer a novel solution for deep learning tasks that are typically run on GPUs [1]. The highly parallel structure of these architectures enables fast and energy-efficient multiply-accumulate computations, the workhorse of most deep learning algorithms. Specifically, we are developing analog hardware platforms for accelerating large Fully Connected (FC) Deep Neural Networks (DNNs) [1,2], in which training is performed with the backpropagation algorithm. Backpropagation is a supervised form of learning based on three steps: forward propagation of input data through the network (a.k.a. forward inference); comparison of the inference results with ground-truth labels and backpropagation of the errors from the output layer to the input layer; and in-situ weight updates. This type of supervised training has been shown to succeed even in the presence of a substantial number of faulty NVMs, relaxing yield requirements compared with conventional memory, where nearly 100% yield may be required [2]. We recently surveyed the use of analog memory devices for DNN hardware accelerators based on crossbar array structures and discussed design choices, device and circuit readiness, and the most promising opportunities compared with digital accelerators [3]. In this presentation, we will focus on our implementation of an analog memory unit cell based on Phase-Change Memory (PCM) and a 3-Transistor 1-Capacitor (3T1C) element [4]. Software-equivalent accuracy on several datasets (MNIST, MNIST with noise, CIFAR-10, and CIFAR-100) was achieved in a mixed software-hardware demonstration, with DNN weights stored as analog conductances in real PCM device arrays. We will discuss how limitations of real-world NVM devices, such as imperfect conductance linearity and device variability, affect DNN training, and how using two pairs of analog conductances of varying significance relaxes device requirements [5,6,7]. Finally, we summarize the pieces needed to build an analog accelerator chip [8] and the role lithography plays in the future development of novel NVM devices.
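To make the weight-encoding scheme concrete, the following sketch (not the referenced hardware implementation; written in Python/NumPy purely for illustration) models a fully connected layer whose weights are stored as two conductance pairs of varying significance, w = F·(G+ − G−) + (g+ − g−). It includes a parallel multiply-accumulate for forward inference, a saturating in-situ update of the lower-significance pair, and an occasional transfer of the accumulated contribution into the higher-significance pair. The class name, the gain factor F, and the specific update and transfer rules are illustrative assumptions, not details taken from the cited work.

```python
# Illustrative sketch only: a toy NumPy model of a fully connected layer whose
# weights are encoded as two conductance pairs of varying significance,
#   w = F * (G_plus - G_minus) + (g_plus - g_minus).
# Device behaviour (saturating updates, gain factor F) is a simplifying assumption.
import numpy as np

rng = np.random.default_rng(0)

class TwoPairAnalogLayer:
    def __init__(self, n_in, n_out, F=3.0, g_max=1.0, eta=0.1):
        self.F = F                # significance (gain) factor of the high-significance pair
        self.g_max = g_max        # maximum conductance of a device
        self.eta = eta            # learning rate applied to the low-significance pair
        # Higher-significance pair (e.g. PCM) and lower-significance pair (e.g. 3T1C)
        self.Gp = rng.uniform(0, g_max, (n_out, n_in))
        self.Gm = rng.uniform(0, g_max, (n_out, n_in))
        self.gp = np.zeros((n_out, n_in))
        self.gm = np.zeros((n_out, n_in))

    def weights(self):
        # Effective weight read out from the four conductances
        return self.F * (self.Gp - self.Gm) + (self.gp - self.gm)

    def forward(self, x):
        # Multiply-accumulate: on hardware this is Ohm's law plus Kirchhoff's
        # current law, performed in parallel along the crossbar columns.
        return self.weights() @ x

    def update(self, x, delta):
        # In-situ outer-product update applied to the low-significance pair;
        # the (1 - g/g_max) factor crudely mimics a saturating, non-linear
        # conductance response of real devices.
        dw = self.eta * np.outer(delta, x)
        self.gp += np.where(dw > 0,  dw * (1 - self.gp / self.g_max), 0.0)
        self.gm += np.where(dw < 0, -dw * (1 - self.gm / self.g_max), 0.0)

    def transfer(self):
        # Occasional "weight transfer": fold the accumulated low-significance
        # contribution into the high-significance pair and reset the small pair.
        self.Gp += np.clip(self.gp - self.gm, 0, None) / self.F
        self.Gm += np.clip(self.gm - self.gp, 0, None) / self.F
        np.clip(self.Gp, 0, self.g_max, out=self.Gp)
        np.clip(self.Gm, 0, self.g_max, out=self.Gm)
        self.gp[:] = 0.0
        self.gm[:] = 0.0

# Minimal usage: one forward pass and one update for a 4-input, 2-output layer.
layer = TwoPairAnalogLayer(n_in=4, n_out=2)
x = rng.standard_normal(4)
y = layer.forward(x)
delta = np.ones(2) - y          # stand-in error signal from backpropagation
layer.update(x, delta)
layer.transfer()
```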
References:
[1] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element”, IEDM Tech. Digest, 29.5 (2014).
[2] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses) using phase-change memory as the synaptic weight element”, IEEE Trans. Electron Devices, 62(11), 3498 (2015).
[3] H. Tsai et al., “Recent progress in analog memory-based accelerators for deep learning”, Journal of Physics D: Applied Physics, 51(28), 283001 (2018).
[4] S. Ambrogio et al., “Equivalent-Accuracy Accelerated Neural Network Training using Analog Memory”, Nature, 558(7708), 60 (2018).
[5] T. Gokmen et al., “Acceleration of deep neural network training with resistive cross-point devices: design considerations”, Frontiers in Neuroscience, 10, 333 (2016).
[6] S. Sidler et al., “Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: impact of conductance response”, ESSDERC Proc., 440 (2016).
[7] G. Cristiano et al., “Perspective on Training Fully Connected Networks with Resistive Memories: Device Requirements for Multiple Conductances of Varying Significance”, accepted in Journal of Applied Physics (2018).
[8] P. Narayanan et al., “Toward on-chip acceleration of the backpropagation algorithm using nonvolatile memory”, IBM J. Res. Dev., 61(4), 1-11 (2017).