Optimization of Projected Phase Change Memory for Analog In-Memory Computing Inference
Abstract
Phase change memory (PCM) is a promising candidate for non-von Neumann analog in-memory computing, particularly for inference of previously trained deep neural networks (DNNs). PCM with a projection liner is designed to mitigate resistance drift. We show that the electrical properties of PCM, including resistance values, memory window, resistance drift, and read noise, can be tuned systematically using the liner in the manufacturable mushroom PCM. We perform a systematic study of these electrical properties and their impact on the accuracy of several DNNs using the analog AI simulation tool developed at IBM. We show that PCM with a liner improves DNN accuracy, analyze the origin of the improvement, and identify the design space for best performance. We evaluate large neural networks with tens of millions of weights using PCM with and without a liner, across a variety of DNNs and test datasets and at various times after programming, to study how network performance evolves over time for chips using these PCMs. The DNN types evaluated include Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformer-based networks. We also evaluate the devices under various weight mapping schemes, including a direct scheme that uses one PCM per weight and an optimized scheme that uses multiple PCMs per weight. The accuracy enhancements from PCM with a projection liner hold for all these weight mapping schemes and for networks that differ in structure, complexity, nonlinear activation function, and other characteristics. Accuracy improves both shortly after programming and over the long term. The better long-term accuracy of the liner devices is due to their lower drift coefficient and lower drift variability. The better initial accuracy is due to their reduced noise, despite the trade-off of a reduced memory window.
We show that the liner device parameters need to be chosen carefully, and we identify a range of these parameters that enables the largest improvement in network accuracy.
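To make the link between drift coefficient, drift variability, and long-term accuracy concrete, the following minimal sketch applies the commonly used power-law drift model, G(t) = G0 (t/t0)^(-nu), to a population of devices. It is an illustration only, not the simulation tool used in this work. The reference time T0, the normal distribution assumed for nu, and both parameter sets (NO_LINER, WITH_LINER) are hypothetical values chosen to show the trend, not measured device data.

```python
import random
import statistics

T0 = 20.0  # assumed reference read time after programming, in seconds


def drifted_conductance(g0, nu, t, t0=T0):
    """Power-law drift: G(t) = G0 * (t / t0) ** (-nu), for t >= t0."""
    return g0 * (t / t0) ** (-nu)


def drift_statistics(nu_mean, nu_std, t, n_devices=10000, seed=0):
    """Return (mean retained conductance, relative device-to-device spread).

    Every device starts at G0 = 1 and receives its own drift coefficient,
    drawn from a normal distribution and clipped at zero (an assumption).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_devices):
        nu = max(0.0, rng.gauss(nu_mean, nu_std))
        values.append(drifted_conductance(1.0, nu, t))
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values) / mean
    return mean, spread


# Hypothetical parameter sets, chosen only to illustrate the trend.
NO_LINER = {"nu_mean": 0.06, "nu_std": 0.02}
WITH_LINER = {"nu_mean": 0.01, "nu_std": 0.005}

if __name__ == "__main__":
    one_month = 30 * 24 * 3600.0  # seconds
    for name, params in [("no liner", NO_LINER), ("with liner", WITH_LINER)]:
        mean, spread = drift_statistics(t=one_month, **params)
        print(f"{name}: retained={mean:.3f}, relative spread={spread:.3f}")
```

Under these assumed numbers, the device with the smaller and less variable drift coefficient retains more of its programmed conductance and shows a narrower spread across devices, which is the qualitative behavior credited above for the better long-term accuracy.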