In-memory computing
Overview
In-memory computing (IMC) is an emerging non-von Neumann computational paradigm that holds the promise of energy efficiencies on the order of one femtojoule per operation. The key idea is to perform certain computational tasks in place in memory, thereby obviating the need to shuttle data back and forth between the processing and memory units. The time and energy cost of this data movement is by far the most severe roadblock for modern computing systems.
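As a concrete illustration of computing in place, the sketch below simulates how a crossbar array of memory devices could carry out a matrix-vector multiplication directly in memory: weights are encoded as differential pairs of device conductances, inputs are applied as voltages, and the output currents accumulate along the columns via Ohm's law and Kirchhoff's current law. This is a minimal sketch; the conductance range and read-noise level are illustrative assumptions, not measured device values.

```python
import numpy as np

def analog_mvm(weights, x, g_max=25e-6, noise_std=0.01, rng=None):
    """Simulate a matrix-vector multiply on a memristive crossbar.

    Each weight is stored as a differential pair of device
    conductances (G+ - G-); inputs arrive as voltages and the
    column currents implement the dot products. The conductance
    range and read-noise level are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    w_max = np.max(np.abs(weights))
    # Map weights onto conductances in [0, g_max] (differential encoding).
    g_pos = np.where(weights > 0, weights, 0.0) / w_max * g_max
    g_neg = np.where(weights < 0, -weights, 0.0) / w_max * g_max
    # Multiplicative read noise, drawn independently for every device read.
    g_pos = g_pos * (1 + noise_std * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1 + noise_std * rng.standard_normal(g_neg.shape))
    # Ohm's law per device, Kirchhoff's current law per column: I = (G+ - G-) V.
    currents = (g_pos - g_neg) @ x
    # Rescale the column currents back to the weight domain.
    return currents / g_max * w_max

# The noisy in-memory product closely tracks the exact digital result.
W = np.random.default_rng(0).standard_normal((4, 8))
x = np.ones(8)
print(np.round(analog_mvm(W, x), 2))
print(np.round(W @ x, 2))
```

Because the multiply-accumulate happens where the weights are stored, no weight data ever crosses a memory bus, which is precisely the data movement that dominates the energy budget of conventional accelerators.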
IMC is often achieved by exploiting the physical attributes of the memory devices and their array-level organization. It has found use in a range of applications such as scientific computing, database query and machine learning. However, the most promising application for IMC is the efficient realization of deep neural networks (DNNs), which have revolutionized AI in recent years. A key challenge for DNNs is their computational inefficiency; in fact, the lack of sufficient compute power was one of the key factors that held back progress in the field for almost 30 years. More specialized hardware such as graphics processing units (GPUs) and application-specific integrated circuits (ASICs) has had a significant impact on the hardware realization of DNNs and helped unleash the recent AI revolution. However, even with better hardware, there is a significant energy gap to be bridged to realize the true potential of AI.
In our group we are exploring custom accelerators for DNNs based on IMC. The essential idea is to realize attributes such as synaptic efficacy and plasticity in place in the memory itself by exploiting the physical attributes of memory devices. Physically stationary neurons and synapses, along with lower-precision analog computation, are salient features of biological neural networks, and hence this work also falls under the purview of neuromorphic computing.
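Synaptic plasticity, in particular, can build on the accumulative switching behavior of the memory devices themselves. The toy model below is a sketch loosely inspired by phase-change memory, in which conductance rises gradually with successive crystallization pulses and resets abruptly upon amorphization; the step size and saturation behavior are assumptions for illustration only.

```python
class PCMSynapse:
    """Toy phase-change synapse: conductance increases in small,
    saturating increments with each potentiation (SET) pulse and
    is reset abruptly (RESET). Step size, saturation behavior and
    units are illustrative assumptions."""

    def __init__(self, g_max=1.0, step=0.1):
        self.g = 0.0          # normalized conductance (synaptic weight)
        self.g_max = g_max    # saturation level of crystallization
        self.step = step      # nominal conductance increment per pulse

    def potentiate(self, n_pulses=1):
        # Gradual, saturating updates: increments shrink near g_max,
        # mimicking the accumulation of the crystalline phase.
        for _ in range(n_pulses):
            self.g += self.step * (1.0 - self.g / self.g_max)

    def reset(self):
        # Abrupt RESET: a melt-quench pulse re-amorphizes the device.
        self.g = 0.0

syn = PCMSynapse()
syn.potentiate(5)
print(round(syn.g, 3))  # weight after five potentiation pulses
```

The asymmetry between gradual potentiation and abrupt depression is one of the device non-idealities that the architectures and training methods described below must accommodate.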
We have designed and fabricated, in collaboration with other members of the IBM Research AI Hardware Center, the most advanced IMC chips, or Analog AI chips, for deep learning to date.
We have assembled a highly multidisciplinary team that tackles the entire stack: the memory technology, mixed-signal circuit design, system-level architecture, the software stack and applications. On the memory-technology front, our primary focus is on improving the compute density, compute precision and weight capacity. There is also a substantial research effort aimed at improving the energy efficiency of the peripheral circuitry associated with the IMC cores. The overall system-level architecture of a multi-core IMC chip, as well as the appropriate communication fabric, is being actively researched. The software stack, which spans from hardware-aware training libraries down to the compiler, is also under active development. Finally, on the applications front, we explore neural architectures that are particularly well suited to this new kind of accelerator. We are also exploring applications that transcend conventional deep learning inference, such as more bio-realistic DNNs and neuro-vector-symbolic architectures.
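To give a flavor of hardware-aware training, the sketch below injects multiplicative weight noise during the forward pass so that the learned solution tolerates the programming and read errors of analog devices, while gradients are applied to the clean weights (a straight-through approximation). This is a minimal sketch of the general idea, not the actual training library; the noise model and magnitude are assumptions.

```python
import numpy as np

def noisy_forward(w, x, train_noise=0.05, rng=None):
    """Forward pass with multiplicative Gaussian weight noise,
    mimicking analog programming/read errors. The 5% noise
    level is an illustrative assumption."""
    rng = rng or np.random.default_rng()
    w_noisy = w * (1 + train_noise * rng.standard_normal(w.shape))
    return w_noisy @ x

# Toy hardware-aware training of a single linear layer: the forward
# pass sees perturbed weights, while the update is applied to the
# clean weights, so the solution found is robust to perturbation.
rng = np.random.default_rng(0)
w_true = rng.standard_normal((2, 4))    # target mapping to learn
w = np.zeros_like(w_true)
for _ in range(2000):
    x = rng.standard_normal(4)
    y = noisy_forward(w, x, rng=rng)
    grad = np.outer(y - w_true @ x, x)  # dL/dw for L = 0.5 * ||y - y_t||^2
    w -= 0.05 * grad
print(np.round(w - w_true, 2))  # stays close to w_true despite the noise
```

The same principle, applied to full DNNs with device-calibrated noise models, is what allows networks trained in software to retain their accuracy once their weights are programmed into analog memory arrays.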
Selected externally funded projects
Horizon Europe EIC Pathfinder HYBRAIN
Horizon Europe EIC Pathfinder BioPIM
European Research Council (ERC) Consolidator grant
Recent key publications:
- Jain et al., “A heterogeneous and programmable compute-in-memory accelerator architecture for analog-AI using dense 2-D mesh”, IEEE Trans. VLSI (2023)
- Le Gallo et al., “Using the IBM analog in-memory hardware acceleration kit for neural network training and inference”, APL Machine Learning (2023)
- Rasch et al., “Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators”, Nature Comm. (2023)
- Le Gallo et al., “A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference”, Nature Electr. (2023)
- Langenegger et al., “In-memory factorization of holographic perceptual representations”, Nature Nanotech. (2023)
- Sarwat et al., “Phase-change memtransistive synapses for mixed-plasticity neural computations”, Nature Nanotech. (2022)
- Lanza et al., “Memristive technologies for data storage, computation, encryption, and radio-frequency communication”, Science (2022)
- Feldmann et al., “Parallel convolutional processing using an integrated photonic tensor core”, Nature (2021)
- Sarwat et al., “Projected mushroom type phase-change memory”, Adv. Func. Mat. (2021)
- Sebastian et al., “Memory devices and applications for in-memory computing”, Nature Nanotech. (2020)
- Karunaratne et al., “In-memory hyperdimensional computing”, Nature Electr. (2020)
- Le Gallo and Sebastian, “An overview of phase-change memory device physics”, J. Phys. D: Appl. Phy. (2020)
- Joshi et al., “Accurate deep neural network inference using computational phase-change memory”, Nature Comm. (2020)
- Boybat et al., “Neuromorphic computing with multi-memristive synapses”, Nature Comm. (2018)
Publications
- Tomas Tuma, Angeliki Pantazi, et al., Nature Nanotechnology (2016)
- Tomas Tuma, Manuel Le Gallo, et al., IEEE Electron Device Letters (2016)
- Angeliki Pantazi, Stanislaw Wozniak, et al., Nanotechnology (2016)
- Manuel Le Gallo, A. Athmanathan, et al., Journal of Applied Physics (2016)
- Geoffrey W. Burr, Matthew BrightSky, et al., IEEE JESTCS (2016)
- IRPS 2016
- Wabe W. Koelmans, Tobias Bachmann, et al., IMW 2016