Very low voltage (VLV) design
Ramon Bertran, Pradip Bose, et al.
ICCD 2017
This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and custom ISA, the engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture for high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A custom 16b floating-point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference, along with 1b/2b (binary/ternary) integer formats for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14-nm CMOS.
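For scale, the quoted 1.5 TFLOPS at 1.5 GHz works out to 1,000 fp16 operations per cycle, with the ternary and binary modes running at 8x and 16x that rate. The sketch below decodes a 16-bit word under the stated 1 sign / 6 exponent / 9 mantissa layout; the IEEE-style exponent bias of 31 (2^(6-1) - 1) and the handling of zero, subnormals, and special values are illustrative assumptions, not details given in the abstract, and the function name decode_fp16_169 is hypothetical.

```python
def decode_fp16_169(bits: int) -> float:
    """Decode a 16-bit word in a 1-6-9 (sign/exponent/mantissa) layout.

    A minimal sketch assuming IEEE-754-style semantics with exponent
    bias 31; the paper's actual treatment of subnormals and special
    values is not specified in the abstract.
    """
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exp = (bits >> 9) & 0x3F        # 6 exponent bits
    frac = bits & 0x1FF             # 9 mantissa bits
    if exp == 0:                    # assumed subnormal/zero handling
        return sign * (frac / 2**9) * 2.0 ** (1 - 31)
    if exp == 0x3F:                 # assumed Inf/NaN handling
        return sign * float("inf") if frac == 0 else float("nan")
    return sign * (1.0 + frac / 2**9) * 2.0 ** (exp - 31)


if __name__ == "__main__":
    assert decode_fp16_169(0x3E00) == 1.0    # exp = 31 (the bias), frac = 0
    assert decode_fp16_169(0xBE00) == -1.0   # same encoding with sign bit set
    print(decode_fp16_169(0x3F00))           # 1.5: frac = 256/512
```
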
Swagath Venkataramani, Vijayalakshmi Srinivasan, et al.
ISCA 2021

Azeez Bhavnagarwala, Stephen Kosonocky, et al.
VLSI Circuits 2007

Joshua Auerbach, David F. Bacon, et al.
DAC 2011