Olivier Maher, N. Harnack, et al.
DRC 2023
Analog Non-Volatile Memory-based accelerators offer high-throughput and energy-efficient Multiply-Accumulate operations for the large Fully-Connected layers that dominate Transformer-based Large Language Models (LLMs). We describe recent chip-demo and architectural efforts, quantify the unique benefits of Fully- (rather than Partially-) Weight-Stationary systems, and discuss factors affecting latency of token-processing and generation.
Olivier Maher, N. Harnack, et al.
DRC 2023
Tommaso Stecconi, Roberto Guido, et al.
Advanced Electronic Materials
Max Bloomfield, Amogh Wasti, et al.
ITherm 2025
Xiaofan Zhang, Haoming Lu, et al.
MLSys 2020