Brian Flachs, Shigehiro Asano, et al.
IBM J. Res. Dev
Discrete AI inference cards, operating under form-factor and system-defined peak power constraints, must serve diverse inference requests with widely varying power consumption. A peak current-limiting scheme is proposed to maximize inference performance across practical use cases. The peak current management block consists of a card-level current-sensing circuit combined with an AI inference-aware feed-forward and feedback control mechanism. Card-level sensing improves performance by eliminating the need for additional margins for power consumed by off-chip components. Compiler-assisted feed-forward control exploits the predictability of AI inference workloads and proactively manages peak currents without a static reduction in operating frequency. Measurements from an AI system-on-chip (SoC), fabricated in 5-nm technology, show up to 41% improvement in BERT-Large inference throughput when the peak current control is engaged.
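As a rough illustration of how feed-forward and feedback control can be combined to hold a card-level current limit, the sketch below scales the clock from a compiler-supplied current estimate and then trims it using the sensed current. It is a minimal sketch under stated assumptions, not the control law of the fabricated SoC: the class name, parameters, gain values, and the assumption that current scales linearly with frequency are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class PeakCurrentController:
    """Hypothetical peak current limiter (all names and values are illustrative)."""
    limit_a: float         # card-level peak current limit, in amps
    f_max_mhz: float       # nominal clock frequency
    f_min_mhz: float       # lowest frequency the controller may select
    gain: float = 0.5      # fraction of the relative overshoot removed per step
    recover: float = 0.01  # trim restored per step when under the limit
    trim: float = 1.0      # feedback correction factor, kept in (0, 1]

    def feed_forward(self, predicted_a: float) -> float:
        """Frequency scale from a compiler-supplied current estimate at f_max.

        Assumes current is roughly proportional to frequency, which is a
        simplification made for this sketch.
        """
        if predicted_a <= self.limit_a:
            return 1.0
        return self.limit_a / predicted_a

    def feedback(self, measured_a: float) -> None:
        """Adjust the trim factor from the sensed card-level current."""
        overshoot = (measured_a - self.limit_a) / self.limit_a
        if overshoot > 0:
            self.trim -= self.gain * overshoot
        else:
            self.trim += self.recover
        floor = self.f_min_mhz / self.f_max_mhz
        self.trim = min(1.0, max(floor, self.trim))

    def frequency(self, predicted_a: float) -> float:
        """Clock frequency for the next phase of the inference."""
        scale = self.feed_forward(predicted_a) * self.trim
        return max(self.f_min_mhz, self.f_max_mhz * scale)


ctrl = PeakCurrentController(limit_a=100.0, f_max_mhz=1000.0, f_min_mhz=500.0)
low_phase_mhz = ctrl.frequency(predicted_a=80.0)    # predicted under the limit
high_phase_mhz = ctrl.frequency(predicted_a=125.0)  # predicted over the limit
ctrl.feedback(measured_a=110.0)                     # sensed overshoot of 10%
trimmed_mhz = ctrl.frequency(predicted_a=80.0)
```

In this sketch the feed-forward path slows the clock only for phases predicted to exceed the limit, so low-current phases keep the nominal frequency, while the feedback path corrects for prediction error and for current drawn by components the estimate does not cover.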