Sarada Krithivasan, Sanchari Sen, et al.
Frontiers in Neuroscience
Discrete AI inference cards, operating under form-factor and system-defined peak power constraints, must serve diverse inference requests with widely varying power consumption. A peak current-limiting scheme is proposed to maximize inference performance across practical use cases. The peak current management block consists of a card-level current sensing circuit with an AI inference-aware feed-forward and feedback control mechanism. Card-level sensing improves performance by eliminating the need for additional margins for power consumed by off-chip components. Compiler-assisted feed-forward control exploits the predictability of AI inference workloads and proactively manages peak currents without a static reduction in operating frequency. Measurements from an AI system-on-chip (SoC), fabricated in 5-nm technology, show up to 41% improvement in BERT-Large inference throughput when peak current control is engaged.
Sarada Krithivasan, Sanchari Sen, et al.
ISLPED 2019
Michael Scheuermann, Shurong Tian, et al.
3DIC 2016
Andrea Fasoli, Chia-Yu Chen, et al.
INTERSPEECH 2021