A software-assisted peak current regulation scheme to improve power-limited inference performance in a 5nm AI SoC. Monodeep Kar, Joel Silberman, et al. 2024. ISSCC 2024.
Approximate computing and the efficient machine learning expedition. Jörg Henkel, Hai Li, et al. 2022. ICCAD 2022.
OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators. Subhankar Pal, Swagath Venkataramani, et al. 2022. Transactions on Embedded Computing Systems.
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization. Andrea Fasoli, Chia-Yu Chen, et al. 2022. INTERSPEECH 2022.
Accelerating DNN Training Through Selective Localized Learning. Sarada Krithivasan, Sanchari Sen, et al. 2022. Frontiers in Neuroscience.
A 7-nm Four-Core Mixed-Precision AI Chip with 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling. Sae Kyu Lee, Ankur Agrawal, et al. 2021. IEEE JSSC.
4-bit quantization of LSTM-based speech recognition models. Andrea Fasoli, Chia-Yu Chen, et al. 2021. INTERSPEECH 2021.
Efficacy of Pruning in Ultra-Low Precision DNNs. Sanchari Sen, Swagath Venkataramani, et al. 2021. ISLPED 2021.
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. 2021. ISCA 2021.
17 Feb 2020. US10565285. Processor And Memory Transparent Convolutional Lowering And Auto Zero Padding For Deep Neural Network Implementations.
Mori Ohara. Deputy Director, IBM Research Tokyo, Distinguished Engineer, Chief SW Engineer for Hybrid Cloud on IBM HW