About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
NeurIPS 2020
Conference paper
Ultra-Low Precision 4-bit Training of Deep Neural Networks
Abstract
In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits. To enable this advance, we explore a novel adaptive Gradient Scaling technique (Gradscale) that addresses the challenges of insufficient range and resolution in quantized gradients as well as explores the impact of quantization errors observed during model training. We theoretically analyze the role of bias in gradient quantization and propose solutions that mitigate the impact of this bias on model convergence. Finally, we examine our techniques on a spectrum of deep learning models in computer vision, speech, and NLP. In combination with previously proposed solutions for 4-bit quantization of weight and activation tensors, 4-bit training shows a non-significant loss in accuracy across application domains while enabling significant hardware acceleration (> 7X over state-of-the-art FP16 systems).