Publication
FPL 2020
Conference paper
Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack
Abstract
Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance deep-learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators, since adapting them to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that is overlaid on the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve up to 2.5x higher performance and up to 8.1x faster convergence.