EcTALK: Energy-efficient coherent transprecision accelerators - The bidirectional long short-term memory neural network case
Abstract
Reduced-precision computing has recently emerged as a promising approach to improve energy efficiency by tolerating some loss of quality in computed results. A key barrier to the widespread adoption of reduced-precision computing is the lack of an architecture exploiting arbitrary precision, supported by a software layer that controls the precision of computations. In this paper we propose an end-to-end architecture that improves energy efficiency by (i) using an FPGA device for accelerating applications and (ii) introducing flexible reduced-precision (transprecision) data-paths. The hardware implementation on FPGA leverages High-Level Synthesis on top of IBM POWER8 CAPI (Coherent Accelerator Processor Interface) technology, which simplifies the software design for coupling accelerators to commodity processors. By applying the design to the bidirectional long short-term memory (BLSTM) algorithm, we increased the energy efficiency of an FPGA accelerator by up to 52x compared to that of a POWER8 processor, with an accuracy loss of only 0.6%.
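The accuracy/precision trade-off at the heart of transprecision computing can be illustrated with a minimal software sketch. The snippet below (an assumption for illustration, not the paper's FPGA datapath) quantizes the operands of a matrix-vector product to a fixed-point grid with a configurable number of fractional bits and reports the resulting relative error, mimicking how a transprecision datapath trades bit-width for accuracy:

```python
import numpy as np

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits.

    Illustrative model of a reduced-precision datapath: values are
    snapped to multiples of 2**-frac_bits, as a fixed-point unit would.
    """
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))  # stand-in for a weight matrix
x = rng.standard_normal(64)        # stand-in for an input vector

exact = w @ x
for bits in (16, 8, 4):
    approx = quantize(w, bits) @ quantize(x, bits)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"{bits} fractional bits: relative error {rel_err:.2e}")
```

Sweeping `bits` in this way is the software analogue of selecting a narrower datapath: fewer fractional bits mean cheaper arithmetic and storage on the accelerator, at the cost of a (measurable and often tolerable) loss of output quality.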