Publication
IEEE Micro
Paper
Data Movement Accelerator Engines on a Prototype Power10 Processor
Abstract
This paper presents the design and implementation of Active Messaging Engines (AMEs) on an IBM Power10 prototype chip. AMEs are tiny, simple, but fully programmable 64-bit processors, for offloading operations related to data movement. AMEs can offload execution flow of MPI and other messaging stacks from the host CPU, enabling truly asynchronous progress to overlap computation and communication. The AMEs are implemented as on-board OpenCAPI-compliant accelerators, leveraging existing OpenCAPI infrastructure. As realized in a 7 nm technology, each AME takes 0.034 mm² of silicon area and 4.1 mW of power. AME performance is evaluated across several contiguous and non-contiguous memory copy scenarios. AMEs can perform up to the bandwidth limit of their access path to the main memory (32 GB/s) and incur a per-request overhead of about 600 ns. These results indicate that AMEs will confer advantages to general messaging libraries for processing, sending, and receiving on-node and off-node messages.
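To put the two headline figures in perspective, the sketch below combines them in a simple linear cost model, time = overhead + size / bandwidth. This model is an assumption made here for illustration and is not taken from the paper. It also assumes 1 GB = 10^9 bytes and treats the overhead as exactly 600 ns, although the abstract says "about". The function names are hypothetical.

```python
# Illustrative sketch only: a linear cost model (an assumption, not from the paper)
# built from the two figures quoted in the abstract.

OVERHEAD_S = 600e-9   # per-request overhead, "about 600 ns" in the abstract
PEAK_BW = 32e9        # memory access path limit, 32 GB/s (assuming 1 GB = 1e9 bytes)


def transfer_time(nbytes: float) -> float:
    """Assumed time in seconds for one copy request of nbytes."""
    return OVERHEAD_S + nbytes / PEAK_BW


def effective_bandwidth(nbytes: float) -> float:
    """Bytes per second achieved by one request under the assumed model."""
    return nbytes / transfer_time(nbytes)


def half_bandwidth_size() -> float:
    """Request size at which the model reaches half of the peak bandwidth."""
    return OVERHEAD_S * PEAK_BW


if __name__ == "__main__":
    for size in (256, 4096, 65536, 1 << 20):
        gbps = effective_bandwidth(size) / 1e9
        print(f"{size:>8} B -> {gbps:6.2f} GB/s")
    print(f"half-peak request size: {half_bandwidth_size():.0f} B")
```

Under these assumptions, a request needs to be on the order of 19 KB before the fixed overhead and the transfer time are equal, so small messages would be dominated by the per-request overhead and large copies by the 32 GB/s path limit. Measured behavior on the prototype may differ from this simplified model.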