Publication
CCGrid 2012
Conference paper

GPU performance enhancement via communication cost reduction: Case studies of radix sort and WSN relay node placement problem

View publication

Abstract

As the computational power of Graphics Processing Unit (GPU) increases, data transmission becomes the major performance bottleneck. In this study, we investigate two techniques, data streaming and data compression, to reduce the communication cost on GPU. Data streaming enables overlap of communication and computation, whereas data compression reduces the data size transferred among different memory spaces. Although both techniques increase computation cost, overall performance can still be enhanced by reducing communication cost. We demonstrate the effectiveness of the two techniques via two case studies: radix sort and 3-star, a deployment algorithm in wireless sensor networks. For radix sort, a new algorithm, which mixes MSD and LSD algorithms and employs data streaming, is presented. Its performance is 25% faster than the fastest GPU radix sort implementation currently available in the public domain. For the 3-star algorithm, the speed increases several hundreds of times faster than that obtained by the CPU code. The data streaming and data compression, which is a hybrid CPU-GPU algorithm, provide an additional 54% performance improvement to the GPU implementation. Data compression not only reduces communication cost, but also improves the computation time, by which further performance enhancement can be achieved. © 2012 IEEE.

Date

Publication

CCGrid 2012