Compiler techniques for data prefetching on the PowerPC
Abstract
Recently announced superscalar processors feature multiple functional units which, coupled with high clock rates, significantly reduce the execution time of programs. Memory technology, however, still lags behind: memory components typically run at one half or one third of the clock rate of current processors. Consequently, a bottleneck has formed at the CPU/main-memory interface, limiting the ability of processors to exploit the fine-grain parallelism offered by programs. In this paper we demonstrate how the compiler can use data prefetching algorithms to cope with large memory latencies. The prime candidates for improvement by prefetching are numeric programs, which typically operate on large amounts of data with regular access patterns. We take advantage of a special touch instruction in the PowerPC architecture, which can be used to load data into the cache before the data are actually required. With careful placement of this instruction, useful computation can be overlapped with data transfer operations, effectively eliminating the cost of otherwise unavoidable data cache misses. We describe a prototype implementation within the IBM XL compiler for the IBM RISC System/6000 and evaluate its performance using the SPECfp92 programs. Our preliminary results show that some programs in the SPECfp92 suite can be significantly improved by use of the touch instruction.
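To illustrate the idea of overlapping computation with data transfer, the following is a minimal hand-written sketch in C, not the transformation performed by the prototype compiler. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` intrinsic is expected to be lowered to a cache touch instruction on PowerPC targets. The names `TOUCH`, `PREFETCH_DISTANCE`, and `dot_prefetch`, as well as the distance of 16 elements, are hypothetical choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning parameter: how many elements ahead of the current
 * iteration to touch. A real value depends on memory latency and on the
 * amount of work per iteration. */
#define PREFETCH_DISTANCE 16

/* Assumption: GCC/Clang-style intrinsic. Arguments are the address,
 * 0 = prefetch for reading, 3 = high temporal locality.
 * On other compilers the touch degrades to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define TOUCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#define TOUCH(addr) ((void)0)
#endif

/* Dot product over two arrays with a regular (unit-stride) access
 * pattern. Each iteration touches the data needed PREFETCH_DISTANCE
 * iterations later, so the cache fill can proceed while the current
 * multiply-add executes. */
double dot_prefetch(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Guard so that no address past the end of the arrays is formed. */
        if (i + PREFETCH_DISTANCE < n) {
            TOUCH(&x[i + PREFETCH_DISTANCE]);
            TOUCH(&y[i + PREFETCH_DISTANCE]);
        }
        sum += x[i] * y[i];
    }
    return sum;
}
```

The touch is only a hint, so the result is identical with or without it; only the timing changes. This sketch issues one touch per element for simplicity, whereas a more careful version would issue one per cache line, since touches to a line that is already being fetched are redundant.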