Fan Zhang, Junwei Cao, et al.
IEEE TETC
Memory access latency is often a critical performance limitation in high-performance computing. Prefetching is one of the strategies system designers use to bridge the processor-memory gap. This paper describes a new list prefetching feature introduced in the IBM Blue Gene/Q supercomputer. The list prefetcher records the addresses of L1 cache misses and prefetches them in the next iteration. The evaluation shows that this mechanism reduces data fetch time on L1 cache misses and improves the performance of high-performance computing applications with repeating, nonuniform memory access patterns. When properly configured, its performance is comparable to that of a classic stream prefetcher. © 2012 IEEE.
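The record-and-replay idea in the abstract can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the Blue Gene/Q hardware design: the LRU cache model and the names `L1Cache`, `record_misses`, `replay_with_list`, and the look-ahead `depth` are all hypothetical and chosen only for illustration. The first pass records the miss addresses. The second pass treats a miss as covered if its address appears within the next `depth` entries of the recorded list.

```python
from collections import OrderedDict


class L1Cache:
    """Toy fully associative cache with LRU eviction (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, insert addr and evict the LRU line."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def record_misses(pattern, cache):
    """First iteration: run the access pattern and record every miss address."""
    return [addr for addr in pattern if not cache.access(addr)]


def replay_with_list(pattern, cache, miss_list, depth=4):
    """Later iteration: count misses covered by look-ahead into the recorded list.

    A miss is 'covered' if its address is among the next `depth` recorded
    entries, i.e. the modeled prefetcher would already have requested it.
    Returns (covered, uncovered).
    """
    pos = 0  # current position in the recorded miss list
    covered = uncovered = 0
    for addr in pattern:
        if cache.access(addr):
            continue  # L1 hit: the prefetcher is not involved
        window = miss_list[pos:pos + depth]
        if addr in window:
            covered += 1
            pos += window.index(addr) + 1  # resynchronize with the list
        else:
            uncovered += 1
    return covered, uncovered


# A repeating, nonuniform pattern that does not fit in the toy cache.
pattern = [7, 42, 3, 19, 88, 5, 61, 23]
cache = L1Cache(capacity=4)
misses = record_misses(pattern, cache)
print(replay_with_list(pattern, cache, misses))
```

Because the pattern has no fixed stride, a stride-based stream prefetcher would have nothing to predict from in this model, while the recorded list covers every miss in the repeat iteration. The model ignores timing, list length limits, and any hardware matching tolerance.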
Rajeev Gupta, Shourya Roy, et al.
ICAC 2006
David S. Kung
DAC 1998
Kafai Lai, Alan E. Rosenbluth, et al.
SPIE Advanced Lithography 2007