Self-contained, accurate precomputation prefetching
Abstract
This work revisits precomputation prefetching for long-latency loads whose access patterns are hard to predict. It presents Ekivolos, a precomputation prefetching system that automatically builds prefetching slices containing enough control-flow instructions to recreate the program's access behavior faithfully and autonomously, without inducing monitoring and execution overhead on the main thread. Ekivolos departs from the traditional approach of creating short, optimized slices. Instead, it shows that even longer slices can run ahead of the main thread and perform useful prefetches, provided they are sufficiently accurate. Ekivolos operates on arbitrary application binaries and uses the observed execution paths to create its slices. On a set of emerging workloads, Ekivolos outperforms three state-of-the-art hardware prefetchers as well as previously proposed precomputation-based prefetchers.
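
To make the notion of slice accuracy concrete, the following toy sketch contrasts a slice that keeps the address-determining branch with one that drops it. It is only an illustration under stated assumptions, not Ekivolos's slice-construction method: the tree layout, the function names, and the `coverage` metric are all hypothetical, and array indices stand in for memory addresses.

```python
from typing import List, Optional, Tuple

# Hypothetical node layout: (key, left_index, right_index).
# Array indices stand in for memory addresses.
Node = Tuple[int, Optional[int], Optional[int]]

TREE: List[Node] = [
    (50, 1, 2), (30, 3, 4), (70, 5, 6),
    (20, None, None), (40, None, None), (60, None, None), (80, None, None),
]


def main_thread(tree: List[Node], key: int) -> Tuple[List[int], int]:
    """Full program: a tree lookup plus work that does not affect addresses."""
    trace: List[int] = []
    checksum = 0
    idx: Optional[int] = 0
    while idx is not None:
        trace.append(idx)                  # demand load of node idx
        node_key, left, right = tree[idx]
        checksum += node_key * node_key    # stand-in for non-address work
        if key == node_key:
            break
        idx = left if key < node_key else right
    return trace, checksum


def slice_with_control_flow(tree: List[Node], key: int) -> List[int]:
    """Slice that keeps the branch deciding the next address, drops other work."""
    prefetched: List[int] = []
    idx: Optional[int] = 0
    while idx is not None:
        prefetched.append(idx)             # prefetch of node idx
        node_key, left, right = tree[idx]
        if key == node_key:
            break
        idx = left if key < node_key else right
    return prefetched


def slice_without_control_flow(tree: List[Node]) -> List[int]:
    """Shorter slice that drops the branch and always follows the left child."""
    prefetched: List[int] = []
    idx: Optional[int] = 0
    while idx is not None:
        prefetched.append(idx)
        idx = tree[idx][1]
    return prefetched


def coverage(demand: List[int], prefetched: List[int]) -> float:
    """Fraction of the main thread's loads that the slice also touched."""
    touched = set(prefetched)
    return sum(1 for addr in demand if addr in touched) / len(demand)
```

In this sketch, for a lookup of key 60 the slice that retains the branch touches exactly the nodes the main thread loads, while the shorter slice diverges after the root and prefetches nodes the main thread never visits.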