Experiences of using a dependence profiler to assist parallelization for multi-cores
Abstract
In this work we show how a data-dependence profiling tool called DProf can be used to assist parallelization for multi-core systems. DProf is built on an optimizing compiler and uses reference runs to emit information on runtime dependences between memory accesses within a loop. The profiler not only marks the dependent statements and accesses but also reports how often each dependence is encountered, expressed as a percentage of the loop's iteration count. Although DProf was primarily built to capture opportunities for speculative thread-level parallelism, we find that its report can be used very effectively to detect and parallelize complex code. To demonstrate this, we take two complex benchmarks, 435.gromacs and 437.leslie3d from the SPECfp CPU2006 suite, and show how they can be parallelized effectively with DProf as an assist. To the best of our knowledge, none of the existing parallelizing compilers can detect and parallelize all the instances reported in this work. Using DProf, we parallelized these benchmarks with OpenMP for IBM P5+ and P6 multi-core systems, achieving speedups of up to 2.6x. Moreover, DProf's detailed reporting let us cut our parallelization development effort significantly by concentrating on the portions of the code that require attention. DProf can also be used to identify applications where parallelization may cause performance regressions, allowing developers to quickly set aside applications, or parts of them, that are unlikely to benefit when deployed on multi-cores. Thus, a data-dependence profiler like DProf can act as an excellent assist mechanism for moving applications to multi-cores effectively. © 2010 IEEE.
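As a minimal illustrative sketch of the workflow the abstract describes, the C/OpenMP loop below shows how a dependence report of the kind DProf produces might guide parallelization: once the report indicates that the only loop-carried dependence is an accumulation, a reduction clause is enough to parallelize the loop safely. The loop, variable names, and data are hypothetical and are not taken from 435.gromacs or 437.leslie3d.

/*
 * Hypothetical example: a loop a dependence profiler might flag,
 * parallelized with OpenMP once the report shows that the only
 * cross-iteration dependence is the accumulation into `energy`.
 */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double force[N], dist[N];
    double energy = 0.0;

    /* Synthetic input data. */
    for (int i = 0; i < N; i++) {
        force[i] = 0.5 * i;
        dist[i]  = 1.0 + (i % 7);
    }

    /* The profiler's report would show force[] and dist[] as read-only
     * in this loop, with the single loop-carried dependence being the
     * accumulation into `energy`, so a reduction clause suffices. */
    #pragma omp parallel for reduction(+:energy)
    for (int i = 0; i < N; i++) {
        energy += force[i] / dist[i];
    }

    printf("energy = %f\n", energy);
    return 0;
}

In practice, the profiler's per-dependence frequency information would also indicate whether a rarely occurring dependence is better handled by restructuring the loop or by leaving it serial.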