The performance impact of exploiting branch ILP with tree representation of ILP code

Soo-Mook Moon; Kemal Ebcioǧlu

doi:10.1093/comjnl/41.1.26

Computer Journal

Paper

01 Jan 1998

The performance impact of exploiting branch ILP with tree representation of ILP code

View publication

Abstract

Modern single-CPU microprocessors exploit instruction-level parallelism (ILP) by deriving their performance advantage mainly from parallel execution of ALU and memory instructions within a single clock cycle. This performance advantage obtained by exploiting data ILP is severely offset by sequential execution of conditional branches, especially in branch-intensive non-numerical code. Consequently, branch ILP must also be exploited by executing branches and data instructions in parallel. This requires compilation support for scheduling branches as well as architectural support for executing branches and data instructions in the same cycle. This paper performs a comprehensive empirical study aimed at evaluating the performance impact of exploiting branch ILP using a representation of ILP code called tree representation, which has been proposed by Nicolau [A. Nicolau (1985), Technical Report TR-85-678, Cornell University, Ithaca, NY] and Ebcioglu to exploit branch ILP in the most generalized form. Our results indicate that exploiting branch ILP can enhance performance substantially (i.e., as much as a geometric mean of speedup 4.5 in the 16 -ALU machine, compared to the base speedup 3.0) and that the performance benefit comes not only from the intended parallel execution but from the decrease of useless speculative execution due to earlier scheduling of branches.

Conference paper