Abstract
Applications running on multicore platforms are difficult to program, and even more difficult to optimize, mainly because (1) optimizations occur at several layers and (2) a multitude of resources must be exploited in parallel. Whereas low-level optimizations target only the code running on individual cores, high-level optimizations (e.g. data- and task-parallelism) target overall application performance. In this paper, we focus on the latter by evaluating possible mapping scenarios of a real application on a heterogeneous multicore processor. Specifically, we analyze the impact of combining data- and task-parallelism for a multimedia analysis application running on the Cell Broadband Engine (Cell/B.E.). We find that both low-level and high-level optimizations are important for the overall application speed-up. However, we show that a speed-up factor of over 20 for the application running on Cell/B.E. can be obtained only if core utilization is increased by combining data- and task-parallelism. Thus, we consider this case study essential for building expertise in both application optimization and performance analysis for multicore platforms. Copyright © 2008 John Wiley & Sons, Ltd.