Conference paper
Impact of inter-application contention in current and future HPC systems
Abstract
Fat-tree networks are the most popular topology among indirect networks in today's supercomputers. Current supercomputers are generally operated in a shared environment under the control of a job scheduler, executing many parallel applications simultaneously. Competition between these applications for the same network resources degrades their performance. An application that has to wait for network resources occupied by another application's messages is said to experience inter-application contention. The extent of the degradation caused by inter-application contention is known to depend on multiple factors, including the network topology, the routing scheme, and the task placement. Note that these factors also affect intra-application contention. Our work evaluates the impact of inter-application contention for actual competing HPC workloads under different routing schemes in slimmed fat trees. In contrast with previous works, which focus mostly on the performance of individual applications, we take a more system-centric view. Our work estimates the system performance loss that inter-application contention causes in current HPC systems, which we have measured to be around 10%. We also present a projection of the impact of inter-application contention in near- and mid-term future HPC systems, scaling the node computational power and network link speeds to foreseeable values. Our projection for future HPC systems shows that, for some application mixes, inter-application contention can cause a 15% throughput loss even with link speeds of 40 Gb/s. The difference in impact on a chosen application when running within different mixes leads to the performance variability described in previous works, but our work sets a better bound on the variability than studies performed with an injection of network noise.
Finally, we found a high correlation between the communication volume of the applications in a workload and the amount of inter-application contention they experience. © 2010 IEEE.