High performance adaptive distributed scheduling algorithm
Abstract
Exascale computing requires complex runtime systems that need to consider affinity, load balancing and low time and message complexity for scheduling massive scale parallel computations. Simultaneous consideration of these objectives makes online distributed scheduling a very challenging problem. Prior distributed scheduling approaches are limited to shared memory or primarily use work-stealing across distributed memory nodes for load-balancing or depend on the programmer specified affinity. However, the performance of affinity driven scheduling and work stealing based algorithms degrades when the input is irregular(UTS) and/or sparse. In this paper we present a novel adaptive distributed scheduling algorithm (ALDS) for multi-place parallel computations, that uses a unique combination of remote (inter-place) spawns and remote work steals to reduce the overheads in the scheduler, which helps to dynamically maintain load balance across the compute nodes of the system, while ensuring affinity maximally. Using parallel machine learning algorithms such as Support Vector Regression running concurrently with program execution on the target architecture, ALDS can automatically and adaptively tune the parameters for scalable performance. Our design was implemented using GASNet API and POSIX threads. For the UTS (Unbalanced Tree Search) benchmark (using up to 2048 nodes of Blue Gene/P), we deliver superior performance over Charm++ [1] and [2]. © 2013 IEEE.