Scheduling workflows in multi-cluster environments
Abstract
Scientific applications modeled as workflows can exhibit both task and data parallelism. Scheduling these workflows in a multi-cluster environment is challenging because of the large number of possible task mappings. Several heuristics have therefore been proposed in recent years to address this problem. A key limitation of existing heuristics for multi-cluster environments is that each individual task is mapped onto a single resource, which restricts the resource options available for reducing the total workflow execution time. This paper introduces the Multi-Cluster Allocation-Heterogeneous Earliest Finish Time (MCA-HEFT) heuristic, which deploys individual parallel tasks of a workflow across multiple clusters and schedules them accordingly. We evaluated MCA-HEFT against the Mixed-parallel Heterogeneous Earliest Finish Time (M-HEFT) heuristic, one of the best-known workflow scheduling heuristics in the literature. MCA-HEFT produced makespans up to 42% shorter than those produced by M-HEFT, with only approximately 10% of tasks distributed across multiple clusters. Our experiments considered several metrics and parameters, including critical path size, makespan, the number of clusters used to execute tasks, and the network impact of deploying tasks across multiple clusters. © 2013 IEEE.