Publication
ICDCS 1997
Conference paper
Extensible resource management for cluster computing
Abstract
Advanced general-purpose parallel systems should be able to support diverse applications with different resource requirements without compromising effectiveness and efficiency. We present a new resource management model for cluster computing that allows multiple scheduling policies to co-exist dynamically. In particular, we have built Octopus, an extensible and distributed hierarchical scheduler that implements new space-sharing, gang-scheduling and load-sharing strategies. A series of experiments performed on an IBM SP2 suggest that Octopus can effectively match application requirements to available resources, and improve the performance of a variety of parallel applications within a cluster.
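As an illustration only, the idea of several scheduling policies co-existing under one resource manager might be sketched as below. This is not Octopus's actual design or interface, which the abstract does not describe. Every name here (`Job`, `SpaceSharing`, `LoadSharing`, `ClusterScheduler`, `register`, `submit`) is hypothetical. The two policies are deliberately simplistic stand-ins, and gang scheduling and the distributed hierarchy are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    policy: str  # hypothetical label naming the policy the job requests
    nodes: int   # number of nodes the job asks for


class SpaceSharing:
    """Toy space-sharing policy: a job only receives nodes that are idle."""

    def place(self, job: Job, load: Dict[str, int]) -> Optional[List[str]]:
        idle = sorted(n for n, jobs in load.items() if jobs == 0)
        return idle[:job.nodes] if len(idle) >= job.nodes else None


class LoadSharing:
    """Toy load-sharing policy: pick the least-loaded nodes, sharing allowed."""

    def place(self, job: Job, load: Dict[str, int]) -> Optional[List[str]]:
        ranked = sorted(load, key=lambda n: (load[n], n))
        return ranked[:job.nodes] if len(ranked) >= job.nodes else None


class ClusterScheduler:
    """Top-level manager that delegates each job to a registered policy."""

    def __init__(self, nodes: List[str]) -> None:
        self.load: Dict[str, int] = {n: 0 for n in nodes}  # jobs per node
        self.policies: Dict[str, object] = {}

    def register(self, name: str, policy: object) -> None:
        # Policies can be added while the scheduler is running.
        self.policies[name] = policy

    def submit(self, job: Job) -> Optional[List[str]]:
        chosen = self.policies[job.policy].place(job, self.load)
        if chosen is not None:
            for node in chosen:
                self.load[node] += 1
        return chosen  # None means the request cannot be satisfied now
```

In this sketch, a job submitted under the space-sharing label is refused once no node is idle, while a load-sharing job on the same cluster is still placed on the least-loaded nodes. That is one simple way different resource requirements could be served side by side.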