Autoscaling for Hadoop Clusters
Anshul Gandhi, Sidhartha Thota, et al.
IC2E 2016
MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. This work studies the scheduling challenge that results from the overlapping of the "map" and "shuffle" phases in MapReduce. We propose a new, general model for this scheduling problem. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive in the online case. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate.
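To make "average response time" and "a slightly faster service rate" concrete, here is a toy sketch. It is not the paper's map/shuffle model. It assumes a single server processing jobs first-come-first-served, where each job is a hypothetical (arrival time, work) pair and a server of speed s finishes w units of work in w/s time. Running the same jobs at a higher speed shows how a modest speedup lowers the mean response time; that speedup is the kind of advantage the abstract's online algorithms are granted when compared against the offline optimum.

```python
from typing import List, Tuple


def avg_response_time(jobs: List[Tuple[float, float]], speed: float = 1.0) -> float:
    """Mean response time of (arrival, work) jobs served FCFS on one server.

    Toy model for illustration only: a server of the given speed completes
    `work` units in `work / speed` time, and response time = completion - arrival.
    """
    clock = 0.0
    total = 0.0
    for arrival, work in sorted(jobs):
        start = max(clock, arrival)   # wait for the server to be free
        clock = start + work / speed  # completion time of this job
        total += clock - arrival      # response time of this job
    return total / len(jobs)


jobs = [(0.0, 4.0), (1.0, 2.0), (2.0, 1.0)]
print(avg_response_time(jobs))        # unit-speed server
print(avg_response_time(jobs, 1.25))  # 25% faster server
```

With these three jobs, the unit-speed server yields a mean response time of 14/3 (about 4.67), and the 1.25x server yields 10.6/3 (about 3.53).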
Weina Wang, Kai Zhu, et al.
Performance Evaluation Review
Parijat Dube, Michael Tsao, et al.
MASCOTS 2012
Danilo Ardagna, Raffaela Mirandola, et al.
QUASOSS - ESEC-FSE 2009