Neural network-based task scheduling with preemptive fan control
Abstract
Cooling accounts for a significant portion of the total operating cost of supercomputers, so improving the efficiency of cooling mechanisms can substantially reduce that cost. This paper discusses two sources of cooling inefficiency in existing computing systems: temperature variations and reactive fan speed control. To address these problems, we propose a learning-based approach that combines a neural network model for accurately predicting core temperatures, a preemptive fan control mechanism, and a thermal-aware load balancing algorithm that uses the temperature prediction model. We demonstrate that temperature variations among cores can be reduced from 9°C to 2°C and that peak fan power can be reduced by 61%. These savings are realized with minimal performance degradation.
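To make the contrast between reactive and preemptive fan control concrete, the following is a minimal Python sketch and not the implementation described in this paper. All names, thresholds, and fan speed levels are hypothetical, and the predicted temperatures are assumed to come from some temperature prediction model, which is not reproduced here. The reactive policy responds only after a measured temperature crosses a threshold, whereas the preemptive policy ramps the fan gradually as the predicted temperature approaches it. The last function illustrates one simple thermal-aware placement rule: send the next task to the core with the lowest predicted temperature.

```python
from typing import Sequence


def reactive_fan_speed(measured_temp_c: float,
                       threshold_c: float = 70.0,
                       low: float = 0.3,
                       high: float = 1.0) -> float:
    """Reactive policy: jump to full speed only once the measured
    temperature has already reached the threshold (values are assumptions)."""
    return high if measured_temp_c >= threshold_c else low


def preemptive_fan_speed(predicted_temp_c: float,
                         threshold_c: float = 70.0,
                         margin_c: float = 5.0,
                         low: float = 0.3,
                         high: float = 1.0) -> float:
    """Preemptive policy: ramp linearly from `low` to `high` while the
    predicted temperature lies within `margin_c` below the threshold."""
    ramp_start_c = threshold_c - margin_c
    if predicted_temp_c <= ramp_start_c:
        return low
    if predicted_temp_c >= threshold_c:
        return high
    fraction = (predicted_temp_c - ramp_start_c) / margin_c
    return low + fraction * (high - low)


def pick_coolest_core(predicted_temps_c: Sequence[float]) -> int:
    """Thermal-aware placement: index of the core with the lowest
    predicted temperature (hypothetical rule for illustration)."""
    return min(range(len(predicted_temps_c)),
               key=lambda core: predicted_temps_c[core])
```

With these assumed settings, a core predicted to reach 67.5°C would already run its fan at an intermediate speed under the preemptive policy, while the reactive policy would stay at the low speed until 70°C is actually measured. A gradual ramp of this kind is one way to avoid sudden jumps to full fan speed.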