Exploring the effects of on-chip thermal variation on high-performance multicore architectures
Abstract
Inherent temperature variation among cores in a multicore architecture can be caused by a number of factors including process variation, cooling and packaging imperfections, and even placement of the chip in the module. Current dynamic thermal management techniques assume identical heating profiles for homogeneous multicore architectures. Our experimental results indicate that inherent thermal variation is very common in existing multicores. While most multicore chips accommodate multiple thermal sensors, the dynamic power/thermal management schemes are oblivious of the inherent heating tendencies. Hence, in the case of variation, the chip faces repetitive hotspots running on such cores. In this article, we propose a technique that leverages the on-chip sensor infrastructure as well as the capabilities of power/thermal management to effectively reduce the heating and minimize local hotspots. This technique can be used in existing multicore chips as long as the thermal sensor data can be made transparent to the power/thermal management at the software layer. According to our experimental analysis on test-chips, 5°C peak temperature reduction can be achieved with no performance degradation, hence the inherent energy efficiency of the chip can be improved without any performance or cost penalty. © 2011 ACM.