Publication
CNSM 2011
Conference paper

Semi-automated data center hotspot diagnosis

Abstract

An increasingly important requirement for energy-efficient data center operation is to diagnose and fix thermal anomalies that sometimes occur due to excessive workload or equipment failures. Today, the task of diagnosing thermal anomalies entails expert but tedious analysis of data collected manually from disparate management systems. Our ultimate goal is to substantially reduce the time, tedium and expertise required to diagnose thermal hotspots by developing a system that generates accurate diagnoses automatically. We describe a substantial step towards this goal: a loosely-coupled, semi-automated thermal diagnosis system that integrates IT and facilities data, uses simple heuristics to highlight the most likely culprits, and provides a graphical interface that enables an administrator to narrow the list further by exploring data correlations. Among the challenges addressed by our solution are coping with heterogeneous data types and data access methods, and detecting and managing erroneous sensor readings. © 2011 IFIP.
