CUFF: A Configurable Uncertainty-driven Forecasting Framework for Green AI Clusters
Abstract
AI applications are driving the need for large dedicated GPU clusters, which are highly energy- and carbon-intensive. To operate these clusters efficiently, operators rely on workload forecasts that inform resource allocation decisions, saving energy without sacrificing performance. Traditional forecasting methods provide a single-point forecast and do not expose the uncertainty in their predictions, which can lead to an unexpected loss in performance. In this paper, we present CUFF, an uncertainty-driven GPU demand forecasting framework that exposes the uncertainty in its predictions and provides a mechanism to configure the trade-off between energy savings and performance. We evaluate our approach using multiple GPU workload traces and demonstrate that CUFF outperforms state-of-the-art point predictions. Under high GPU demand, the CUFF predictor meets performance goals 83% of the time, compared to 7.6% for point predictions. Furthermore, the CUFF knob enables users to configure a performance target of up to 98% while providing 26% energy savings, which is comparable to the savings of point forecasts that ensure only a 68% performance target.
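To make the idea of a configurable trade-off concrete, the following is a minimal illustrative sketch, not CUFF's actual method. It assumes, purely for illustration, that a forecaster returns a Gaussian predictive distribution (a mean and a standard deviation of GPU demand) and that the knob is a target probability of covering demand. The function name `gpus_to_provision` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def gpus_to_provision(mean: float, std: float, target: float) -> int:
    """Return the number of GPUs to keep powered on.

    Hypothetical helper: assumes demand ~ Normal(mean, std) and provisions
    the `target`-quantile of that distribution, rounded up. This is an
    illustrative assumption, not the mechanism described in the paper.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if std < 0.0:
        raise ValueError("std must be non-negative")
    if std == 0.0:
        # No uncertainty: the quantile collapses to the point forecast.
        return max(0, math.ceil(mean))
    quantile = NormalDist(mu=mean, sigma=std).inv_cdf(target)
    return max(0, math.ceil(quantile))


if __name__ == "__main__":
    # A higher target reserves more GPUs, which leaves less room for energy savings.
    for target in (0.50, 0.68, 0.98):
        print(target, gpus_to_provision(mean=100.0, std=10.0, target=target))
```

Under these assumptions, a point forecast corresponds to provisioning the mean alone, whereas raising the target widens the safety margin in proportion to the predicted uncertainty.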