- Kalyan Dasgupta
- Umamaheswari Devi
- et al.
- 2022
- CLOUD 2022
Sustainable Computing
Overview
Enterprises are heavily dependent on Information Technology for digitizing and automating their operations. Many of these enterprise IT workloads are either already deployed on the public cloud or in private data centers, or are expected to migrate to a data center in the near future. Data centers are estimated to consume on the order of 200 terawatt-hours of electricity per year, which is approximately 1% of global electricity consumption. Although the grim energy predictions of the past for data centers have not come to pass, the rise of AI-powered workloads and other trends means that keeping data center energy consumption contained requires continued and significant investment to identify and eliminate inefficiencies in the various elements and operations that consume power.
Simultaneously, in an effort to tackle climate change, governments across the globe are mandating that enterprises report the carbon emissions from all their operations, including those from their computing workloads, both on-premises and in the cloud, and act to reduce them. To address this requirement, enterprises are turning to data center and cloud operators as well as third-party tools and service providers to assess their current emissions and evaluate optimization options.
At IBM Research, we address this problem with an ambitious, comprehensive, and multi-pronged strategy. It includes carbon quantification for tenants and workloads on IBM Cloud and in on-premises data centers, AI-infused sustainability transformations for enterprise customers, and multi-disciplinary sustainable computing research spanning multi-cluster infrastructure, hardware systems, platform, software, and AIOps. This research aims to improve manufacturing processes, design specialized hardware and cooling systems, build carbon-aware software solutions, and develop innovative run-time algorithms that manage systems and software to mitigate environmental cost over the complete lifecycle.
Our near-term effort spans three major phases for carbon emissions in data centers and the cloud: quantification, analysis or assessment, and optimization or remediation. These phases are performed cyclically, in consultation with application owners, at an appropriate time granularity.
The overall architecture is provided below:
Within the above overall framework, we explore problems in the areas of:
- Power modelling of compute servers based on diverse architectures (x86, POWER, Z), GPUs, and storage systems.
- Full-stack carbon quantification for software components at various layers such as VMs, pods, and applications and aggregations.
- Carbon quantification with little or no power measurements.
- Carbon hotspot detection and what-if analysis.
- Carbon-aware resource optimization for application resource management (ARM) tools like Turbonomic.
- Carbon-aware multi-cluster and multi-cloud dispatching.
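To make the first two problem areas above more concrete, the following is a minimal sketch of how a server power model can feed a per-workload carbon estimate. It is an illustration under stated assumptions, not the method used in this project: the linear utilization-based power model, the apportioning of energy by CPU share, and every name in the code (`Server`, `server_power_watts`, `workload_carbon_grams`, and their parameters) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server description; both values are assumed inputs."""
    idle_watts: float  # power draw at 0% CPU utilization
    max_watts: float   # power draw at 100% CPU utilization


def server_power_watts(server: Server, cpu_utilization: float) -> float:
    """Estimate server power with a simple linear model (an assumption).

    cpu_utilization is a fraction and is clamped to the range [0, 1].
    """
    u = min(max(cpu_utilization, 0.0), 1.0)
    return server.idle_watts + (server.max_watts - server.idle_watts) * u


def workload_carbon_grams(
    server: Server,
    cpu_utilization: float,
    workload_cpu_share: float,
    hours: float,
    grid_gco2_per_kwh: float,
) -> float:
    """Attribute a share of server energy to one workload and convert to gCO2.

    Assumption: server energy is split in proportion to the workload's
    share of the CPU time used on the server (workload_cpu_share in [0, 1]).
    """
    # Server energy over the interval, converted from watt-hours to kWh.
    energy_kwh = server_power_watts(server, cpu_utilization) * hours / 1000.0
    # Workload's share of that energy, multiplied by grid carbon intensity.
    return energy_kwh * workload_cpu_share * grid_gco2_per_kwh


if __name__ == "__main__":
    example = Server(idle_watts=100.0, max_watts=300.0)
    grams = workload_carbon_grams(
        example,
        cpu_utilization=0.5,
        workload_cpu_share=0.25,
        hours=2.0,
        grid_gco2_per_kwh=400.0,
    )
    print(f"Estimated emissions: {grams:.1f} gCO2")
```

A proportional split is the simplest attribution choice. A real system would also need to decide how to charge idle power, memory, storage, and cooling overhead to tenants, which this sketch does not attempt.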
Through our efforts, we aspire to infuse the needed observability throughout the entire computing hardware and software stack, and to use it to build carbon performance monitoring and management solutions akin to those in the application performance monitoring and management space.