Cloud Native Sustainable LLM Inference in Action

Chen Wang; Eun Kyung Lee; Bo Wen; Huamin Chen; Cathy Zhang

KubeCon EU 2024

Tutorial

19 Mar 2024

Abstract

Join our tutorial on sustainable Large Language Model (LLM) inference using cloud-native technology. We'll cover LLMs, their energy use, and Kepler's role in monitoring power consumption during LLM workloads. Learn how to balance environmental sustainability with performance by adjusting AI accelerator frequencies in a cloud-native environment, so that LLM inference stays both power-efficient and cost-effective.

Experience a live demo of vLLM, an advanced inference framework, in action. See how we tune AI accelerator settings in a Kubernetes cluster to balance power consumption against compute performance.
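To make the power-computation trade-off concrete, here is a minimal Python sketch, not the demo's actual code. It assumes you have already collected, for each accelerator frequency setting, an average power reading (for example from a power monitor such as Kepler) and a measured inference throughput. The names `FrequencySample`, `energy_per_token`, and `best_frequency` are hypothetical, and the numbers are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencySample:
    """One measurement at a given accelerator clock setting (illustrative)."""
    freq_mhz: int            # accelerator clock cap used for the run
    avg_power_watts: float   # mean power draw observed during the run
    tokens_per_sec: float    # measured inference throughput


def energy_per_token(sample: FrequencySample) -> float:
    """Joules per generated token: watts divided by tokens per second."""
    return sample.avg_power_watts / sample.tokens_per_sec


def best_frequency(samples, min_tokens_per_sec=0.0):
    """Pick the sample with the lowest energy per token that still
    meets a minimum throughput requirement."""
    eligible = [s for s in samples if s.tokens_per_sec >= min_tokens_per_sec]
    if not eligible:
        raise ValueError("no frequency setting meets the throughput floor")
    return min(eligible, key=energy_per_token)


# Made-up measurements, for illustration only.
samples = [
    FrequencySample(freq_mhz=1980, avg_power_watts=400.0, tokens_per_sec=1000.0),
    FrequencySample(freq_mhz=1410, avg_power_watts=280.0, tokens_per_sec=900.0),
    FrequencySample(freq_mhz=990, avg_power_watts=220.0, tokens_per_sec=600.0),
]

if __name__ == "__main__":
    choice = best_frequency(samples, min_tokens_per_sec=800.0)
    print(choice.freq_mhz, round(energy_per_token(choice), 3))
```

With these illustrative numbers, the middle setting uses the least energy per token while staying above the throughput floor; raising the floor pushes the choice back toward the highest clock.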

This tutorial is a must-attend for professionals keen on integrating environmental sustainability with cloud-native technology solutions. Whether you're a developer, an IT specialist, or a sustainability advocate, you'll gain valuable insights into the future of eco-friendly cloud computing. Join us to be at the forefront of this significant technological evolution.

Workshop paper