Publication
ESEC/FSE 2024
Conference paper
Leveraging Large Language Models for the Auto-remediation of Microservice Applications - An Experimental Study
Abstract
Runtime auto-remediation is crucial for ensuring the reliability and efficiency of distributed systems, especially within complex microservice-based applications. However, the complexity of modern microservice deployments often surpasses the capabilities of traditional manual remediation and existing autonomic computing methods. Our proposed solution harnesses large language models (LLMs) to automatically generate and execute Ansible playbooks that address issues within these complex environments. Ansible playbooks, written in a widely adopted markup language for IT task automation, encode critical remediation actions for the network failures, resource constraints, configuration errors, and application bugs prevalent in managing microservices. We fine-tune pre-trained LLMs on our custom-built Ansible-based remediation dataset, equipping these models to comprehend diverse remediation tasks within microservice environments. Once in-context tuned, these LLMs efficiently generate precise Ansible playbooks tailored to the specific issues encountered, surpassing current state-of-the-art techniques with high functional correctness (95.45%) and average correctness (98.86%).
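The abstract itself contains no playbook, so the following is only a minimal sketch of the kind of remediation output the paper describes: an Ansible playbook that restarts a failed service and verifies its health endpoint before declaring success. Every name in it (the app_servers inventory group, the checkout.service unit, the localhost:8080/health URL, the retry counts) is a hypothetical assumption, not material from the paper.

```yaml
# Illustrative sketch only, not taken from the paper: a minimal Ansible
# playbook of the kind the tuned LLMs are described as generating.
# All names below (inventory group, service unit, health endpoint) are
# hypothetical placeholders.
- name: Remediate a failed checkout microservice          # hypothetical scenario
  hosts: app_servers                                      # assumed inventory group
  become: true
  tasks:
    - name: Restart the crashed service unit
      ansible.builtin.systemd:
        name: checkout.service                            # hypothetical unit name
        state: restarted

    - name: Confirm the service answers its health check
      ansible.builtin.uri:
        url: "http://localhost:8080/health"               # assumed health endpoint
        status_code: 200
      register: health
      retries: 5                                          # assumed retry budget
      delay: 3
      until: health.status == 200
```

A playbook of this shape illustrates why the paper measures functional correctness: the generated YAML must not only parse, but perform an action (restart) and validate the outcome (health probe) for the remediation to count as successful.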