Columbus: Configuration discovery for clouds
Abstract
Low-cost, accurate, and scalable software configuration discovery is the key to simplifying many cloud management tasks. However, the lack of standardization across software configuration techniques has prevented the development of a fully automated, application-independent configuration discovery solution. In this work, we present Columbus, an application-agnostic system that automatically discovers environmental configuration parameters, or Points of Variability (PoVs), in clustered applications with high accuracy. Columbus builds on the insight that even though configuration mechanisms and files vary across software, PoVs are encoded using a few common patterns. It uses a novel rule framework to annotate file content with PoVs and a Bayesian network to estimate the confidence of each annotated PoV. Our experiments confirm that Columbus can accurately discover configuration for a diverse set of enterprise and cloud applications. It has subsequently been integrated into three real-world systems that analyze this information for discovery of distributed application dependencies, enterprise IT migration, and virtual application configuration.
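As a rough illustration of the idea that PoVs follow a few common patterns, the sketch below annotates configuration text using a small set of regular-expression rules and scores a candidate with a single application of Bayes' rule. Everything in it is an assumption made for illustration: the rule set, the names `RULES`, `Annotation`, `annotate`, and `posterior`, and the probabilities passed in are hypothetical, and the one-step Bayes update is a simplified stand-in for a Bayesian network, not the model Columbus uses.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    line_no: int   # 1-based line number in the scanned text
    pov_type: str  # name of the rule that matched
    value: str     # the matched configuration value


# Hypothetical rules, one regex per PoV type (not the actual Columbus rules).
RULES = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "port": re.compile(r"\bport\s*[=:]\s*(\d{1,5})\b", re.IGNORECASE),
    "path": re.compile(r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+"),
}


def annotate(text):
    """Return every rule match in the text as an Annotation."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pov_type, pattern in RULES.items():
            for m in pattern.finditer(line):
                # Use the capture group when the rule defines one,
                # otherwise the whole match.
                value = m.group(1) if pattern.groups else m.group(0)
                found.append(Annotation(line_no, pov_type, value))
    return found


def posterior(prior, p_evidence_if_pov, p_evidence_if_not):
    """Bayes' rule for one binary piece of evidence (assumed likelihoods)."""
    num = p_evidence_if_pov * prior
    return num / (num + p_evidence_if_not * (1.0 - prior))


if __name__ == "__main__":
    sample = "db.host = 10.0.0.5\ndb.port = 5432\nlog.dir = /var/log/app\n"
    for ann in annotate(sample):
        print(ann)
    # Illustrative numbers only: prior 0.5, evidence seen 90% of the time
    # for true PoVs and 20% of the time otherwise.
    print(posterior(0.5, 0.9, 0.2))
```

Keeping the rules in a plain dictionary means a new pattern can be added without touching the scanning loop, which mirrors in spirit, though not in detail, the separation between a rule framework and confidence estimation described above.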