A host-independent supervised machine learning approach to automated overload detection in virtual machine workloads
Abstract
This paper evaluates a mechanism for applying machine learning (ML) to identify over-constrained Infrastructure-as-a-Service (IaaS) virtual machines (VMs). Herein, over-constrained VMs are defined as those that are not given sufficient system resources to meet their workload-specific objective functions. To validate our approach, a variety of workload-specific benchmarks inspired by common IaaS cloud workloads were used. Workloads were run while regularly sampling VM resource consumption features exposed by the hypervisor. Datasets were labeled as either nominal or over-constrained and used to train ML classifiers to determine VM over-constraint rules based on one-time workload analysis. Rules learned on one host are transferred with the VM to other host environments to evaluate their portability. Key contributions of this work include identifying which VM resource consumption metrics (features) prove most relevant to learned decision trees in this context, and presenting a technique for generalizing this approach across hosts while limiting the required up-front training expenditure to a single VM and host. Other contributions include a rigorous explanation of the differences in learned rulesets as a function of feature sampling rates, and an analysis of the differences in learned rulesets as a function of workload variation. Feature correlation matrices and the corresponding generated rulesets show that the features composing a ruleset tend to have low cross-correlation (below 0.4), while no individual feature correlates strongly with the classification. Our system achieves workload-specific error percentages below 2.4%, with a mean error across workloads of 1.43% (and a strong false negative bias), for a variety of synthetic, representative cloud workloads tested.
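As an illustration of the feature correlation analysis mentioned above, the following minimal sketch computes pairwise Pearson correlations between sampled features and lists the pairs whose absolute correlation falls below a threshold such as 0.4. It is not the implementation used in this work: the function names, the feature names (`cpu_ready_ms`, `mem_swapped_kb`, `disk_latency_ms`), and the sample values are hypothetical and chosen only for demonstration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each unordered pair of column names to its Pearson correlation."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def weakly_correlated_pairs(columns, threshold=0.4):
    """Return the column pairs whose absolute correlation is below threshold."""
    matrix = correlation_matrix(columns)
    return [pair for pair, r in matrix.items() if abs(r) < threshold]


if __name__ == "__main__":
    # Hypothetical hypervisor samples (invented values, illustration only).
    samples = {
        "cpu_ready_ms": [12.0, 15.0, 90.0, 11.0, 85.0, 14.0],
        "mem_swapped_kb": [0.0, 512.0, 0.0, 2048.0, 128.0, 0.0],
        "disk_latency_ms": [3.0, 4.0, 3.5, 20.0, 4.2, 3.1],
    }
    for pair, r in correlation_matrix(samples).items():
        print(pair, round(r, 3))
    print(weakly_correlated_pairs(samples, threshold=0.4))
```

The 0.4 default mirrors the cross-correlation bound reported above. The same helper can be applied to a feature column and a 0/1 label column (nominal versus over-constrained) to check how strongly any single feature correlates with the classification.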