Causality-Driven Performance Monitoring and Scaling Automation for Managed Solutions

Nianjun Zhou; Ajay Mohindra

doi:10.1109/SCC.2015.70

SCC 2015

Conference paper

17 Aug 2015

Causality-Driven Performance Monitoring and Scaling Automation for Managed Solutions

View publication

Abstract

A key feature of Cloud computing is its agility and flexibility to support the scalability needs of business solutions. Currently, the agility is only limited to the scalability of the compute, memory and storage. To improve an application's agility, we need to monitor & measure solution level metrics and associate the performance of the metrics to the business agility needs of the solution by making real-time scalability or change decisions. In this paper, we illustrate a scaling decision mechanism utilizing the monitoring data from infrastructure, middleware, and business level metrics. We use these performance metrics as input to a causality analysis model to make architecture changes or scalability decisions. Mathematically, we define the causality as a graph to link the changes in the measured metric values to the action of the solution change. The causality analysis follows scalability principles as best practices. They are a) the principle of performance scalability b) principle of contribution margin for scalability, and c) principle of the least cost of SLA compliance. We define these scalability principles as the rules to ensure that the business stakeholder of the solution can maintain or improve their business quality or profit margins as the computing capability scales up or down. To implement those principles, we need to establish the linkages of the business metrics to the decision of changes. To make such linkage, we first utilize causality analysis to identify feasible scaling actions, and then associate those actions with the system, application, and business performance metrics. With the help of causality analysis, we implement a performance monitoring and scaling automation framework for managed solutions using an Open Source Monitoring system.

Review