Resiliency quantification for large scale systems: An IaaS cloud use case
Abstract
We quantify the resiliency of large-scale systems in response to changes that push them beyond their normal behavior. We outline general steps for resiliency quantification and define resiliency metrics that capture the effects of such changes. The proposed approach is illustrated through an Infrastructure-as-a-Service (IaaS) Cloud use case. Specifically, we assess the impact of changes in demand and in available capacity on Cloud resiliency using interacting state-space-based submodels. Because resiliency quantification requires understanding the transient behavior of the system, the fixed-point variables exchanged between submodels evolve with time, which leads to non-homogeneous Markov chains. In this paper, we present an algorithm for resiliency analysis of such non-homogeneous submodels. We compare the results with our past research, in which we quantified the resiliency of IaaS Cloud performance using a one-level monolithic model. Numerical results show that the proposed approach scales to a real-sized Cloud without significantly compromising accuracy.
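To make the idea of a time-varying fixed-point variable concrete, the following is a minimal Python sketch under loudly stated assumptions. It is not the paper's model or algorithm. It couples two hypothetical birth-death submodels (server availability and a finite request queue), exports the expected number of up servers from the first as the service capacity of the second, and solves the resulting non-homogeneous chain by holding each generator constant over short steps of length `dt` and advancing with uniformization. All function names, rates, the one-way coupling, and the use of the probability that the queue is full as a resiliency indicator are illustrative assumptions.

```python
import math

def uniformization_step(p, Q, dt, tol=1e-12, max_terms=10_000):
    """Advance the row vector p by time dt under a constant CTMC generator Q."""
    n = len(p)
    q = max(-Q[i][i] for i in range(n))      # uniformization rate
    if q == 0.0:
        return list(p)
    # P = I + Q/q is the transition matrix of the uniformized DTMC
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    weight = math.exp(-q * dt)                # Poisson(q*dt) probability of 0 jumps
    term = list(p)                            # p P^k, starting at k = 0
    result = [weight * x for x in term]
    covered, k = weight, 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * dt / k
        covered += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result

def birth_death_generator(n, birth, death):
    """Generator on states 0..n-1 with up-rate birth(i) and down-rate death(i)."""
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            Q[i][i + 1] = birth(i)
        if i > 0:
            Q[i][i - 1] = death(i)
        Q[i][i] = -sum(Q[i])                  # diagonal makes the row sum to zero
    return Q

def transient_interacting(lam, M, f, r, N, mu, t_end, dt):
    """Hypothetical two-submodel example; returns (t, exported capacity, P(queue full))."""
    # Hypothetical availability submodel: up servers out of M, independent failure/repair
    Q_avail = birth_death_generator(M + 1, lambda k: (M - k) * r, lambda k: k * f)
    avail = [0.0] * (M + 1)
    avail[M] = 1.0                            # all servers up at t = 0
    queue = [0.0] * (N + 1)
    queue[0] = 1.0                            # empty request queue at t = 0
    history = []
    for s in range(int(round(t_end / dt))):
        t = s * dt
        # Exported variable: expected up servers at time t (changes with t)
        c = sum(k * pk for k, pk in enumerate(avail))
        # Queue generator depends on t through lam(t) and c(t), so the chain is
        # non-homogeneous; it is approximated as constant over [t, t + dt)
        Q_queue = birth_death_generator(N + 1, lambda i: lam(t), lambda i: min(i, c) * mu)
        queue = uniformization_step(queue, Q_queue, dt)
        avail = uniformization_step(avail, Q_avail, dt)
        history.append((t + dt, c, queue[N]))
    return history

if __name__ == "__main__":
    surge = lambda t: 2.0 if t < 5.0 else 6.0   # hypothetical demand surge at t = 5
    hist = transient_interacting(surge, M=4, f=0.01, r=0.5, N=10, mu=1.0, t_end=10.0, dt=0.1)
    for t, c, p_full in hist[9::10]:
        print(f"t={t:5.2f}  E[up servers]={c:.3f}  P(queue full)={p_full:.4f}")
```

The demand surge at t = 5 stands in for a change beyond normal behavior, and tracking the queue-full probability after it gives a transient view of recovery. A smaller `dt` tightens the piecewise-constant approximation of the non-homogeneous chain at the cost of more steps.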