A hidden semi-Markov model for web workload self-similarity
Abstract
Hidden semi-Markov models (HSMMs) have been well studied and successfully applied to many engineering and scientific problems. A key advantage of an HSMM is that efficient forward-backward algorithms exist for estimating the model parameters that best account for an observed sequence. In this paper, we propose an HSMM for modeling Web workloads. We show that this model asymptotically characterizes second-order self-similar workloads when some duration distributions of the hidden states are heavy-tailed. A recursive formula is developed for estimating the Hurst parameter of self-similarity. We validate our model and estimation methods against two sets of empirical data (requests per second) collected from two different Web servers. We then use this model to generate self-similar workloads that exhibit the same statistical properties as the measured data. These measurements show that as few as 4 states, together with a simple Poisson process and heavy-tailed Pareto holding time distributions, suffice to accurately model the Web workloads considered in this study.
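To make the generative structure concrete, the following is a minimal sketch of a workload generator of the kind described above: a small number of hidden states, Pareto-distributed holding times, and Poisson request counts per second within each state. It is an illustration under stated assumptions, not the fitted model of this paper. The function names `poisson` and `generate_workload`, the per-state rates, the Pareto shape parameters, the uniform choice of the next state, and the rounding of holding times to whole seconds are all hypothetical choices made for the example.

```python
import math
import random


def poisson(rate, rng):
    """Draw one Poisson(rate) variate by Knuth's multiplication method.

    Adequate for the small rates used here; not intended for very large rates.
    """
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_workload(n_seconds, rates, alphas, seed=0):
    """Generate requests-per-second counts from a toy HSMM.

    rates[i]  : assumed Poisson arrival rate (requests/s) in hidden state i
    alphas[i] : assumed Pareto shape of the holding time in state i
                (values between 1 and 2 give a heavy tail with infinite variance)
    """
    rng = random.Random(seed)
    n_states = len(rates)
    state = rng.randrange(n_states)
    workload = []
    while len(workload) < n_seconds:
        # Pareto holding time with minimum 1, rounded up to whole seconds
        # and truncated so the trace has exactly n_seconds entries.
        duration = math.ceil(rng.paretovariate(alphas[state]))
        duration = min(duration, n_seconds - len(workload))
        for _ in range(duration):
            workload.append(poisson(rates[state], rng))
        # Assumption: no self-transitions, next state chosen uniformly.
        state = rng.choice([s for s in range(n_states) if s != state])
    return workload


if __name__ == "__main__":
    # Hypothetical 4-state parameterization, chosen only for illustration.
    trace = generate_workload(
        n_seconds=3600,
        rates=[2.0, 8.0, 20.0, 40.0],
        alphas=[1.4, 1.4, 1.6, 1.8],
        seed=1,
    )
    print(len(trace), sum(trace) / len(trace))
```

In an actual application, the rates, holding-time parameters, and state transition probabilities would be estimated from the observed request sequence rather than fixed by hand as they are here.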