Expected annual fraction of data loss as a metric for data storage reliability
Abstract
Several redundancy and recovery schemes have been developed to enhance the reliability of storage systems. The effectiveness of these schemes has predominantly been evaluated based on the mean time to data loss (MTTDL) metric, which has proven useful for assessing tradeoffs, for comparing schemes, and for estimating the effect of the various parameters on system reliability. In the context of distributed and cloud storage systems, it is of great importance, for economic reasons, to consider the magnitude of data loss along with its frequency. We focus on the following reliability metric: the expected annual fraction of data loss (EAFDL), that is, the fraction of stored data that is expected to be lost by the system annually. We present a general methodology to obtain the EAFDL metric analytically, in conjunction with the MTTDL metric, for various redundancy schemes and for a large class of failure time distributions that also includes real-world distributions such as Weibull and gamma. As a demonstration, we subsequently apply this methodology to derive these metrics analytically and to assess the reliability of a replication-based storage system under clustered, declustered, and symmetric data placement schemes. We show that the declustered placement scheme offers superior reliability in terms of both metrics. Previous work has used simulation to evaluate the magnitude of data loss, but this is the first work to assess it analytically, and the first to present a general theoretical framework for this context.
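To illustrate why the magnitude of data loss matters alongside its frequency, the following minimal sketch estimates the EAFDL from an MTTDL value. It rests on an assumption that is not stated in the abstract: that data loss events recur on average once every MTTDL years and that each event loses a known expected amount of data, so the annual fraction lost is approximately the expected loss per event divided by the product of the MTTDL and the amount of stored data. The function name `eafdl`, its parameters, and the numbers for the two systems are hypothetical and chosen only for illustration.

```python
def eafdl(mttdl_years: float, expected_loss_per_event: float, stored_data: float) -> float:
    """Approximate expected annual fraction of data loss (EAFDL).

    Assumption (not from the source): loss events occur on average once per
    MTTDL, and each event loses `expected_loss_per_event` units of data out
    of `stored_data` units stored.
    """
    if mttdl_years <= 0 or stored_data <= 0:
        raise ValueError("mttdl_years and stored_data must be positive")
    if not 0 <= expected_loss_per_event <= stored_data:
        raise ValueError("expected_loss_per_event must lie in [0, stored_data]")
    # (loss events per year) * (fraction of stored data lost per event)
    return expected_loss_per_event / (mttdl_years * stored_data)


# Hypothetical system A: rare loss events, but each one loses all stored data.
system_a = eafdl(mttdl_years=1000.0, expected_loss_per_event=1.0, stored_data=1.0)

# Hypothetical system B: ten times more frequent loss events,
# but each one loses only a millionth of the stored data.
system_b = eafdl(mttdl_years=100.0, expected_loss_per_event=1e-6, stored_data=1.0)

print(f"System A: MTTDL = 1000 years, EAFDL = {system_a:.1e}")
print(f"System B: MTTDL = 100 years,  EAFDL = {system_b:.1e}")
```

Under these made-up numbers, system A looks better when judged by MTTDL alone, whereas system B loses a far smaller fraction of its data per year. A comparison based on both metrics captures this difference.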