Modeling of Correlated Failures and Community Error Recovery in Multiversion Software
Abstract
In this paper we consider three aspects of modeling multiversion software. First, we propose the Beta-Binomial distribution to model correlated failures in multiversion software. Second, we present a combinatorial model to predict the reliability of a multiversion software configuration. This model can take as input failure distributions obtained either from measurements or from a selected distribution (e.g., Beta-Binomial), and various recovery methods can be incorporated in it. Third, we investigate the effectiveness of the Community Error Recovery method based on checkpointing, as suggested in [13]. This method appears to be effective only when the failure behaviors of the program versions are lightly correlated. We also consider two different types of checkpoint failure: an omission failure, where the correct output is recognized at a checkpoint but the checkpoint fails to correct the wrong outputs, and a destructive failure, where the good versions are corrupted at a checkpoint. The former merely reduces the effectiveness of the checkpoints, while the latter has a catastrophic effect on reliability. © 1990 IEEE
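As a rough illustration of the first two points, the following Python sketch computes a Beta-Binomial distribution for the number of failing versions and uses it in a simple combinatorial estimate. It is not the model of the paper: the shape parameters `alpha` and `beta`, the function names, and the assumption that the system fails whenever a simple majority of versions fail on the same input are all chosen here for illustration only, and no recovery or checkpointing is modeled.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Probability that exactly k of n versions fail on an input.

    The per-input failure probability is assumed to be Beta(alpha, beta)
    distributed, which induces positive correlation between versions.
    """
    return comb(n, k) * exp(
        log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Independent-failure baseline with per-version failure probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def majority_vote_failure(n: int, alpha: float, beta: float) -> float:
    """Probability that at least a simple majority of n versions fail.

    Illustrative assumption: the system output is wrong whenever a
    majority of versions fail (worst case, failures coincide).
    """
    majority = n // 2 + 1
    return sum(beta_binomial_pmf(k, n, alpha, beta) for k in range(majority, n + 1))


if __name__ == "__main__":
    n, alpha, beta = 3, 1.0, 1.0  # hypothetical parameter values
    p_mean = alpha / (alpha + beta)  # mean per-version failure probability
    correlated = majority_vote_failure(n, alpha, beta)
    independent = sum(binomial_pmf(k, n, p_mean) for k in range(2, n + 1))
    print(f"correlated: {correlated:.4f}, independent: {independent:.4f}")
```

With `alpha = beta = 1` the failure count is uniform over 0..n, so each count for three versions has probability 0.25; this makes a convenient sanity check. Measured failure counts could be substituted for `beta_binomial_pmf` in the same summation.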