Publication
SRDS 1991
Conference paper
Optimistic failure recovery for very large networks
Abstract
Optimistic failure recovery mechanisms are proposed as a way to provide transparent fault tolerance to distributed applications and systems. The authors identify problems that may arise when these mechanisms are applied to very large networks that include many processors and span large geographical areas and many administrative domains. They present a technique, recovery unit gateways, that can be used to address many of these problems with existing failure recovery algorithms. The technique can be applied to existing transparent recovery systems with minimal disruption, and can also be used to build large optimistic recovery systems while minimizing dependency-tracking overhead.