Replay, recovery, replication, and snapshots of nondeterministic concurrent programs
Abstract
The problem of replaying computations of non-deterministic concurrent programs arises in contexts such as debugging and recovery. We investigate the problem for an abstract model of concurrency, which generalizes dataflow networks, processors with shared variables, and logic programming models of concurrency. We say that nondeterminism is visible if the state is determined, up to some (appropriately defined) notion of equivalence, by the external behavior. We show that if nondeterminism is visible then replay is achievable using a. one-step lookahead sequential simulation algorithm. If the program has an additional monotonicity property called stability then recovery is possible without simulating the original computation, by restarting the program from a. certain easily constructed state. Also, for stable programs with visible nondeterminism, a process composed of identical parallel processes has the same external behavior as each of its components. Hence high crash-failure resilience is achievable by simple process replication. For such programs there is also an easy solution to the asynchronous snapshot problem. Stability holds for certain concurrent logic/constraint programming languages. We describe an efficient method for transforming a given stable concurrent logic/constraint program to an equivalent one with visible nondeterminism. The transformation has acceptable execution overhead, thus it could be employed in a practical realization of the proposed methods.