Francesco Longo, Rahul Ghosh, et al.
DSN 2011
This paper describes a modeling framework for evaluating the impact of faults on the output of streaming applications. Our model is based on three abstractions: stream operators, stream connections, and tuples. By composing these abstractions within a Stochastic Activity Network, we allow the modeling of complete applications. We consider faults that lead to data loss and to silent data corruption (SDC). Our framework captures how faults originating in one operator propagate to other operators down the stream processing graph. We demonstrate the extensibility of our framework by evaluating three different fault tolerance techniques: checkpointing, partial graph replication, and full graph replication. We show that under crashes that lead to data loss, partial graph replication has a great advantage over checkpointing in maintaining the accuracy of the application output. We also show that SDC can break the guarantee of no data duplication provided by a fault tolerance technique based on full graph replication. © 2011 IEEE.
Takayuki Osogami, Rudy Raymond
DSN 2011
Robert Soulé, Martin Hirzel, et al.
Software - Practice and Experience
Gabriela Jacques-Silva, Alberto Avritzer, et al.
ISSREW 2015