Rafae Bhatti, Elisa Bertino, et al.
Communications of the ACM
Parallel and clustered architectures are increasingly used as the foundation for high-capacity servers. At the same time, availability expectations are rising rapidly, because the effects of downtime become more apparent and carry higher economic consequences in larger systems. Parallel structures generally involve more hardware and software components. More and larger components increase the chance that an individual component will fail, and any such failure can reduce the overall availability of the system. This paper discusses the use of "restart techniques" as an important strategy for increasing availability in a parallel structure. The paper covers a set of functions that have been developed for the S/390® Parallel Sysplex™.
Michael C. McCord, Violetta Cavalli-Sforza
ACL 2007
Raghu Krishnapuram, Krishna Kummamuru
IFSA 2003
A. Gupta, R. Gross, et al.
SPIE Advances in Semiconductors and Superconductors 1990