Elimination-based fault localization in shared resource environments
Abstract
Fault localization is the process of identifying the component(s) that are the exact source of a failure, given a set of observed failure indications. Despite being a long-standing focus of research, fault localization remains a challenge due to the complexity of current distributed environments. The growing adoption of cloud computing, wherein multiple applications share multiple resources, increases the complexity of the problem. Existing probing techniques are inefficient because of the large number of applications and resources. The availability and utilization of such shared resource environments motivate the search for novel fault localization techniques. In this paper, we present an elimination-based fault localization method that leverages the resources shared among applications. Shared resources are used as Readily Available Probes to find the real-time state of applications. These probes are used to eliminate non-faulty resources, leaving a minimal subset of resources that are likely to be the faulty components. We show that this method significantly reduces the effort required to design and implement probes. Various experiments demonstrate that our method reduces the time taken and increases the efficiency and accuracy of problem determination. © 2011 IEEE.
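The elimination idea can be illustrated with a minimal sketch. It is not the paper's algorithm, only one plausible reading of it under two stated assumptions: an application that is observed to be healthy implies that every resource it uses is non-faulty, and any resource used by a failing application is a suspect until eliminated. The function name `localize_faults`, the application and resource names, and the data layout are all hypothetical.

```python
from typing import Dict, Set


def localize_faults(
    app_resources: Dict[str, Set[str]],
    app_healthy: Dict[str, bool],
) -> Set[str]:
    """Return resources that remain suspect after elimination.

    Assumption (not from the paper): a healthy application exonerates
    every resource it depends on.
    """
    exonerated: Set[str] = set()  # resources used by at least one healthy app
    suspects: Set[str] = set()    # resources used by at least one failing app
    for app, resources in app_resources.items():
        if app_healthy[app]:
            exonerated |= resources
        else:
            suspects |= resources
    # Eliminate every suspect that a healthy application also relies on.
    return suspects - exonerated


# Hypothetical environment: three applications sharing four resources.
deps = {
    "billing": {"db1", "cache1", "net1"},
    "search": {"db1", "net1"},
    "reports": {"cache1", "storage1"},
}
status = {"billing": False, "search": True, "reports": False}

candidates = localize_faults(deps, status)
print(sorted(candidates))
```

In this example the healthy `search` application eliminates `db1` and `net1`, so only `cache1` and `storage1` remain as candidates. No dedicated probe is designed or sent; the observed state of the applications already sharing the resources plays that role.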