Carnegie Mellon University

Architecture-Based Run-Time Fault Diagnosis

Journal contribution posted on 2011-09-01, 00:00, authored by Paulo Casanova, Bradley Schmerl, David Garlan, Rui Abreu

An important step in achieving robustness to run-time faults is the ability to detect and repair problems when they arise in a running system. Effective fault detection and repair could be greatly enhanced by run-time fault diagnosis and localization, since it would allow the repair mechanisms to focus adaptation effort on the parts most in need of attention. In this paper we describe an approach to run-time fault diagnosis that combines architectural models with spectrum-based reasoning for multiple fault localization. Spectrum-based reasoning is a lightweight technique that takes a form of trace abstraction and produces a list (ordered by probability) of likely fault candidates. We show how this technique can be combined with architectural models to support run-time diagnosis that can (a) scale to modern distributed software systems; (b) accommodate the use of black-box components and proprietary infrastructure for which one has neither a specification nor source code; and (c) handle inherent uncertainty about the probable cause of a problem even in the face of transient faults and faults that arise only when certain combinations of system components interact.
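To make the idea of a spectrum concrete, the following is a minimal sketch of how a trace abstraction can be turned into a ranked list of fault candidates. It is not the multiple-fault, probability-based reasoning the abstract refers to. It substitutes the simpler Ochiai similarity coefficient, a common single-fault ranking used in spectrum-based fault localization. The component names (`web`, `dispatcher`, `db1`, `db2`), the notion of a "transaction" as the unit of observation, and the function name `ochiai_ranking` are all assumptions made for illustration.

```python
from math import sqrt


def ochiai_ranking(spectra, outcomes):
    """Rank components by Ochiai similarity to the failure pattern.

    spectra:  list of sets; each set holds the (hypothetical) component
              names that took part in one observed transaction.
    outcomes: list of bools, True if that transaction was judged a failure.
    Returns a list of (component, score) pairs, highest score first.
    """
    components = sorted(set().union(*spectra))
    total_failed = sum(outcomes)
    scores = {}
    for c in components:
        # n11: transactions that involved c and failed
        n11 = sum(1 for s, failed in zip(spectra, outcomes) if c in s and failed)
        # n10: transactions that involved c and passed
        n10 = sum(1 for s, failed in zip(spectra, outcomes) if c in s and not failed)
        # Ochiai: n11 / sqrt((n11 + n01) * (n11 + n10)), where n11 + n01
        # equals the total number of failed transactions.
        denom = sqrt(total_failed * (n11 + n10))
        scores[c] = n11 / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only: four transactions through an assumed architecture.
spectra = [
    {"web", "dispatcher", "db1"},
    {"web", "dispatcher", "db2"},
    {"web", "dispatcher", "db2"},
    {"web", "dispatcher", "db1"},
]
outcomes = [False, True, True, False]

ranking = ochiai_ranking(spectra, outcomes)
for component, score in ranking:
    print(f"{component}: {score:.3f}")
```

In this made-up data, `db2` takes part only in failing transactions and so ranks first, while `web` and `dispatcher` appear in both passing and failing transactions and rank lower. Observing at the level of architectural components rather than source statements is what lets such a ranking treat black-box components uniformly, since only their participation in a transaction needs to be recorded.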

History

Publisher Statement

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-23798-0_29

Date

2011-09-01
