Final published version
Research output: Contribution to Journal/Magazine › Journal article › peer-review
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - The customizable fault/error model for dependable distributed systems
AU - Walter, C.J.
AU - Suri, Neeraj
PY - 2003/1/2
Y1 - 2003/1/2
N2 - Dependability is a qualitative term referring to a system's ability to meet its service requirements in the presence of faults. The types and number of faults covered by a system play a primary role in determining the level of dependability which that system can potentially provide. Given the variety and multiplicity of fault types, to simplify the design process, the system algorithm design often focuses on specific fault types, resulting in either over-optimistic (all fault permanent) or over-pessimistic (all faults malicious) dependable system designs. A more practical and realistic approach is to recognize that faults of varied severity levels and of differing occurrence probabilities may appear as combinations rather than the assumed single fault type occurrences. The ability to allow the user to select/customize a particular combination of fault types of varied severity characterizes the proposed customizable fault/error model (CFEM). The CFEM organizes diverse fault categories into a cohesive framework by classifying faults according to the effect they have on the required system services rather than by targeting the source of the fault condition. In this paper, we develop (a) the complete framework for the CFEM fault classification, (b) the voting functions applicable under the CFEM, and (c) the fundamental distributed services of consensus and convergence under the CFEM on which dependable distributed functionality can be supported. © 2002 Elsevier Science B.V. All rights reserved.
AB - Dependability is a qualitative term referring to a system's ability to meet its service requirements in the presence of faults. The types and number of faults covered by a system play a primary role in determining the level of dependability which that system can potentially provide. Given the variety and multiplicity of fault types, to simplify the design process, the system algorithm design often focuses on specific fault types, resulting in either over-optimistic (all fault permanent) or over-pessimistic (all faults malicious) dependable system designs. A more practical and realistic approach is to recognize that faults of varied severity levels and of differing occurrence probabilities may appear as combinations rather than the assumed single fault type occurrences. The ability to allow the user to select/customize a particular combination of fault types of varied severity characterizes the proposed customizable fault/error model (CFEM). The CFEM organizes diverse fault categories into a cohesive framework by classifying faults according to the effect they have on the required system services rather than by targeting the source of the fault condition. In this paper, we develop (a) the complete framework for the CFEM fault classification, (b) the voting functions applicable under the CFEM, and (c) the fundamental distributed services of consensus and convergence under the CFEM on which dependable distributed functionality can be supported. © 2002 Elsevier Science B.V. All rights reserved.
KW - Dependability
KW - Distributed systems
KW - Error classification
KW - Fault modeling
KW - Computer system recovery
KW - Error analysis
KW - Fault tolerant computer systems
KW - Probability
KW - Systems analysis
KW - Theorem proving
KW - Dependable distributed systems
KW - Distributed computer systems
U2 - 10.1016/S0304-3975(01)00203-1
DO - 10.1016/S0304-3975(01)00203-1
M3 - Journal article
VL - 290
SP - 1223
EP - 1251
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
IS - 2
ER -