Building reliable systems is one of the major challenges faced by software developers as society is becoming more dependent on software systems. The failure of any system can lead to a serious loss, for example serious injury or death in case of safety critical systems and significant financial loss in the case of business-critical systems. As a consequence, fault tolerance is considered as a solution to provide reliability, but the fault tolerance capability is associated with many challenges, such as the right development phase where it needs to be introduced, how it can be composed with the software, and the issues that arise from this composition such as complexity and potential undesirable feature interactions.
This thesis presents an orthogonal fault tolerance framework for the composition of design diversity fault tolerance mechanism with the base system. It further ensures the separation of concerns between the ‘base’ system and the fault tolerance mechanisms that are composed with the base system. The composition in this framework is based on operational semantics that describe the behaviour of the underlying components when composed with the fault tolerance mechanisms. A custom-built pre-processor is based on these composition rules, and is used to automatically compose the system component and the fault tolerance mechanisms. The very introduction of different fault tolerance mechanisms to the system may cause interactions with other fault tolerance features or with system components. Logic properties written in CTL and LTL are used in NuSMV to analyse undesirable interactions.
To illustrate its applicability, the framework has been applied to the Home Automation and Therac-25 software.