The Downside of a Fault Tolerant System

Maintaining high reliability or availability is a marked advantage for any system. A system that can avoid downtime due to a single failure event is essential in many applications. Yet fault-tolerant capability comes at a price.

Here is a short list and brief description of fault tolerant design disadvantages:

Masking or obscuring low-level failures

The nature of a fault-tolerant design is to continue to operate normally even with a component failure.

Thus, if detecting a component failure relies on a loss of function or capability, the failure may be difficult to detect. This sets the stage for a second component failure to bring the system down.

Being able to detect individual component failures permits the repair or replacement of faulty elements, restoring the system to full fault-tolerant capability.
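As a minimal sketch of the idea, assume a hypothetical system with three redundant channels and a majority voter. The function name `vote` and the sample readings are illustrative assumptions, not part of any particular design. The voter masks a single bad channel, but it also reports which channel disagreed so the fault is not hidden from maintenance.

```python
from collections import Counter


def vote(readings):
    """Majority-vote redundant channel readings.

    Returns (majority_value, suspects), where suspects holds the indices
    of channels that disagree with the majority. Hypothetical helper for
    illustration only.
    """
    counts = Counter(readings)
    value, n = counts.most_common(1)[0]
    # A strict majority is required; otherwise the fault cannot be masked.
    if n <= len(readings) // 2:
        raise RuntimeError("no majority: fault cannot be masked")
    suspects = [i for i, r in enumerate(readings) if r != value]
    return value, suspects


# The system output stays correct, yet the failed channel is reported.
value, suspects = vote([42, 42, 17])
print(value, suspects)
```

Without the `suspects` report, the output would look healthy until a second channel failed.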

Increase in testing challenges

Similar to the inability to detect some single point failures, the ability to test the functionality and parametric values of components is also limited by the nature of the fault tolerance design.

Testing may require additional test functionality designed into the system, further adding to its complexity.

Increase in cost, weight, and complexity

Redundancy, error checking, and fault isolation designs, as examples, add components and logical elements to a system.

This increases weight through added components, larger boards, and greater power requirements. It also adds complexity, since the design includes parallel circuits and the logic required to detect and (functionally speaking) ignore single point failures.

Added parts and complexity mean additional cost.
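To illustrate the tradeoff, here is a rough sketch using the standard formula for active parallel redundancy. It assumes independent failures and ideal fault detection and switching, which real designs only approximate. The unit reliability of 0.9 is a made-up figure.

```python
def parallel_reliability(r, n):
    """Probability that at least one of n independent units survives.

    r is the reliability of a single unit over the mission time.
    Assumes independent failures and perfect switching (an idealization).
    """
    return 1 - (1 - r) ** n


# Each added unit brings a full unit's parts, weight, and cost,
# while the reliability gain shrinks with every addition.
for n in (1, 2, 3):
    print(n, round(parallel_reliability(0.9, n), 4))
```

Under these assumptions the second unit raises reliability from about 0.9 to about 0.99, and the third only to about 0.999, yet each one adds roughly the same parts count.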

Reduction in emphasis on improving component or subsystem reliability

The design team may not focus on improving the inherent reliability of elements of a fault-tolerant system. This tends to occur as the priority is on identifying single point failures and creating a design that is resilient enough to continue operation.

The focus is on system availability and not necessarily on system reliability.

Increase in acceptance of inferior components

Similar to the loss of focus on inherent reliability, the team may accept lower-cost, inferior components despite more frequent component failures.

Again, the focus is on system availability and robustness in the presence of component failures. Component quality loses priority once the design demonstrates its ability to meet fault-tolerant requirements.

Increase in support and maintenance expenses

The lack of focus on reliability and the increased use of inferior components cause an increase in component-level failures. These failures then require repair or replacement of the affected elements, which increases the cost of operating the system.

Fault-tolerant design is for specific applications where the benefits justify the added cost, weight, and complexity, along with the other downsides of this approach.

A good team will focus on system availability, the cost of operation and maintenance, and the inherent reliability of the individual elements.
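A back-of-the-envelope sketch shows how the repair count can grow. It assumes a constant failure rate per unit and prompt replacement after each failure, so the expected number of repairs is roughly units × failure rate × operating hours. All figures below are hypothetical.

```python
def expected_repairs(n_units, failure_rate_per_hour, hours):
    """Expected number of component replacements over the operating period.

    Assumes a constant failure rate and that each failed unit is replaced
    promptly. This is a simplification for illustration.
    """
    return n_units * failure_rate_per_hour * hours


# Hypothetical comparison over 10,000 operating hours:
# one higher-grade unit versus three lower-grade redundant units.
single = expected_repairs(1, 1e-5, 10_000)
redundant = expected_repairs(3, 5e-5, 10_000)
print(single, redundant)
```

With these made-up rates, the redundant arrangement may rarely go down, but it needs roughly fifteen times as many component replacements.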
