Skip to content

Fault Modeling

Axel Habermaier edited this page Jun 14, 2016 · 1 revision

Faults are a core concept of S#. While verification techniques for functional correctness only consider the intended behavior of a system, safety analysis techniques, by contrast, also consider the degraded behavior in the presence of one or more faults. Consequently, models used for safety analysis have to contain both kinds of behaviors; in particular, the model has to contain the system's relevant faults as well as their effects on the system. Usually, creating a model of the intended behavior already poses a challenge, while the degraded behavior in case of faults is even harder to predict and model correctly.

S# helps by making it possible to model the local effects that a fault has on the affected component(s) only; S# is later able to determine the global impact of the faults on the system's overall behavior. Consequently, it is only necessary to consider local fault effects, which is easier to get right than predicting the global consequences of a fault, helping to increase the adequacy of the models. It is always possible and usually desirable to specify the intended behavior first and add the degraded behavior later in a separate step. In that sense, a S# model including the faults is an extension of the S# model without any faults. In S#, faults are modeled by adding certain fields to components and classes modeling fault effects; the actual fault injection (and also fault removal for analysis efficiency reasons) are done automatically in the background.

Fault Terminology

The fault terminology used by S# matches the terms defined by Avizienis et al. The following gives a brief overview of the relevant terms for working with S#.

For safety analysis purposes, situations in which faults cause some behavior that would not have occurred otherwise are the most interesting ones; in fact, faults are only of importance in such situations. These situations represent fault activations; that is, a fault is activated when it has some influence on the system by somehow affecting its behavior. Faults are dormant until they are activated and become active; they become dormant again when they are deactivated.

Fault Activation

In contrast to other [safety analysis approaches and tools](Related Work and Tools), S# does not focus on fault activation states that indicate whether a fault is dormant or active. During modeling, property specification, model checking, or simulations, S# only considers fault activations. Fault activations are implicit in a S# model; S# therefore determines automatically whether a fault can be activated, i.e., whether the fault has any effect on the modeled system in a certain situation. The reasons for using this somewhat non-traditional approach are twofold:

  1. We feel that fault activations are the actual concept that is of interest for safety analyses; fault activation states are only of secondary interest to model persistence constraints (see below).

  2. Fault activations allow for significantly more efficient model checking; we've seen model checking time reductions by almost four orders of magnitude compared to the traditional state-based fault modeling approach that S# previously used.

Fault Propagations

In S#, fault propagations are not explicitly modeled; they are deduced automatically by the model checker. Only faults and their local effects have to be explicitly modeled in S# in order to be analyzed. Intra- and inter-component error propagation is implicit in S# to simplify modeling; the propagation chain is what S# is designed to uncover automatically. Only the dependencies between components are modeled through implicit or explicit port bindings.

Faults and Fault Effects

A S# Fault can represent any kind of fault, in particular (and most commonly) external hardware faults caused by nature. Additionally, one can also model development faults in that way, representing a mistake that was feared to be made during development, such as software bugs. In any case, the reason why such a fault exists or might be activated is irrelevant for S# and the safety analyses. For instance, a development fault might be introduced by a malicious and intentional backdoor or by an overtired developer; a hardware fault might be because of radiation, a flood wave, etc.

S# faults are dormant until they are activated, typically nondeterministically, to simulate a natural cause that results in the fault or to simulate that a development fault is triggered somehow, i.e., a line of code that contains a bug is executed and the bug is exposed. Fault activation results in fault effects that change the internal state or behavior of the affected components, thereby causing an error.

Errors

Errors are explicitly modeled, instead they represent a deviation of a component's internal state from what it should have been; therefore, error exist only at runtime. Such errors can propagate through the component, causing other errors. In S#, fault effects model the effect of a fault on the behavior or state of a component, indirectly describing a set of errors. A fault effect is caused by the activation of a fault, and induces or causes one or more errors to occur.

Failures and Failure Modes

Eventually, an error can result in a failure, where the error manifests itself in a way that is externally observable. The way a component fails is called a failure mode; for instance, is full and is empty are two failure modes of the [pressure sensor](Pressure Tank Case Study) that might result from various different errors and faults. Such a failure can either be a system hazard, or it might cause a fault in another component. Through implicit or explicit port bindings, a failure of a component implicitly causes an input fault in another component, where the sequence of error propagation continues.

Zero-Step Propagations

The chain of fault activations causing errors that are propagated to failures can not always be found like this in a model. For instance, consider the [pressure sensor](Pressure Tank Case Study): To model the failure mode is full, we add a fault with an effect that immediately results in a failure (shown below), without causing any other errors first. In that case, the error caused by the fault immediately results in a failure, and the fault and its effect are often named in accordance with the failure mode the fault and its effect are intended to model.

Fault Persistence

A fault's persistence constrains the transitions between its active and dormant states. Transient faults, for instance, are activated and deactivated completely nondeterministically, whereas permanent faults, while also activated nondeterministically, never become dormant again. S#'s standard library includes the types TransientFault and PermanentFault that model these two kinds of fault persistence, which are the most common ones.

In general, transient faults subsume all other kinds of persistence, but exclusively using transient faults might make a system seem less safe than it actually is. For instance, consider a hardware component that can completely break, i.e., it might no longer function at all. Obviously, such a component cannot magically repair itself, so it is a permanent fault. If the fault is modeled to be transient, the model of course includes all situations in which the fault behaves like a permanent one, but also many more. If a hazard occurs in such an additional situation, the fault appears to be critical, even though this situation is physically impossible in the real world. Persistence constraints are therefore similar to general fairness constraints, where modeled behavior is excluded from consideration by the model checker for reasons of adequacy.

Fault Modeling in S#

Being designed for safety analyses, S# of course has first class support for modeling faults and fault effects. It guarantees certain formal properties such as conservative extension in order to ensure adequacy of the safety analysis results that it computes. The following examples model the [pressure sensor's](Pressure Tank Case Study) two faults is full and is empty.

Adding Faults to a Model

To add a fault to a model, a field of type Fault has to be added to some component. If the fault affects a single component only, it is usually advisable to add the fault to the affected component itself, as is done in the following for the PressureSensor component:

public class PressureSensor : Component
{
	  public readonly Fault SuppressIsEmpty = new TransientFault();
	  public readonly Fault SuppressIsFull = new TransientFault();

    // ...
}

We usually make the fault fields both public and readonly: That way, faults can be referred to in formulas, but they cannot be reinitialized from outside the component. The fields are initialized with new instances of type TransientFault; consequently, the two faults are activated and deactivated completely nondeterministically. Faults can be initialized with any object of a type derived from Fault; TransientFault and PermanentFault are standard classes provided by S#. Custom Fault-derived types are also possible, see below.

A fault can affect multiple components in order to model common cause failures, for instance. In that case, it is typically advisable to add the fault to a parent component of all affected components and pass the reference to the fault to the affect components' constructors:

public class ChildComponent : Component
{
    public ChildComponent(Fault fault)
    {
        // ...
    }
}

public class ParentComponent : Component
{
    public readonly Fault F = new TransientFault();

    private ChildComponent _child1;
    private ChildComponent _child2;

    public ParentComponent()
    {
        _child1 = new ChildComponent(F);
        _child2 = new ChildComponent(F);
    }
}

Modeling Fault Effects

Activated faults affect the internal state of a component or the behavior of one or more of its ports. Fault effects are commonly modeled by adding a nested classes derived from the affected component. Fault effects must always be marked with the [FaultEffect] attribute. The local effects are modeled by overriding provided ports, required ports, or the affected component's Update method:

public class PressureSensor : Component
{
    // Faults
    public readonly Fault SuppressIsEmpty = new TransientFault();
    public readonly Fault SuppressIsFull = new TransientFault();

    // Provided ports; must be virtual in order to be affected by faults
    public virtual bool IsFull => PhysicalPressure >= Specification.SensorPressure;
    public virtual bool IsEmpty => PhysicalPressure <= 0;

    // Required port
    public extern int PhysicalPressure { get; }

    // Fault effect 1
    [FaultEffect(Fault = nameof(SuppressIsFull))]
    public class SuppressIsFullEffect : PressureSensor
    {
    	 public override bool IsFull => false;
    }

    // Fault effect 2
    [FaultEffect(Fault = nameof(SuppressIsEmpty))]
    public class SuppressIsEmptyEffect : PressureSensor
    {
    	 public override bool IsEmpty => false;
    }
}

The nested classes that model the effects of the is full and is empty faults are called SuppressIsFullEffect and SuppressIsEmptyEffect; both derive from PressureSensor. The IsFull and IsEmpty provided ports are overridden by the effects. In order for that to work, the affected ports must be declared virtual and the overriding ports must be declared override. Fault effects therefore make use of C#'s regular virtual dispatch mechanism.

When PressureSensor.IsFull is invoked by an implicit or explicit port binding, the original behavior is only executed as long as SuppressIsFull is not activated. That is, when S# activates the fault SuppressIsFull during model checking, SuppressIsFullEffect.IsFull is executed instead of PressureSensor.IsFull, always returning false regardless of the actual physical pressure level that is usually checked by the port. Consequently, the fault's effect causes a failure of the sensor in that it no longer reports that the pressure tank is full.

Advanced Fault Effect Modeling

While in the example above, only provided ports are affected by faults, it is indeed also possible to override required ports or the component's Update method. Additionally, the above example only shows how fault activations immediately result in component failures. It is of course also possible, especially when overriding Update, that fault activations result in errors that are not immediate failures: The overridden component members could change some values of the component's fields, resulting in an error. If and when such an error propagates to a failure is determined by S# automatically. Here's an example:

public class C : Component
{
    // The component's internal state
    private int _x;

    // Required port; must be virtual in order to be affected by faults
    public virtual extern int M();

    // Update method
    public override void Update()
    {
        _x = 7;
    }

    [FaultEffect(Fault = /* ... */)]
    public class E : C
    {
        public override int M() => base.M() + 1;

        public override void Update()
        {
            _x = 0;
        }
    }
}

The fault effect C.E affects both C's Update method and its required port M. Whenever C.Update is invoked and C.E's corresponding fault (not shown) is activated, E.Update is executed instead, setting _x to 0 instead of 7, causing an error that is not immediately externally observable, hence is not a failure. When C.M is called and the fault is activated, E.M is executed instead. It in turn invokes C.M using C#'s base method invocation syntax base.M; the base call computes the actual value through the required port's binding. The computed value is then incremented by 1, resulting in either a failure or an error depending on how M's returned value is actually used (not shown in the example).

Fault effects for a component C can also be modeled by adding non-nested classes derived from C to the model. In that case, however, the fault effect has no access to C's non-public members, as per the usual C# accessibility rules.

Associating Faults and Fault Effects Statically or Dynamically

There is a 1:n correspondence between faults and fault effects, that is, a Fault instance can have multiple effect instances, but each effect instance belongs to exactly one Fault instance. There are two ways to associate faults and fault effects: Statically at compile-time and dynamically at runtime. Statically, a fault effect is associated with a fault by name matching; it is the preferred approach for faults that affect a single component only. The association is specified by referencing a fault in the [FaultEffect] attribute. For example:

public class PressureSensor : Component
{
    public readonly Fault SuppressIsFull = new TransientFault();

    [FaultEffect(Fault = nameof(SuppressIsFull))]
    public class SuppressIsFullEffect : PressureSensor
    {
    	 // ...
    }

    // ...
}

If the association between a fault and its effects cannot be statically determined at compile-time, there is the Fault.AddEffect<T> method that can be called dynamically when the model is initialized. This is especially useful when modeling common cause faults. Revisiting the example from above where multiple component instances are affected by the same fault instance, the effects can be hooked up as follows:

public class ChildComponent : Component
{
    public ChildComponent(Fault fault)
    {
        var effect = fault.AddEffect<E>(this);
        effect.X = 17;
    }

    [FaultEffect]
    public class E : ChildComponent
    {
        public int X;

        // ...
    }
}

public class ParentComponent : Component
{
    // see above
}

The ChildComponent's constructor adds the fault effect ChildComponent.E to the fault provided as a constructor argument using the Fault.AddEffect<T> method. The method returns an instance of ChildComponent.E representing the added fault effect, which can be used to initialize the effect as shown above.

Explicitly adding a fault effect to a fault dynamically always overrides static associations. So in the example above, if the FaultEffect attribute specified a fault, it would be overwritten by the invocation of Fault.AddEffect<ChildComponent.E>.

Effect Priorities

There can be multiple fault effects affecting the same component port. In that case, however, the S# compiler by default issues a warning because the fault effect that actually has an effect is chosen nondeterministically, which is often unintended. Assuming the PressureSensor to have an additional fault that causes it to report the tank to be full even though it is not (the dual of is full), the S# model would look like this:

public class PressureSensor : Component
{
    // ...
    public virtual bool IsFull => PhysicalPressure >= Specification.SensorPressure;

    [FaultEffect(/* ... */)]
    public class SuppressIsFullEffect : PressureSensor
    {
    	 public override bool IsFull => false;
    }

    [FaultEffect(/* ... */)]
    public class ForceIsFullEffect : PressureSensor
    {
    	 public override bool IsFull => true;
    }
}

The warning generated by the S# compiler reads as follows:

Warning SS4005: 'PressureTank.PressureSensor.IsFull.get' is affected by multiple
fault effects without an explicit priority specification, resulting in possibly
unintended additional nondeterminism. Consider making priorities explicit by
adding the 'SafetySharp.Modeling.PriorityAttribute' to:
'PressureTank.PressureSensor.ForceIsFullEffect',
'PressureTank.PressureSensor.SuppressIsFullEffect'.

If the additional nondeterminism is indeed intentional, mark both fault effects with the Priority attribute, using the same priority level; for instance, [Priority(0)]. The warning is silenced while still allowing S# to choose between the two effects when both faults are activated at the same time.

Otherwise, assign different priorities to the effects: Marking PressureSensor.SuppressIsFullEffect with [Priority(0)] and PressureSensor.ForceIsFullEffect with [Priority(1)] means that in situations in which both faults are activated, PressureSensor.ForceIsFullEffect takes precedence while PressureSensor.SuppressIsFullEffect is ignored. That is, fault effects with higher priorities take precedence over effects with lower priorities.

Custom Persistence Kinds

Transient and permanent faults are the two most common kinds of fault persistence. S# therefore provides the TransientFault type to be used for faults that can be activated and deactivated completely nondeterministically. For faults that, once activated nondeterministically, cannot be deactivated again, PermanentFault is provided. For other kinds of fault persistence, a new class deriving from Fault must be added to the model.

The following MaintenanceFault class represents a fault that is similar to a PermanentFault, except that the fault can be repaired at some point, allowing it to be deactivated again. Such a fault can be implemented as follows:

public class MaintenanceFault : Fault
{
    private bool _isActive;

    public MaintenanceFault()
        : base(requiresActivationNotification: true)
    {
    }

    protected override Activation CheckActivation()
    {
        // Warning: No side effects are allowed in this method!
        return _isActive ? Activation.Forced : Activation.Nondeterministic;
    }

    public override void OnActivated()
    {
        _isActive = true;
    }

    public void Repair()
    {
        _isActive = false;
    }
}

The class is derived from Fault and contains a field _isActive indicating whether the fault is currently active. In the constructor, we pass true to the base class constructor's requiresActivationNotification parameter to indicate that S# must call the overridden OnActivated method.

The method CheckActivation is invoked by S# to determine whether the fault can, must, or cannot be activated. As long as _isActive is false, that is, the fault is dormant, we return Activation.Nondeterministic, causing S# to consider both a potential activation as well as no activation. Once the fault is active, however, _isActive is true and the method returns Activation.Forced, meaning that if an activation is possible, S# must activate the fault; the case that the fault is not activated is no longer considered. Once S# has determined that the fault can indeed be activated, it invokes the OnActivated method. This is where _isActive is set to true.

Important: Do not set _isActive to true in CheckActivation! In fact, that method must be completely side effect-free, otherwise the fault will always be considered to be activatable.

The Repair method can be called to make the fault dormant again by setting _isActive to false. With that, MaintenanceFault could be used as follows:

class C : Component
{
    public readonly MaintenanceFault F = new MaintenanceFault();

    public override void Update()
    {
        if (/* some condition indicating that F is repaired */)
            F.Repair();
    }

    [FaultEffect(Fault = nameof(F))]
    public class E : C
    {
        // ...
    }
}