John O’Sullivan outlines how to assign a safety integrity level to a system, identifying and analysing risks associated with a system design and implementing the appropriate safety instrumented system to mitigate those risks


Author: John O’Sullivan, engineering director, Douglas Control and Automation

The term ‘safety integrity level’ (SIL) is used as a convenient shorthand to describe the safety rating of various hardware components and systems, e.g. “This PLC CPU is rated SIL3.” The SIL was designed as a shorthand to represent the results of complex analysis, but it is still only a part of an overall lifecycle approach to functional safety.

Technically, the safety integrity level is the level by which the risk is reduced by the introduction of a safety instrumented system (SIS). There are four levels, with SIL1 being the least reduction in risk and SIL4 being the greatest.

The SIS is separate to and independent of the basic process control system (BPCS) and, like the BPCS, it consists of sensor(s), logic solver(s) and final element(s). The SIS reduces the risk by intervening during a failure of the BPCS to ensure that the system remains safe. While the SIS hardware and software components may resemble the BPCS components and may come from the same manufacturer, they are required to be more reliable.

The specification, design and operation (safety life cycle, or SLC) are defined in the standard IEC 61508, ‘Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems’. This standard has spawned a number of industry- and sector-specific standards that delve into more detail for specific industries, although we will focus on IEC 61508 in this article. IEC 61508 defines the SLC in three sections:

  • Phases 1 to 5: Analysis;
  • Phases 6 to 13: Realisation; and
  • Phases 14 to 16: Operation.

The following standards elaborate on the approach to SIL assignment outlined in IEC 61508:

  • IEC 61511 ‘Functional Safety – Safety Instrumented Systems for the Process Industry Sector’
  • IEC 61513 ‘Nuclear Power Plants – Instrumentation and Control Important to Safety’
  • EN 50128 ‘Railway Applications – Communication, Signalling and Processing Systems – Software for Railway Control and Protection Systems’
  • EN 50129 ‘Railway Applications – Communication, Signalling and Processing Systems – Safety-Related Electronic Systems for Signalling’.

Safety integrity levels – hazard and risk analysis

During the analysis phases of a project, hazard identification and risk analysis are carried out by an interdisciplinary team. The team should include all the system stakeholders: designers, process owners, and safety, automation, mechanical and electrical specialists. Where possible, hazards are designed out of the system. Where this is not possible, e.g. because a volatile raw material is essential to the process, the risks associated with the hazard are identified.

A hazard is a potential source of harm. Once a hazard has been identified, the associated risk is assessed as the product of the ‘frequency of the occurrence’ and the ‘severity of the harm’. Methods of analysis include:

  • HAZOP: Hazard and Operability Study
  • FME(C)A: Failure Mode Effect (and Criticality) Analysis
  • FMEDA: Failure Mode Effect and Diagnostic Analysis
  • ETA: Event Tree Analysis
  • FTA: Fault Tree Analysis

Normally, a risk matrix uses the likelihood of the occurrence and the consequence of the event to categorise the risks. Risks that cannot be designed out and are not tolerable will require safety functions to reduce the risk to a tolerable level. This results in the ‘residual risk’, which must be less than the pre-defined ‘tolerable risk’. The greater the reduction required to reach the residual risk, the higher the SIL. See the diagram below where the consequences, frequency/exposure and probability of avoidance are used to determine the required SIL.

Figure 1: Risk assessment

Risk Parameters:
C1: Minor injury or damage
C2: Serious injury or one death, temporary serious damage
C3: Several deaths, long-term damage
C4: Many dead, catastrophic effects

Frequency/Exposure Time:
F1: Rare to quite often
F2: Frequent to continuous

Possibility of Avoidance:
P1: Avoidance possible
P2: Unavoidable, scarcely possible

Probability of Occurrence:
W1: Very low, rarely
W2: Low
W3: High, frequent

Safety Integrity Levels Required:
-: Tolerable risk, no safety requirements
a: No special safety requirements
b: A single E/E/PE is not sufficient
1: SIL 1
2: SIL 2
3: SIL 3
4: SIL 4
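
The relationship between the unmitigated risk, the tolerable risk and the risk reduction that a safety function must deliver can be illustrated numerically. The Python fragment below is a minimal sketch rather than a method prescribed by IEC 61508: it assumes the severity of the harm is fixed, so that risk scales with the event frequency alone, and the function name and the example frequencies are hypothetical.

```python
def required_risk_reduction(unmitigated_per_year: float,
                            tolerable_per_year: float) -> float:
    """Return the factor by which a hazardous event frequency must be cut.

    Assumption: severity is fixed, so risk is proportional to frequency.
    A result of 1.0 or less means the risk is already tolerable.
    """
    if unmitigated_per_year <= 0 or tolerable_per_year <= 0:
        raise ValueError("frequencies must be positive")
    return unmitigated_per_year / tolerable_per_year


# Hypothetical figures: one event every 20 years without protection,
# against a tolerable target of one event every 10,000 years.
rrf = required_risk_reduction(0.05, 0.0001)

# The safety function may fail on at most this fraction of demands.
max_pfd = 1.0 / rrf

print(f"required risk reduction factor: {rrf:.0f}")
print(f"maximum probability of failure on demand: {max_pfd:.4f}")
```

With these illustrative figures the required risk reduction factor is about 500, which falls in the 100-1000 band that Table 1 associates with SIL2.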

Depending on the SIL to be achieved, which follows from the risk reduction required, a device must achieve a sufficiently low probability of failure and a sufficiently high safe failure fraction.

Probability of failure

Probability of failure comes in two flavours: probability of failure on demand (PFD) for safety functions that are only activated when required, and probability of failure per hour (PFH) for safety functions that are operating continuously. The lower the probability of failure, the higher the risk-reduction factor. The higher the risk-reduction factor, the higher the SIL achieved. (See the tables below for the figures related to PFD and PFH.)

SIL   PFD               PFD (power of ten)   Risk reduction factor
1     0.1-0.01          10^-1 to 10^-2       10-100
2     0.01-0.001        10^-2 to 10^-3       100-1000
3     0.001-0.0001      10^-3 to 10^-4       1000-10,000
4     0.0001-0.00001    10^-4 to 10^-5       10,000-100,000

Table 1: Probability of failure on demand

SIL   PFH                      PFH (power of ten)   1/PFH
1     0.00001-0.000001         10^-5 to 10^-6       100,000-1,000,000
2     0.000001-0.0000001       10^-6 to 10^-7       1,000,000-10,000,000
3     0.0000001-0.00000001     10^-7 to 10^-8       10,000,000-100,000,000
4     0.00000001-0.000000001   10^-8 to 10^-9       100,000,000-1,000,000,000

Table 2: Probability of failure per hour
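
The two tables above are simple band lookups, which the following Python sketch makes explicit. The band limits are taken from Tables 1 and 2. The function name and the example values are hypothetical, and the treatment of a value that sits exactly on a band limit (here assigned to the higher SIL) is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# Each entry is (SIL, lower limit, upper limit) for one band.
Band = Tuple[int, float, float]

# Probability of failure on demand, per Table 1.
PFD_BANDS: List[Band] = [
    (1, 1e-2, 1e-1),
    (2, 1e-3, 1e-2),
    (3, 1e-4, 1e-3),
    (4, 1e-5, 1e-4),
]

# Probability of failure per hour, per Table 2.
PFH_BANDS: List[Band] = [
    (1, 1e-6, 1e-5),
    (2, 1e-7, 1e-6),
    (3, 1e-8, 1e-7),
    (4, 1e-9, 1e-8),
]


def sil_for(value: float, bands: List[Band]) -> Optional[int]:
    """Return the SIL whose band contains value, or None if no band does.

    Assumption: a band includes its lower limit and excludes its upper one.
    """
    for sil, low, high in bands:
        if low <= value < high:
            return sil
    return None


# Hypothetical results for two safety functions.
pfd = 5e-3   # demand-mode function
pfh = 2e-8   # continuously operating function

print("PFD", pfd, "-> SIL", sil_for(pfd, PFD_BANDS))
print("PFH", pfh, "-> SIL", sil_for(pfh, PFH_BANDS))
print("risk reduction factor for the PFD case:", round(1.0 / pfd))
```

A value that is too high for SIL1 returns None, which signals that the device does not reach any of the four levels.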

Safe failure fraction

While the PFD and PFH tell us how likely a failure is to occur, the safe failure fraction (SFF) tells us what fraction of failures will be either safe or, if dangerous, detected. A high SFF is achieved through increased diagnostics and reporting within the safety function. The Greek letter λ is used to denote the rate of failure per hour.

  • λsafe = failure rate leading to safe state
  • λdangerous = failure rate leading to dangerous state
  • λtotal = λdangerous + λsafe

Each of these rates is split again according to whether the failure is detected or undetected, which gives four failure rates in total. λdu is the rate of dangerous undetected failures. Thus, SFF = 1 − λdu / λtotal. So, for the SFF to be as high as possible, failures have to be safe or detected. If all the failures were safe and/or detected, the SFF would be 1 or 100%.
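
As a worked illustration of the safe failure fraction, the Python sketch below computes it from the four failure rates. The function and parameter names are hypothetical, and the failure rates are invented for the example; they do not describe any real device.

```python
def safe_failure_fraction(lambda_sd: float, lambda_su: float,
                          lambda_dd: float, lambda_du: float) -> float:
    """Return the fraction of failures that are safe or dangerous-detected.

    Arguments are failure rates per hour: safe detected, safe undetected,
    dangerous detected and dangerous undetected.
    """
    rates = (lambda_sd, lambda_su, lambda_dd, lambda_du)
    if any(rate < 0 for rate in rates):
        raise ValueError("failure rates cannot be negative")
    total = sum(rates)
    if total == 0:
        raise ValueError("total failure rate must be greater than zero")
    # Only dangerous undetected failures count against the fraction.
    return 1.0 - lambda_du / total


# Invented example rates, in failures per hour.
sff = safe_failure_fraction(
    lambda_sd=400e-9,
    lambda_su=300e-9,
    lambda_dd=250e-9,
    lambda_du=50e-9,
)
print(f"safe failure fraction: {sff:.1%}")
```

In this example 50 of every 1,000 failures are dangerous and undetected, so the safe failure fraction is about 95%. Improving the diagnostics moves failures from the dangerous undetected rate to the dangerous detected rate, which raises the fraction without changing the total failure rate.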

Before the SFF can be used to determine the SIL, other factors have to be considered. The first is the hardware fault tolerance (HFT) of the device. HFT is achieved through redundancy: an HFT of N means that N+1 faults are required before the safety function is lost. Secondly, devices are treated differently for SFF depending on their type.

Type A devices are considered to be well defined and have sufficient failure data from experience in the field. Type B devices are considered to have insufficient data and field experience. See the tables below for the figures related to SFF.

SFF          Hardware Fault Tolerance (HFT)
             0      1      2
<60%         SIL1   SIL2   SIL3
60% to 90%   SIL2   SIL3   SIL4
90% to 99%   SIL3   SIL4   SIL4
>99%         SIL3   SIL4   SIL4

Table 3: SFF for Type A subsystem

SFF          Hardware Fault Tolerance (HFT)
             0             1      2
<60%         Not allowed   SIL1   SIL2
60% to 90%   SIL1          SIL2   SIL3
90% to 99%   SIL2          SIL3   SIL4
>99%         SIL3          SIL4   SIL4

Table 4: SFF for Type B subsystem
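
Combining the safe failure fraction with the hardware fault tolerance is again a table lookup. The Python sketch below encodes only Table 4 (Type B) as an illustration. The names are hypothetical, and the handling of values that sit exactly on a band limit (assigned here to the higher band, so that 99% counts as the top row) is an assumption of this sketch.

```python
from typing import Optional, Tuple

# Rows of Table 4, each indexed by hardware fault tolerance 0, 1 and 2.
# None stands for "Not allowed". The float is the upper limit of the band.
TYPE_B_ROWS: Tuple[Tuple[float, Tuple[Optional[int], int, int]], ...] = (
    (0.60, (None, 1, 2)),  # SFF below 60%
    (0.90, (1, 2, 3)),     # 60% up to 90%
    (0.99, (2, 3, 4)),     # 90% up to 99%
)
TYPE_B_TOP_ROW: Tuple[int, int, int] = (3, 4, 4)  # 99% and above


def max_sil_type_b(sff: float, hft: int) -> Optional[int]:
    """Return the highest SIL Table 4 permits, or None if not allowed."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be a fraction between 0 and 1")
    if hft not in (0, 1, 2):
        raise ValueError("Table 4 only covers HFT values 0, 1 and 2")
    for upper_limit, row in TYPE_B_ROWS:
        if sff < upper_limit:
            return row[hft]
    return TYPE_B_TOP_ROW[hft]


# A Type B device with a 95% safe failure fraction, used singly and
# then in a redundant arrangement that tolerates one fault.
print("HFT 0:", max_sil_type_b(0.95, 0))
print("HFT 1:", max_sil_type_b(0.95, 1))
```

For a Type B device with an SFF of 95%, Table 4 gives SIL2 when the device is used on its own and SIL3 when redundancy provides an HFT of 1, which shows how redundancy can compensate for a lower safe failure fraction.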

In summary, the tools are available to identify and analyse risks associated with a system design and then implement the appropriate safety instrumented system to mitigate those risks and save lives and assets.

John O’Sullivan BE, Dip Phys Sci, CEng MIEI is the engineering director of Douglas Control and Automation. He has 20 years’ experience in the automation industry focusing on the pharmaceutical, biotechnology and medical device sectors. He has developed design and test specifications for the regulated environment and project manages automation and safety projects for life science customers. O’Sullivan has consulted on the validation of certified failsafe, high availability systems.