Near-misses and failure: Part 1
13 November 2018
Figure 2 Space shuttle Columbia: detached foam impacting left wing
Drawing on two high-profile examples, Sean Brady suggests that attitudes toward near-misses can sometimes result in complacency – with catastrophic consequences.
An article in the ‘Harvard Business Review’, entitled ‘How to Avoid Catastrophe’(1), explored how understanding the causes of ‘near-misses’ can play a key part in anticipating more significant failures.
The article, by Tinsley, Dillon and Madsen, examined failures from a diverse range of industries, including BP’s Deepwater Horizon oil blowout in the Gulf of Mexico and Apple’s issues with poor signal strength following the release of the iPhone 4 in 2010.
While failures of a structural engineering nature were not specifically discussed, it is interesting to examine whether near-miss concepts can be applied by the structural engineering profession to better anticipate catastrophic failures.
The fundamental characteristics of near-misses are presented in this article, and, in Part 2, their potential application to structural engineering is explored.
Near-misses can be defined as successful outcomes where chance plays a critical role in averting failure(2). In other words, they are situations where latent issues exist that have the potential to cause significant failures when subject to certain enabling conditions, but where good luck has, to date, prevented such enabling conditions from occurring.
Such a concept is, of course, well known to the workplace health and safety profession, who typically require both workplace incidents and near-misses be recorded.
Near-misses are simply viewed as accidents waiting to happen, with good luck having played a key role in preventing an incident to date.
The concept of near-misses is clearly illustrated by the issues Toyota faced in 2009, when a Lexus sedan, manufactured by Toyota, accelerated uncontrollably and crashed, leading to fatalities.
A subsequent investigation discovered latent issues with Toyota’s acceleration system, which resulted in the withdrawal of six million vehicles and a loss in sales of $2 billion to Toyota in North America alone(1).
Was there an opportunity for Toyota to identify this latent issue prior to the fatal crash? Figure 1 shows the percentage of complaints relating to acceleration issues for a specific vehicle(1).
Firstly, for the Honda Accord, the percentage of complaints relating to acceleration issues from 1995 to 2010 was typically less than 10 per cent of the overall complaints received for the vehicle. (Apparently this is a typical percentage, and upon investigation such complaints are generally found to relate to driver error rather than a vehicle defect.)
However, for the Toyota Camry, the percentage of complaints relating to acceleration issues dramatically increased from less than 10 per cent in 2000 to almost 50 per cent in 2009.
Given that Toyota introduced a new acceleration system in 2001, this deviation from the typical percentage of acceleration complaints should have suggested that there was a significant issue with the new system.
Indeed, Tinsley et al argue that each of the complaints was a near-miss, as each complaint was an incident where uncontrolled acceleration occurred (the latent issue), but a catastrophic crash did not result (due to an absence of enabling conditions). Those near-misses continued to accumulate until luck ran out in 2009, and the tragic crash occurred.
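The comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the complaint counts are hypothetical and are not the data behind Figure 1, the 10 per cent baseline is taken from the "typical" percentage mentioned in the text, and the names `YearlyComplaints`, `acceleration_share` and `years_above_baseline` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearlyComplaints:
    year: int
    acceleration: int  # complaints relating to acceleration issues
    total: int         # all complaints received for the vehicle


def acceleration_share(record: YearlyComplaints) -> float:
    """Return acceleration complaints as a percentage of all complaints."""
    if record.total <= 0:
        raise ValueError("total complaints must be positive")
    return 100.0 * record.acceleration / record.total


def years_above_baseline(records, baseline_pct=10.0):
    """Return the years in which the acceleration share exceeds the baseline."""
    return [r.year for r in records if acceleration_share(r) > baseline_pct]


# Hypothetical counts for illustration only; NOT the data from Figure 1.
records = [
    YearlyComplaints(year=2000, acceleration=8, total=100),
    YearlyComplaints(year=2005, acceleration=25, total=100),
    YearlyComplaints(year=2009, acceleration=48, total=100),
]

flagged = years_above_baseline(records)
print(flagged)  # prints [2005, 2009]
```

The point of the sketch is that the check depends only on recorded complaints and a stated baseline, not on whether a crash has yet occurred.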
Why were these warning signs ignored? Fundamentally, Tinsley et al conclude that people are hardwired to misinterpret or ignore such warning signs, resulting in them not being investigated. Two issues particularly come into play that cloud judgment: normalisation of deviance and the outcome bias.
Both of these cognitive biases are illustrated by NASA’s Columbia disaster in 2003, when the space shuttle Columbia disintegrated upon re-entry, with the loss of the shuttle and all seven crew(3).
The technical cause of the disaster was a failure of the shuttle’s Thermal Protection System (TPS), which allowed hot plasma to enter the shuttle’s wing structure, causing it to structurally fail, with subsequent loss of the shuttle.
The investigation found that the TPS had been damaged during lift-off, when a briefcase-sized piece of foam became detached from the main fuel tank and impacted the shuttle’s left wing (main image).
As with Toyota, there was near-miss information available prior to the disaster. The TPS design was based on the assumption that it would not be subject to debris impact.
Debris impact had potentially disastrous consequences, namely loss of the shuttle during re-entry. However, it was clear from video footage that debris was impacting the TPS during previous flights in the shuttle programme(3).
Furthermore, based on shuttle inspections following mission completion, it was confirmed that damage was indeed occurring to the TPS as a result of these impacts(3). Therefore, in near-miss terms, there was a ‘deviation’ between expected performance (no impacts) and actual performance (impacts occurring).
Each launch was a near-miss, with good luck intervening to prevent a large enough piece of foam from impacting the shuttle’s TPS at a critical location. Why then, given the potentially catastrophic consequences of such a deviation, was it ignored?
Normalisation of deviance
Near-miss research shows that when human beings become familiar and comfortable with a risk (or deviation) it becomes normalised, that is, what was once a concern becomes acceptable(1).
In NASA’s case, the engineers were aware that the TPS debris impacts were a deviation, and this caused initial concern. However, with every successful launch and return, despite the debris impacts, engineers and NASA management became more comfortable with the impacts, and they became normalised(3).
In effect, rather than being treated as evidence that the potential for catastrophic failure existed, the near-misses were viewed as supporting the position that catastrophic failure was unlikely. Once deviations are normalised, the opportunity to learn from them is generally lost.
Outcome bias
While there is obvious wisdom in investigating and understanding the causes of failures (although it doesn’t always occur in practice), research indicates that the causes of successes are rarely investigated(1).
Near-misses, unfortunately, appear to be typically treated as successes, rather than failures, and remain un-investigated. For example, NASA categorised shuttle launches with debris impacts as successful missions, despite their potential for catastrophic failure.
The debris impacts effectively came to be viewed as an ordinary occurrence, becoming a maintenance issue rather than a flight safety issue(3). Ultimately, NASA observed successful outcomes, and assumed that the process that led to them was fundamentally sound, even when it wasn’t(2).
The research indicates that the ability of normalisation of deviance and the outcome bias to cloud judgment should not be underestimated. Tinsley et al stress that across many industries, including NASA, telecommunications companies and automobile manufacturers, multiple near-misses preceded all of the failures they examined, and in most cases, these near-misses were ignored or misread.
Indeed, they were often viewed as proof that the system was working, that is, despite the potential for failure, failure did not occur, thus confirming the robustness of the system(1,2).
As a result, the concept of a ‘deviation’ becomes critical, as it provides a means of rationally challenging these cognitive biases. Returning to Figure 1, the Toyota Camry’s increase in percentage of acceleration complaints can be thought of as a ‘deviation’. A key aspect of a deviation is that it is a fact.
While arguments can ensue about what constitutes a ‘small’ failure (and psychology indicates that human nature tries to hide or disguise small failures), a deviation is simply a measurement of the difference between expected and actual performance, and once expected performance is defined and actual behaviour is recorded, a deviation becomes apparent.
For example, a paper by Cannon and Edmondson(4) describes an approach taken by Électricité de France, a nuclear power plant operator, to identify and investigate potential near-miss events: “The organisation tracks each plant for anything even slightly out of the ordinary and has a policy of quickly investigating and publicly reporting any anomalies throughout the entire system so that the whole system can learn.”
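The idea of a deviation as a simple measurement can be made concrete with a short Python sketch. It assumes a deviation is recorded as actual minus expected performance and that any deviation is flagged for investigation whether or not the outcome was successful. The `Observation` record, the function names and the counts in the log are all hypothetical, chosen only to mirror the debris-impact example (expected impacts: none).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    label: str
    expected: float   # expected performance, e.g. number of debris impacts
    actual: float     # recorded performance
    outcome_ok: bool  # True if the operation ended successfully


def deviation(obs: Observation) -> float:
    """Difference between actual and expected performance."""
    return obs.actual - obs.expected


def needs_investigation(obs: Observation, tolerance: float = 0.0) -> bool:
    """Flag any deviation beyond the tolerance, regardless of the outcome."""
    return abs(deviation(obs)) > tolerance


# Hypothetical log for illustration only; the counts are invented.
log = [
    Observation("flight A", expected=0, actual=0, outcome_ok=True),
    Observation("flight B", expected=0, actual=3, outcome_ok=True),
    Observation("flight C", expected=0, actual=1, outcome_ok=True),
]

# A near-miss here is a successful outcome that still shows a deviation.
near_misses = [o.label for o in log if needs_investigation(o) and o.outcome_ok]
print(near_misses)  # prints ['flight B', 'flight C']
```

Because the flag ignores `outcome_ok`, a run of successful outcomes cannot by itself remove a deviation from the list of items to investigate, which is the safeguard against outcome bias that the text describes.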
In Part 2, the application of near-miss concepts in structural engineering will be explored, the process for identifying and investigating near-misses will be presented, and the dark side of possessing near-miss information will be highlighted.
In the meantime, the following quote by Edward Rogers, chief knowledge officer at NASA’s Goddard Space Flight Center, illustrates how powerful near-miss concepts can be in anticipating failure, even in an organisation such as NASA, which faces major challenges, complexity and uncertainty: “Almost every mishap at NASA can be traced to some series of small signals that went unnoticed at the critical moment.”(1)
1) Tinsley CH, Dillon RL and Madsen PM (2011) ‘How to Avoid Catastrophe’, Harvard Business Review, 89 (4), pp. 90-97
2) Dillon RL and Tinsley CH (2008) ‘How Near-Misses Influence Decision Making under Risk: A Missed Opportunity for Learning’, Management Science, 54 (8), pp. 1425-1440
3) Hall JL (2003) ‘Columbia and Challenger: Organizational failure at NASA’, Space Policy, 19 (4), pp. 239-247
4) Cannon MD and Edmondson AC (2005) ‘Failing to Learn and Learning to Fail (Intelligently): How Great Organizations Put Failure to Work to Innovate and Improve’, Long Range Planning, 38 (3), pp. 299-319
This article was first published in August 2013 in ‘The Structural Engineer’, pp 34–35. www.thestructuralengineer.org
Author: Sean Brady is the managing director of Brady Heywood (www.bradyheywood.com.au), based in Brisbane, Australia. The firm provides forensic and investigative structural engineering services and specialises in determining the cause of engineering failure and non-performance.