Safety-II vs. HRO in Socio-Technical Systems: An Overview Framed via Resilience

Dr. Rune Storesund, D.Eng., P.E., G.E., F.ASCE, D.FE
Executive Director, UC Berkeley Center for Catastrophic Risk Management

Prof. Emeritus Karlene Roberts, Ph.D.
Director, UC Berkeley Center for Catastrophic Risk Management

Introduction

This article provides a high-level overview of Safety-II and High Reliability Organizations (HROs).  The two concepts are outlined, followed by a discussion of the similarities and differences between Safety-II and HRO.  Both frameworks aim at more resilient Socio-Technical Systems, in which human and organizational factors are acknowledged as important attributes that can enhance and/or degrade operations and outcomes, as well as the ability to proactively and interactively mitigate undesired outcomes before ‘failure.’  Attributes of a Socio-Technical System are introduced because there are fundamental factors unique to these types of systems relative to Resilience and HROs.

Socio-Technical Systems

The concept of a Socio-Technical System originated following WWII[1],[2] and recognizes the inseparable relationship between human and organizational factors and their physical system components.  The working hypothesis was that improved operational system performance would be realized by leveraging the knowledge and capabilities of workers to confront technological uncertainty, variation, and adaptation.  This perspective is very important when examining the performance of ‘systems’ that are operated and managed by ‘people.’  There are key fundamental operational principles, first articulated by A. Cherns[3], associated with the Socio-Technical System framework, which include:

  • Principle 1 – Compatibility:  The operationalized STS must be compatible with its objectives and intended outcomes.  If a system objective includes the capability to self-modify, adapting to change and making the most use of the creative capacities of the individual, then processes must exist to capture and propagate inputs from system agents (i.e. the ‘people’ of the system).  Positive attributes that promote and facilitate system self-modification, adaptation to change, and leveraging the creative capacities of the individual include: (a) leadership commitment to safety values and actions; (b) presence of a respectful work environment; (c) an environment for raising concerns; (d) inquiring attitudes; (e) work process/workflow evaluations; and (f) continuous improvement/process improvement.
  • Principle 2 – Minimal Critical Specification:  This principle has two aspects, negative and positive.  The negative simply states that no more should be specified than is absolutely essential; the positive requires that we identify what is essential.
  • Principle 3 – The Socio-Technical Criterion:  This principle states that variances, if they cannot be eliminated, must be controlled as near to their point of origin as possible. We need here to define variance, a word much used in socio-technical literature. Variance is any unprogrammed event; a key variance is one which critically affects outcome.
  • Principle 4 – The Multifunctionality Principle–Organism vs. Mechanism:  The traditional form of organization relies very heavily on the redundancy of parts. It requires people to perform highly specialized, fractionated tasks. There is often a rapid turnover of such people, but they are comparatively easily replaced. Each is treated as a replaceable part. Simple mechanisms are constructed on the same principle. Disadvantages arise when a range of responses is required, that is, when a large repertoire of performances is required from the mechanism or the organization. This usually occurs if the environmental demands vary.  It then becomes more adaptive and less wasteful for each element to possess more than one function.
  • Principle 5 – Boundary Location:  In any organization, departmental boundaries have to be drawn somewhere and are usually drawn so as to group people and activities on the basis of one or more of three criteria: technology, territory and time. The principle has certain corollaries. A very important one concerns the management of the boundaries between department and department, between department and the organization as a whole and between the organization and the outside world. The more the control of activities within the department becomes the responsibility of the members, the more the role of the supervisor/foreman/manager is concentrated on the boundary activities.
  • Principle 6 – Information Flow:  This principle states that information systems should be designed to provide information in the first place to the point where action on the basis of it will be needed.  Information systems are not typically so designed. Properly directed, sophisticated information systems can, however, supply a work team with exactly the right type and amount of feedback to enable them to learn to control the variances which occur within the scope of their spheres of responsibility and competence and to anticipate events which are likely to have a bearing on their performance.
  • Principle 7 – Support Congruence:  This principle states that the systems of social support should be designed so as to reinforce the behaviors that the organization structure is designed to elicit. If, for example, the organization is designed on the basis of group or team operation with team responsibility, a payment system incorporating individual members would be incongruent with these objectives. Not only payment systems, but systems of selection, training, conflict resolution, work measurement, performance assessment, timekeeping, leave allocation, promotion and separation can all reinforce or contradict the behaviors which are desired.
  • Principle 8 – Human Values:  This principle states that an objective of organizational design should be to provide a high quality of work. We recognize that quality is a subjective phenomenon and that everyone wants to have responsibility, variety, involvement, growth, etc. The objective is to provide these for those who do want them without subjecting those who do not to the tyranny of peer control. In this regard, we are obliged to recognize that all desirable objectives may not be achievable simultaneously.
  • Principle 9 – Incompletion:  The system is never ‘completed.’  Continuous improvement is required on many of the system elements, including work processes, hazard identification and risk management.
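Principles 3 and 6 together amount to a routing rule: detect a variance, then deliver the information first to the work unit closest to its point of origin, which is empowered to control it.  Purely as an illustrative sketch (not drawn from the source; all class and unit names here are hypothetical), that rule might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Variance:
    """An unprogrammed event (Principle 3); a 'key variance' critically affects outcomes."""
    origin: str               # work unit where the variance arose
    description: str
    critical: bool = False

@dataclass
class WorkUnit:
    name: str
    handled: list = field(default_factory=list)

    def control(self, v: Variance) -> None:
        # Principle 3: the variance is controlled at its point of origin,
        # by the unit that owns that part of the work.
        self.handled.append(v)

def route(variance: Variance, units: dict) -> str:
    """Principle 6: deliver information first to the point where action is needed."""
    unit = units.get(variance.origin)
    if unit is None:
        raise KeyError(f"no work unit owns origin '{variance.origin}'")
    unit.control(variance)
    return unit.name

# Hypothetical work units: the variance goes to 'line-3', not up a hierarchy.
units = {"line-3": WorkUnit("line-3"), "dispatch": WorkUnit("dispatch")}
handler = route(Variance("line-3", "feed jam", critical=True), units)
```

The design choice the sketch highlights is that no central authority mediates the information flow: the team at the origin receives and acts on the variance directly.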

Thus, a comprehensive approach to Socio-Technical Systems must address a portfolio of factors in order to positively impact outcomes.  Omitting any of these factors yields partial solutions that typically lead to overall failure.

Resilience

Resilience, as used in the context of an attribute of Socio-Technical Systems, refers to the ability of a system to “adjust its functioning prior to, during, or following changes and disturbances, and thereby sustain required operations under both expected and unexpected conditions”[4].  Three terms in this definition must be underscored.

Required operations:  This term identifies the core functional outcome(s) the system is required to generate.  If these outcomes are not generated, system ‘failure’ occurs.  We note that it is very common, however, for these core functional outcomes to be implicit rather than explicit, resulting in ambiguity across organizational divisions and personnel.  The lack of specificity with respect to the core functional outcomes/required operations results in confusion within a system’s organization and hinders the ability to clearly communicate the core required operations.

Expected conditions:  These are the anticipated operational parameters for the system and are typically delineated through a series of explicit and implicit system assumptions generated during the initial configuration of the system.

Unexpected conditions:  These are operational parameters that may impact a system that have not been considered or evaluated to date.  Unexpected conditions also include parameters that were ‘unimaginable’ as well as remote scenarios that may have been originally considered, but discounted due to a perception of very low likelihood of occurrence.

Woods and Hollnagel[5] describe three time-dependent qualities of resilience within a Socio-Technical System and its exposure to the surrounding environment, which is subject to dynamic developments over time.  First, anticipation – knowing what might be expected to occur.  Second, attention – knowing what to look for.  Third, response – knowing what to do and having the required resources to fully implement the response.
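The anticipation/attention/response sequence can be read as a monitoring loop: a model of expected conditions (anticipation), a check of readings against that model (attention), and a prepared action when an anomaly is detected (response).  The following toy sketch is only an illustration of that reading; the signal names and thresholds are hypothetical, not from the source:

```python
# Hypothetical model of expected operating conditions (anticipation):
# each monitored signal and its expected envelope.
ANTICIPATED = {"pump_temp": (10.0, 80.0)}

def anticipate(signal: str) -> bool:
    """Anticipation: is this signal part of our model of the world at all?"""
    return signal in ANTICIPATED

def attend(signal: str, value: float) -> bool:
    """Attention: does the reading fall outside the expected envelope?"""
    lo, hi = ANTICIPATED[signal]
    return not (lo <= value <= hi)

def respond(signal: str, value: float) -> str:
    """Response: knowing what to do, and doing it, when an anomaly appears."""
    return f"throttle {signal} (reading {value})"

def monitor(signal: str, value: float) -> str:
    if not anticipate(signal):
        # A truly unexpected condition: the model itself must be updated.
        return "unexpected condition: escalate and update the model"
    if attend(signal, value):
        return respond(signal, value)
    return "normal operation"
```

Note that a signal absent from the model altogether corresponds to the “unexpected conditions” above: the system cannot attend to what it never anticipated, so the model must be revised.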

Figure 1:  Resilience qualities as described by Woods and Hollnagel[5].

Safety-II

Notions of Resilience, as manifested through approaches such as Resilience Engineering, Safety Differently, and Safety II, are concepts introduced in the last several decades.  These approaches focus on proactive risk management measures and more fully account for human and organizational factors with respect to enhancement and reduction of undesirable system outcomes.

Hollnagel[6] presents the concept of Safety II, where emphasis is centered on ‘humans’ as a resource necessary for system flexibility and resilience.  The working hypothesis is that Socio-Technical Systems are complex, which results in work situations (workflows, procedures, etc.) being underspecified; actual work conditions will therefore differ from what has been specified and/or described.  Performance variability (the need for modifications/adjustments) by the workforce is thus not only normal and necessary, but indispensable.  Hollnagel states[6]:

Because most socio-technical systems are intractable, work conditions will nearly always differ from what has been specified or prescribed.  This means that little, if anything, can be done unless work – tasks and tools – are adjusted so that they correspond to the situation.  Performance variability is not only normal and necessary, but also indispensable.  The adjustments are made by people individually and collectively, as well as by the organization itself.  Everyone, from bottom to top, must adjust what they do to meet existing conditions (resources and requirements).  Because the resources of work (time, information, materials, equipment, the presence and availability of other people) are finite, such adjustments will always be approximate rather than perfect.

Hollnagel presents Safety I as the ‘reactive’ approach to system performance, with its focus on ‘root-cause’ analysis and on malfunctions arising from human participation.  It discounts the positive contributions humans make to eliminating hazards or reducing the magnitude of their consequences (see Table 1).  Hollnagel posits that Safety I and Safety II, collectively, constitute Socio-Technical System resilience (Figure 2).  The Safety II approach empowers the human and organizational components of Socio-Technical Systems to spearhead the “Anticipation,” “Attention,” and “Response” aspects of the Resilience qualities, as well as the “updating” and “learning” shown in Figure 1.

Table 1:  A comparison of Safety I and Safety II

Figure 2:  Resilience as a combination of Safety I and Safety II (Hollnagel, 2014).

High Reliability Organizations (HROs)

Research on high reliability organizations began in the mid-1980s and examined complex organizations that seemed to operate without failure in complex environments[7],[8].  The underlying supposition of this work was that existing organizational theory concepts do not explain the processes at work in such organizations very well.  The work initially focused on three organizations considered relatively exotic by critics: the U.S. Navy’s nuclear-powered aircraft carriers, the Federal Aviation Administration’s air traffic control system, and a commercial nuclear power plant.

Weick and Sutcliffe[9] identified the major components of high reliability organizations:

  • Preoccupation with failure. Attention to close calls and near misses.
  • Reluctance to simplify interpretations.  Attention to root cause analysis.
  • Sensitivity to operations. Situational awareness and carefully designed management practices.
  • Commitment to resilience.  Constant attention to management practices that might need to be changed.
  • Deference to expertise. Listen to experts and follow their advice.

All organizations, HRO and non-HROs, develop beliefs about the world, including susceptibility to hazards resulting in undesired outcomes.  Organizations develop approaches to confront these hazards via norms, regulations, procedures, rules, guidelines, job descriptions, and training materials.  During the course of operation, organizations accumulate unnoticed events that are at odds with accepted beliefs about the hazards and resulting consequences.

Unlike non-HROs, HROs develop beliefs about the world, hazards, and potential outcomes with fewer simplifications, less finality, and more process-improvement.  The definition of what is ‘hazardous’ is continually revisited and updated.  HROs tend to accumulate and more rapidly detect unnoticed smaller events that are at odds with what they expect.  This gives them the ability to investigate and understand the anomalies and outline proactive responses to more rapidly restore reliable performance.

Comparison of Safety II and HRO

A close examination of Safety II and HRO reveals substantial overlap: both focus on the ‘every-day opportunities’ for people to anticipate and respond proactively so as to minimize the magnitude of (or even eliminate) undesirable outcomes.  The ability of people (whether individuals or groups) is recognized as an asset and resource for proactive risk mitigation.

Extending these concepts to the notion of Resilience by Woods and Hollnagel[5] (Figure 3), we can map, as a function of time: Proactive measures correspond to the system’s anticipation and attention phases; Interactive measures are taken during the attention and response phases; and Reactive measures are taken during the response phase.  We note that there are frequently post-event actions, based on ‘lessons learned’ from incidents in other similar systems, that can be imposed on a system.  These reactive changes can happen years or decades after the original incident.

Mapping Safety I and Safety II on the Resilience Framework shows that Safety II corresponds primarily to the Anticipation and Attention phases.  Safety I is almost exclusively focused on the Response phase or, more commonly, on the “After the Fact Imposed Lessons.”  Safety II, however, leverages contributions by the people in the system to identify and mitigate the potential occurrence of undesirable system outcomes.  Safety II, thus, is an essential part of a system’s ability to be resilient and minimize the occurrence of undesired outcomes.

Lastly, HROs concentrate their resources (people) on the Anticipation, Attention, and early Response phases as a result of their continual updating processes.  HROs do undergo reactive “Learning,” but because of their substantial investment in proactive approaches, they rarely experience undesired outcomes of the magnitude that non-HROs do, and they are able to proactively incorporate “After the Fact Imposed Lessons” learned by other systems before having to actually respond to an unfolding event.

Figure 3:  Scope of attention to System Resilience from a Safety II and HRO standpoint.

Extending Safety II and HRO to the earlier Socio-Technical System attributes, we find that the same techniques are employed through Safety II and HRO; these are summarized in Table 2.  Virtually identical approaches can be mapped to Safety II and HRO.

Table 2:  Summary of Safety II and HRO approaches to address attributes of Socio-Technical Systems

Attribute | Safety II | HRO
Compatibility | Satisfied by updating and refining objectives and intended outcomes. | Satisfied by updating and refining objectives and intended outcomes.
Minimal Critical Specification | Satisfied via process improvement and updating capabilities from the workforce by eliminating inefficient processes and augmenting/enhancing where appropriate. | Satisfied via process improvement and updating capabilities from the workforce by eliminating inefficient processes and augmenting/enhancing where appropriate.
Socio-Technical Criterion | Satisfied by empowering the workforce to identify and mitigate variances at their origination points. | Satisfied by empowering the workforce to identify and mitigate variances at their origination points.
Multifunctionality Principle – Organism vs. Mechanism | Satisfied by embracing cross-training and adaptability. | Satisfied by embracing cross-training and adaptability.
Boundary Location | Satisfied by encouraging and promoting variance-tracing across organizational boundaries. | Satisfied by encouraging and promoting variance-tracing across organizational boundaries.
Information Flow | Satisfied by promoting information exchange and dialogue across organizational boundaries and hierarchical levels. | Satisfied by promoting information exchange and dialogue across organizational boundaries and hierarchical levels.
Support Congruence | Satisfied by providing compensation schemes that promote and encourage systems-based outcomes rather than individual focus. | Satisfied by providing compensation schemes that promote and encourage systems-based outcomes rather than individual focus.
Human Values | Satisfied by allowing some flexibility and individualism in how work is accomplished, while ensuring consistency with intended outcomes. | Satisfied by allowing some flexibility and individualism in how work is accomplished, while ensuring consistency with intended outcomes.
Incompletion | Satisfied via process-improvement programs, training, incident reviews, and mindfulness. | Satisfied via process-improvement programs, training, incident reviews, and mindfulness.

Conclusion

The Safety II approach empowers the human and organizational components of Socio-Technical Systems to spearhead the “Anticipation,” “Attention,” and “Response” aspects of the Resilience qualities, as well as the “updating” and “learning.”  The emphasis is placed on proactive activities that empower risk-reduction by system workers, rather than post-incident reactive system changes.  Realization of undesirable outcomes is typically the result of flawed procedures and work conditions rather than the failings of individual workers.

HROs are groups of individuals who monitor deviations between anticipated and actual outcomes, minimizing inappropriate simplifications, avoiding premature finality, and practicing regular process improvement.  The definition of what is ‘hazardous’ is continually revisited and updated.  HROs tend to accumulate and more rapidly detect unnoticed smaller events that are at odds with what they expect.  This gives them the ability to investigate and understand the anomalies and to outline proactive responses that more rapidly restore reliable performance.

Both frameworks aim at more reliable and robust Socio-Technical Systems, in which human and organizational factors are acknowledged as factors that can enhance and/or degrade system operations and outcomes, as well as the ability to proactively and interactively mitigate undesired outcomes before ‘failure.’

About the Center for Catastrophic Risk Management (CCRM)

The Center for Catastrophic Risk Management (CCRM) is a group of academic researchers and practitioners who recognize the need for interdisciplinary solutions to avoid and mitigate tragic events.  This group of internationally recognized experts in the fields of engineering, social science, medicine, public health, public policy, and law was formed following the tragic consequences of Hurricane Katrina to formulate ways for researchers and experts to share their lifesaving knowledge and experience with industry and government.  CCRM’s international membership provides experience across cultures and industries that demonstrate widespread susceptibility to pervasive threats and the inadequacy of popular, checklist-based remedies that are unlikely to serve in the face of truly challenging problems. 

About the Authors

Dr. Rune Storesund, P.E., G.E., Executive Director, Center for Catastrophic Risk Management (CCRM)

Dr. Storesund is the Executive Director of UC Berkeley’s Center for Catastrophic Risk Management (risk.berkeley.edu).  His research focuses on safe and reliable critical infrastructures.  He earned a Bachelor of Arts from UC Santa Cruz, a Bachelor of Science in Civil Engineering from UC Berkeley, a Master’s degree in Civil Engineering from UC Berkeley, and a Doctorate of Engineering in Civil Systems from UC Berkeley.  He is a Fellow of the American Society of Civil Engineers and a Senior Member and Diplomate of the National Academy of Forensic Engineers.  He is also the CEO and Founding Member of Storesund Consulting (a niche civil engineering and forensics firm); Storesund Construction Services (a civil works construction company employing High Reliability Organization (HRO) techniques and strategies); and NextGen Mapping, Inc. (a software start-up focused on leveraging big data associated with infrastructure systems to improve decision-making and to connect decision-makers with real-time (or near-real-time) business intelligence models that enable informed and educated decisions).

Prof. Karlene Roberts, Director, Center for Catastrophic Risk Management (CCRM)

Karlene H. Roberts is a Professor at the Walter A. Haas School of Business, at the University of California at Berkeley. She is also Chair of the Center for Catastrophic Risk Management at Berkeley. Roberts earned her bachelor’s degree in Psychology from Stanford University and her Ph.D. in Industrial Psychology from the University of California at Berkeley. She also received the docteur honoris causa from the Universite Paul Cezanne (Aix Marseilles III). Since 1984 Roberts has investigated the design and management of organizations and systems of organizations in which error can result in catastrophic consequences. She has studied both organizations that failed and those that succeeded in this category. Some of the industries Roberts has worked in are the military, commercial marine transportation, healthcare, railroads, petroleum production, commercial aviation, banking, and community emergency services.  Roberts has consulted in the areas of human resource management, staffing policies, organizational design, and the development of cultures of reliability. Recently she has consulted with the military, in the healthcare industry, in software development, and in the financial industry. She testified before the Columbia Accident Investigation Board. Roberts is a Fellow in the American Psychological Association, the Academy of Management, and the American Psychological Society. She has contributed to policy formation for the Federal Aviation Administration, the U.S. Coast Guard, the U.S. Navy, the Department of Energy, and the Minerals Management Service of the U.S. Department of the Interior.


[1] Trist, E. & Bamforth, K. (1951). Some social and psychological consequences of the longwall method of coal-getting. Human Relations, 4, 3-38.

[2] Emery, F. (1959). Characteristics of sociotechnical systems, Document #527, London: Tavistock Institute.

[3] Cherns, A. (1976). The Principles of Sociotechnical Design. Human Relations, 29(8), 783–792. https://doi.org/10.1177/001872677602900806

[4] Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management. Farnham, UK: Ashgate.

[5] Hollnagel, E., Woods, D. D. & Leveson, N. C. (Eds.) (2006). Resilience engineering: Concepts and precepts. Aldershot, UK: Ashgate.

[6] Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management. Farnham, UK: Ashgate.

[7] Rochlin, G., La Porte, T., and Roberts, K. (1987). The self-designing high reliability organization. Naval War College Review, 40, 76-90.

[8] Roberts, K.H. (1990) “Managing High Reliability Organizations.”  California Management Review, 32, 101-113.

[9] Weick, K.E. and Sutcliffe, K. (2001) Managing the Unexpected. San Francisco: Jossey Bass.