Yesterday I listened to the Red Teams Podcast where Yuri and Mark Mateski talked about the lessons they learned from examining how Safety investigations are done and how they can relate to red teaming.
This got me thinking about how true that is, having done both for some years now, and it speaks to the philosophy that “Once you know the way, you see it in all things.” So let’s get after it.
Why do we do red teaming? To make our systems more resilient, and reduce the amount of risk we expose ourselves to.
Why do we do Safety investigations? To reduce the amount of risk inherent in our systems, preventing asset loss and injury.
So, both these questions have fairly similar answers… It’s probably reasonable to think both fields have some major commonalities. First we’ll break down how a safety investigation is conducted. According to the Occupational Safety and Health Administration (OSHA) there are four steps to conducting an investigation. I would argue that there are actually five:
1. Create the Team
2. Preserve and Document the Scene
3. Collect Information
4. Determine Root Causes
5. Implement Corrective Actions.
Create the Team
This really is the most critical of all the steps. If your team doesn’t function well together and isn’t trained appropriately, your results will not be as good as they can be. Most Safety investigation teams are made up of one to three people, with specialists on retainer. Usually a manager, investigator, and recorder make up the team. This depends, of course, on the size and complexity of the investigation. Large or complex investigations will require more investigators or special technical support. In all cases, the team should be from outside the organization that had the mishap. This reduces the amount of inherent bias and undue “command influence”.
Preserve and Document the Scene
This step is critical in collecting perishable information that can give vital insight into the cause of a mishap. Some organizations will train a small group of employees, called the interim safety board, in evidence collection procedures so that the scene is not contaminated or compromised while the actual investigation team is en route to the scene. Things like witness statements, fluid samples, toxicology tests, and photographs of the scene are all important to collect immediately. Depending on the complexity of the mishap, the interim safety board may need to coordinate for evidence storage or facilities to conduct interviews. A critical point in this step is for the actual investigation manager to communicate with the interim team so that all parties have a clear understanding of what is happening.
Collect Information
Once the actual investigation team is on site, the investigation can begin in earnest. The team should review all of the information collected by the interim team; sometimes there is no need for further collection. Often, though, the information collected by the interim team will create new questions as to why the mishap occurred. This step may have to be revisited based on questions raised in the next step. At this step in the investigation the team will still be required to hold on to all the evidence, which can cause some friction between the team and the management of the organization.
Determine Root Causes.
This is where the investigation team earns their paycheck. OSHA has free online training on root cause analysis. There are many heuristics out there to get you to the root of what caused the mishap. The important thing to remember, however, is that usually it’s not one thing but multiple things that lead up to a mishap. Some people call this the “Swiss Cheese” model, where all the holes have to align just right in order for the accident to happen. Another common analogy is that of a chain: if one of the links is broken, the chain, and with it the mishap, can’t occur. This may be an iterative process and can lead in directions that were not apparent at the start. It’s important that the manager keeps the team focused and on target at this stage, as it’s easy to waste time going down rabbit holes.
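For the more code-minded red teamers, here is a rough sketch of the “Swiss Cheese” and chain ideas in Python. It rests on a big simplifying assumption of mine: that each defensive layer fails independently of the others, which real mishaps rarely honor. The layer names and probabilities are made up purely for illustration, and the function name is hypothetical.

```python
from math import prod


def mishap_probability(layer_failure_probs):
    """Chance that every defensive layer fails at the same time.

    Assumes the layers fail independently (a simplification), so the
    combined chance is the product of the individual chances.
    """
    probs = list(layer_failure_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(probs)


# Hypothetical layers and failure chances, for illustration only.
layers = {"training": 0.1, "checklist": 0.2, "supervision": 0.5}

# All the holes have to line up for the mishap to happen.
baseline = mishap_probability(layers.values())

# Break one link in the chain: one layer that never fails.
fixed = dict(layers, checklist=0.0)
after_fix = mishap_probability(fixed.values())

print(f"before fix: {baseline:.3f}, after fix: {after_fix:.3f}")
```

Under that independence assumption, driving any single layer’s failure chance to zero drives the whole product to zero, which is the chain analogy in miniature. In practice layers are rarely perfect, so the point is more that every improved layer shrinks the overall number.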
Implement Corrective Actions.
This step is seemingly the most intuitive, right? Fix the problem. Reduce the risk. If everything has gone well up to this point, the team should have a good grasp on the factors that led up to the mishap. Now they just have to work with the experts and practitioners to develop a way to keep this from happening again. Usually these corrective actions fall into four broad strategies: avoidance, reduction, sharing, and retention. A hierarchy of controls usually implements the strategies. These are, from most effective to least: elimination, substitution, engineering controls, administrative controls, and finally personal protective equipment (PPE).
This process should not look very different from the stages of red teaming:
1. Assemble the team
3. Make a plan
5. Present findings
This is just a short introduction to what could be a deep dive into the comparison between Red Teaming and Safety.