Patrick McKendry

Classified U.S. intelligence documents were discovered this week after being leaked onto the internet more than a month ago.¹ The documents include recent assessments of the war in Ukraine and the related potential actions of various countries.² The breach has been dubbed the “Discord Leaks” for its unexpected choice of megaphone - the niche social media chat platform, Discord, a smaller social community with only 150 million active users worldwide.³ In many ways, the platform choice calls into question the motives behind the leak, with some experts suggesting that the intention was less about widespread distribution of classified information and more about building reputation and credibility within the online group chat.⁴ Also supporting this hypothesis is the alleged source of the leak, a 21-year-old Massachusetts Air National Guardsman.

Given the confluence of factors, many are asking the expected questions of “How did a 21-year-old get access to top secret documents?” and “How do we fix the root cause problems that led to this breach?” Realistically, there may not be a good solution that fixes this specific problem without affecting structure and process outside of its scope. It is understood that the airman had a top-secret clearance because he worked in technology support and needed that access to support the computer systems. The military also highlighted that the airman’s clearance was not out of the ordinary for similar team members his age.² Analyzing the situation from afar, there is likely some change that can be made to create a more precise clearance system with a larger number of more specific levels. But this would potentially require an overhaul of clearance definitions for the entire intelligence organization.

In truth, this breach is a culture problem for the organization more than a permissions definition issue. The root problem is one of organizational reliability and is something that we can all learn to improve. For many organizations, reliability and error-free work are not options but requirements. Research studies these High Reliability Organizations across categories like nuclear aircraft carriers, FAA air traffic control, hospital emergency room operations, and more.⁵ Any High Reliability Organization - HRO, for short - is defined by having work where normal accidents could be expected due to significant risk factors and high complexity of delivery, yet these teams manage to avoid accidents nonetheless.⁶ While many organizations have some of the following characteristics, only HROs see all characteristics simultaneously:

Hypercomplexity - High variety of levels and systems
Tight coupling - High degree of reciprocally interdependent systems
Hierarchical Differentiation - Multiple levels with different regulations at each level
Many decision makers - Many redundant decision points
Extreme accountability - Sub-par performance met with severe consequences
High feedback frequency - Decision results and feedback nearly immediate
Compressed time - Rapid cycles, defined in minutes versus days or weeks
Concurrent critical outcomes - Interdependent systems must deliver work with precise timing alignment across groups

With such strenuous work requirements, HROs demonstrate a number of valuable behaviors that separate their operations from those of most teams. Here, we will look at the research of how HROs act to avoid errors and how we can apply that to our own teams. We will also look at the cases where we should not follow HRO principles and why. Finally, we will review whether U.S. Intelligence is an HRO and how cultural adjustments could prevent a future information breach.

It is All About Mindset

Research on HROs began in high-risk industries and has spread over time to cover a wide variety of dynamic work settings, such as hospitals and wildland firefighting. The overarching difference that separates groups making the jump to an unofficial HRO status is the mindfulness that they bring to their approach to work and error mitigation.⁷ That focused effort is a necessary cultural tenet because the nature of complex work demands constant reinvention to account for the dynamic, ever-changing environment. HROs constantly work to improve their operations, seeking perfection in what they do in order to eliminate errors altogether. To do this, these organizations demonstrate five key behaviors.⁷

Preoccupation with Failure

Avoiding failure is an obsession across the organization for HROs. Everyone from operators to executives is constantly looking at problems both big and small, thinking about how their process may break so that they can proactively adjust controls before a problem occurs. The approach to failure is often much healthier in HROs as well: discussion of failure is encouraged because it is necessary to improve the process. “Near misses” are also destigmatized and seen as successes because the existing process prevented an error.⁸ For example, this could look like having multiple nurses and surgeons check with a patient to confirm which knee is supposed to be operated on and also marking the correct knee prior to surgery.

Reluctance to Simplify

Broad, general explanations do not make the cut for HROs. Often, broad solutions miss the mark on solving specific problems, and for these industries, specific problems can have devastating consequences. Contrary to Occam’s Razor or the Pareto Principle, these organizations continue to dig at problems even after the initial problem is solved. As questions lead to more questions, team members continue to dive deeper into problems. The root issue is not always the expected one and sometimes originates in an entirely different functional area. For example, this could look like discussing a lack of engineering team staffing only to determine that the underlying issue is that projects are not properly scoped, so team capacity is being spent on refining requirements.

Sensitivity to Operations

Every operational evolution is a chance to collect feedback and learn in an HRO. These organizations realize that outcomes achieved in doing their work are measurable, and they look at these measures consistently to determine opportunities for improvement. It is never assumed that the outcomes will deliver standard results if the standard process is followed. Leaders look for these measured variations to learn more about their processes. This might look like a wildland firefighting crew reviewing the time to get the fire under control and acreage burned compared against their operating decisions to see how they might have acted differently for a future fire.

Commitment to Resilience

Despite their efforts, HROs recognize that errors will happen and do not let this throw them off course. A hallmark behavior of these teams is that they can recognize emerging issues outside of their standard processes and are not thrown off as they actively triage novel problems. Said differently, as problems get worse these teams get better. Further, HROs are learning organizations and grow through incidents rather than being hampered by them. Again looking at wildland firefighting as an example, it is common that winds will change and fires will jump firelines and break out into new areas. These crews know how to react and are ready to handle this constant change in their environment.

Deference to Expertise

Experts are not always leaders and this fact is respected in HROs. Senior leadership realizes that those who are closest to the work often know best what the problems and potential solutions are for their work. Operators are encouraged to voice their concerns and their perspective is respected regardless of their seniority. For example, on a nuclear aircraft carrier, the ship’s captain should trust their mechanic to diagnose problems in their area and recommend the best resolution.

You May Be Reliable, But You’re Not Highly Reliable

To set expectations at the outset, most organizations are not High Reliability Organizations. Plainly speaking, they do not need to be. If you work in digital advertising, for example, the consequences of under-performance are not immediately and meaningfully hazardous to life. Further, most organizations do not want to meet this qualification. Remember, most HROs are defined as having multiple redundant decision makers, a fact that would make most modern business teams shudder. Still, reliable teams and reliable process delivery are virtues worth building, even if we stop short of the gold standard. There are a number of common practices that can help build towards greater team reliability.

Preoccupation with Failure -> Project Post-Mortems

In HROs, proactive identification of errors is a daily activity. For most teams, trying to solve every edge case before it happens is not a productive use of time. You are most likely to miss some in any case. However, a reactive focus on problems discovered during project delivery is healthy. Most teams can and should use a project post-mortem for this.

In a post-mortem, teams review a recent project and break down what went well, what could have gone better, and what they want to change for the next time. In addition to their value for consolidating feedback, post-mortems are great ways to build team communication and morale when they are done as a collaborative learning exercise. To make your post-mortem most effective, it is best to meet shortly after the project ends and to send questions in advance so that everyone has time to prepare thoughts. You’ll also need to assign someone the role of moderator to help ensure the conversation stays on track. Finally, remember to send out meeting notes afterwards to solidify takeaways and next steps.⁹

If you really want to take your focus here to the next level, teams can also try a pre-mortem before a project starts. In my experience, the pre-mortem can be a useful tool to hone project requirements because it makes the work more concrete. The pre-mortem can also make projects more efficient by identifying blockers early and helping teams to pursue the biggest questions in the project first.

Reluctance to Simplify -> Ask Why 5 Times

We can all agree with the HRO premise that the comfortable, easy solutions are rarely the best. Still, teams often fall prey to this temptation and stop seeking out root causes too early. In other cases, we run into the opposite problem of searching too deeply and landing on problem statements that, in most organizations, are outside of our control. To strike the right balance, I recommend that teams develop the habit of The Five Whys.

The Five Whys is an iterative technique for asking increasingly deeper questions as part of a root-cause analysis of a situation. Originally defined at the Toyota Corporation, the idea is simple - ask the question “Why?” in response to a problem statement and then to each subsequent answer, for a total of five times.¹⁰ That’s it. That is the entire technique. The method draws both praise and criticism for its simplicity. While it may not be a perfect system, it works for our use case because our objective is primarily to develop the habit of going a few layers deeper into our problem. For managers, the tool is also valuable because it is easy to teach and can be easily picked up and used independently by all members of the team.

Sensitivity to Operations -> Make Work Visible

This is a behavior that most organizations get wrong without ever realizing it. In HROs, the delivery of work is well measured and is consistently reviewed upon project completion to evaluate the delivery processes. In most teams, work is not even measured. For example, does your team have a set of regular activities it performs and some expectations of how long it should take to complete each activity? You probably do. Does your team measure how long it actually takes to do that work and track those actual time commitments over time? You probably don’t.

In Making Work Visible, Dominica DeGrandis dives into “time theft” in the software development lifecycle and provides specific tactical recommendations to improve team throughput by increasing the visual representation of work delivery. In my experience, the tactics are good and the principles are great. DeGrandis’s recommendation is to start simple by just tracking work with a “To Do”, “Doing”, and “Done” kanban-style board and to add greater complexity as it is useful to your specific team.¹¹

In my experience, the principles of Making Work Visible are also useful for troubleshooting your individual productivity. I often find that I am most productive when I plan the day ahead in 30-minute increments and then track how I actually spent that time for end-of-day evaluation. You can similarly apply this to your team’s work for individual projects. As an example, if you want to increase the efficiency with which your team delivers their weekly performance reports, work with them to set a time goal and then ask them to measure and track how long the task actually takes. This simple activity will speed the work up. It will also lead your team to analyze the activity itself and find process improvements to help them move faster. That is the key here - there are many ways to deliver this idea, but the goal is to increase mindfulness around work.
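
As a rough illustration of setting a time goal and tracking actual time against it, here is a minimal Python sketch. The `TaskLog` class, the 60-minute goal, and the recorded durations are all hypothetical and used only for illustration; a spreadsheet or a kanban tool would serve the same purpose.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class TaskLog:
    """Tracks a recurring task's time goal against measured durations."""
    name: str
    goal_minutes: float
    actual_minutes: List[float] = field(default_factory=list)

    def record(self, minutes: float) -> None:
        """Store how long one run of the task actually took."""
        self.actual_minutes.append(minutes)

    def average_overrun(self) -> float:
        """Mean minutes over (positive) or under (negative) the goal."""
        if not self.actual_minutes:
            raise ValueError("no runs recorded yet")
        return mean(self.actual_minutes) - self.goal_minutes


# Hypothetical example: a weekly report with a 60-minute goal.
report = TaskLog(name="weekly performance report", goal_minutes=60)
for minutes in (95, 80, 70):  # made-up measurements for three weeks
    report.record(minutes)

overrun = report.average_overrun()
print(f"{report.name}: {overrun:+.1f} min vs goal on average")
```

The point of the sketch is only that the goal and the measurements sit side by side, so the gap between them becomes something the team can discuss.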

Commitment to Resilience -> Be Optimistic

For most organizations and individuals, stress responses exist along a distribution. Some people react negatively and exhibit post-traumatic stress behaviors after the situation has passed. Most people react neutrally and are more or less unchanged once the experience has passed. This is known as resilience in positive psychology. Some people, and arguably most HROs, demonstrate something different altogether - post-traumatic growth.¹² This suggests that these organizations learn from adversity and become better as a result of it.

I’ve written much more about how to build resilience in teams. In short, the key to maintaining resilience in the face of challenges is to acknowledge the difficulty and to work with optimism on the factors that you can control. As managers, the way that we approach challenges will influence our broader team’s approach. In my experience, starting with the expectation that something will break or go wrong with every project is a simple way to get a large percentage of the impact from this behavior.

Deference to Expertise -> Gemba Walks

HROs expect operators in teams to have a voice in decisions for their areas of expertise. I expect that most organizations think they do this too. However, how often is strategy set from within the senior leadership team before being shared with a trite “Obviously, we want feedback on this”? You have probably even been on the receiving end of a strategic recommendation from a senior team member and been left thinking “That idea is really far off from our context.” In my experience, strategy that is separated from tactics is worthless and regularly fails to get traction. To have reliably effective processes, leaders need to be mindful to either bring tactical experts into strategic conversations at key moments or to put themselves closer to the tactics on a recurring basis.

One way for leaders to stay up to date on tactical expertise is with a Gemba walk. A Gemba walk - Gemba roughly translates from Japanese as “the real place” - is an observational learning tactic where leaders follow along with front-line operators to observe processes and ask questions of their tactical experts. This technique differs from the ineffective and generally hated “Management By Walking Around” because it has a defined purpose and focuses on specific details and process steps.¹³

In my experience, this tactic can be hard to do well. Leaders want to help too much and start making on-the-spot recommendations rather than asking questions of their experts. Front-line tacticians can also feel uncomfortable or pressured by the leader’s observation because the concept was poorly explained or the leader has not yet built the needed relationship capital for the team to feel at ease with their presence. However, when done well, it can be a tremendous boost to morale for the team and a useful way to upskill a leader’s knowledge.

One Key Theme

Across all of the behaviors of High Reliability Organizations, the overarching theme of mindfulness stands out and helps these teams keep a vigilant focus on preventing or resolving problems. For the rest of us, these behavioral tactics will not replicate that same level of mindful focus on problems. I would argue that we do not need it, either. However, for us, there is another theme that spans across the behaviors above and that - in my view - is also present at the core of HROs.

Psychological safety is consistently identified as a key to high-performing teams and we see it appear here yet again. Psychologically safe organizations consistently demonstrate a focus on learning, an acknowledgment of fallibility and that things will go wrong, and a focus on curiosity and asking questions.¹⁴ As such an important team characteristic, it is another topic that I have covered at greater length. If you do not yet have it, focus first on building psychological safety up within your team or organization. It will act as a leverage multiplier on the resiliency tactics you implement.

Should Intelligence Be Resilient?

As we think again about the U.S. Intelligence document breach, we have to ask which of these lessons they can carry forward to increase staff reliability. Further, we can debate whether Intelligence needs to be an HRO at all. Before we go further, all of this should be caveated by the fact that I have no deep knowledge of the workings of U.S. Intelligence and am simply using this to illustrate the topics we have discussed.

Lessons Learned

We know that U.S. Intelligence is a learning organization, evidenced by process changes in reaction to previous breaches - such as the prohibition of thumb drives after Edward Snowden used that technology to transfer data out of secure facilities.² Often errors arise from a few behavioral breakdowns. In this case, I imagine leadership has a healthy preoccupation with potential security breaches, regularly focuses on operation outcomes, and is more than capable of being resilient in difficult times.

The issue at play here seems to be the very rough match between role and security clearance. The airman who is allegedly responsible for the breach worked in technology support. According to officials, because he worked in a space where those documents existed, he needed that level of clearance.² This suggests a failure to be reluctant to simplify. Access controls can be configured in many ways to provide nuanced permissions. For example, the airman’s access could potentially have been set to let him view the documents but prevent him from downloading or printing them without a superior’s acknowledgement. Even if the functionality does not exist within the current technological architecture, those updates could very likely be made with sufficient investment - another example of why organizations might favor the simpler causal explanation. A failure to show deference to expertise could also be at play here. High-ranking intelligence officers may lack the technical expertise to understand access controls and conclude that the only and best option is to match overall clearance level to the document access needed for one’s job. IT security experts within the organization may have known of and already advocated for the benefits of more precise role permissions but been kept out of the conversation at key decision points.
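
To make the idea of nuanced permissions concrete, here is a minimal Python sketch of a rule where viewing is allowed outright but downloading or printing requires a superior's sign-off. Every name here (`Action`, `Request`, `is_allowed`, the `support_role` grant) is hypothetical and invented for illustration; it does not describe how any real intelligence system works.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Set


class Action(Enum):
    VIEW = auto()
    DOWNLOAD = auto()
    PRINT = auto()


# Hypothetical rule: these actions need a superior's acknowledgement.
REQUIRES_APPROVAL = {Action.DOWNLOAD, Action.PRINT}


@dataclass(frozen=True)
class Request:
    user: str
    action: Action
    approved_by: Optional[str] = None  # approving superior, if any


def is_allowed(request: Request, granted: Set[Action]) -> bool:
    """Allow an action only if the role grants it and any required
    approval is present."""
    if request.action not in granted:
        return False
    if request.action in REQUIRES_APPROVAL and request.approved_by is None:
        return False
    return True


# Hypothetical grant for a technology support role.
support_role = {Action.VIEW, Action.DOWNLOAD, Action.PRINT}

can_view = is_allowed(Request("tech", Action.VIEW), support_role)
can_print = is_allowed(Request("tech", Action.PRINT), support_role)
can_print_approved = is_allowed(
    Request("tech", Action.PRINT, approved_by="supervisor"), support_role
)
print(can_view, can_print, can_print_approved)
```

The design choice worth noting is that the role grant and the approval requirement are checked separately, so access can be broad enough to do the job while the riskier actions still pass through a second person.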

Is U.S. Intelligence an HRO?

The short answer is probably not. If we review the characteristics of our High Reliability Organizations, we get a mixed bag of relevancy.

Hypercomplexity - Yes - A very hierarchical and multi-faceted organization. The National Intelligence community is made up of 18 different organizations.¹⁵
Tight coupling - Maybe - Some projects are likely self-contained. Some projects likely have interdependencies. However, it is worth noting that the airman associated with the leaks worked in a group responsible for distilling information down from multiple intelligence sources, which to me suggests less rigorous coupling.
Hierarchical Differentiation - Yes - Again, very hierarchical.
Many decision makers - Yes - Most projects likely require additional review beyond that done by an immediate superior.
Extreme accountability - Maybe - The impacts of bad intelligence are incredibly costly - take, for example, the intelligence report warning of weapons of mass destruction which led to the Iraq War in 2003. However, intelligence is regularly wrong and so I expect that the consequences to team members for incorrect reports are less severe than for, say, a surgeon who may lose their ability to practice after certain surgical errors.
High feedback frequency - No - I am guessing a little bit here but intelligence gathering likely has high variability making it difficult to get regular, consistent feedback on process effectiveness.
Compressed time - Maybe - At times, information needs to be analyzed and compiled on tight turnarounds. However, a fair amount of intelligence collection takes place over months through observation and making connections.
Concurrent critical outcomes - No - More often than not, intelligence is collected to inform decisions with no specific target date and time. When a specific concurrent outcome is required, the work likely draws predominantly on information that has already been gathered.
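
The ratings in the list above can be tallied against the earlier definition that only HROs show all eight characteristics at once. The following Python sketch simply encodes those ratings as written; the variable names are made up for illustration, and the ratings themselves remain one outside observer's guesses.

```python
from collections import Counter

# Ratings copied from the list above.
ratings = {
    "Hypercomplexity": "Yes",
    "Tight coupling": "Maybe",
    "Hierarchical Differentiation": "Yes",
    "Many decision makers": "Yes",
    "Extreme accountability": "Maybe",
    "High feedback frequency": "No",
    "Compressed time": "Maybe",
    "Concurrent critical outcomes": "No",
}

counts = Counter(ratings.values())

# HROs are described as showing every characteristic simultaneously.
meets_hro_definition = all(rating == "Yes" for rating in ratings.values())

print(dict(counts))
print("Meets HRO definition:", meets_hro_definition)
```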

I generally try to practice deference towards experts. Given that I am not an expert in this area, it is feasible that the U.S. Intelligence community does meet the characteristics of an HRO. To me, it actually looks like it is not particularly close to that qualification. This is an important point to make. You do not need to be an HRO to strive for greater reliability in your team and serve an important role in your industry or sector.

In Conclusion

For any organization, critical errors always lead to difficult questions about the culture and processes that led to the negative outcome. That is especially true when the error involves classified documents that influence national security. When things end poorly, the common response is a knee-jerk reaction toward the opposite extreme. But that is not always the best choice. When we look at High Reliability Organizations, we can see the gold standard for operational error mitigation. For the majority of organizations, we can learn from these standard bearers but our path is more tempered. Reliability is good for all organizations, but not all organizations need to be HROs.

References

The Discord Leaks, HROs, and "Just Right"