5 Whys is Powerful. You Are Just Doing It Wrong

The 5 Whys method has been part of engineering culture for decades. It is simple. Keep asking "why" until you find the reason something broke. It sounds foolproof.

But in most postmortems the 5 Whys is done poorly. It stops too early. It focuses on one narrow thread. It turns into a hunt for someone to blame instead of a way to uncover the truth. The result is a shallow report that fixes a symptom while the deeper issues remain untouched.

When used properly, the 5 Whys is one of the most effective tools you can have for learning from incidents. When used poorly, it is cargo cult root cause analysis (RCA).

The Three Most Common Failures With 5 Whys

  1. Stopping after two questions
    Teams often ask why once or twice and then stop as soon as they hear something that feels like a cause. That is rarely deep enough.
  2. Asking why on only one branch
    Incidents have multiple contributing factors. Teams often pick one path and ignore the others.
  3. Turning why into who
    When the focus shifts to people instead of systems, the process becomes about blame instead of improvement.

What Proper Facilitation Looks Like

To get real value out of 5 Whys, treat it like an exploration rather than a checkbox.

1. Start with a clear statement of the problem

Be specific. For example:
"Database queries from the checkout service timed out for 45 minutes."

2. Ask why across different aspects of the incident

Instead of following a single thread, ask why about each phase of the incident.

This approach helps you discover both technical and process factors.

3. Map it, do not just write a list

Draw a branching diagram. Each answer can lead to multiple new questions. By the end, you will have a tree of contributing factors rather than a single chain.
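
One way to keep that tree in a form you can review later is a small data structure. The following is a minimal sketch, not a prescribed format: the `Why` class, the helper functions, and every cause under the problem statement are hypothetical and invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Why:
    """One node in the tree: an answer to the 'why' asked of its parent."""
    statement: str
    causes: list[Why] = field(default_factory=list)

    def add(self, statement: str) -> Why:
        """Attach a contributing factor and return it so it can be extended."""
        child = Why(statement)
        self.causes.append(child)
        return child


def depth(node: Why) -> int:
    """Count the levels in the longest chain, including this node."""
    return 1 + max((depth(c) for c in node.causes), default=0)


def leaves(node: Why) -> list[str]:
    """Return the deepest factor found on each branch."""
    if not node.causes:
        return [node.statement]
    return [s for c in node.causes for s in leaves(c)]


def render(node: Why, indent: int = 0) -> str:
    """Format the tree as an indented plain-text outline."""
    lines = ["  " * indent + "- " + node.statement]
    lines.extend(render(c, indent + 1) for c in node.causes)
    return "\n".join(lines)


def example_tree() -> Why:
    """Build a tree with made-up causes (illustrative only)."""
    root = Why("Database queries from the checkout service timed out for 45 minutes.")

    # Technical branch (hypothetical).
    pool = root.add("The connection pool was exhausted.")
    slow = pool.add("A slow query held connections open.")
    slow.add("The query ran against a table with no suitable index.")

    # Process branch (hypothetical).
    alert = root.add("The team noticed the problem late.")
    alert.add("No alert covered query latency for this service.")
    return root


if __name__ == "__main__":
    tree = example_tree()
    print(render(tree))
    print("longest chain:", depth(tree))
    print("branch endpoints:", leaves(tree))
```

A tree whose longest chain is only two levels deep, or whose root has a single child, is a quick hint that the exercise stopped early or followed only one thread.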

4. Focus on systems, not individuals

Ask why conditions existed that allowed the problem to occur.
What made it possible? What guardrails were missing? What could have prevented this?

This is how you shift from pointing fingers to creating resilience.

Why This Matters

If you only use 5 Whys to find the first thing that broke, you will miss the patterns behind it. Proper facilitation forces teams to see the environment that made the failure possible.

Shallow RCAs produce shallow actions. Deep RCAs produce systemic improvements.

Tools Can Make This Easier

Modern incidents scatter context across Slack, Zoom, PagerDuty, and a dozen other places. Gathering this context manually and building a map of the whys is time-consuming. By the time someone starts the 5 Whys exercise, much of the signal has already been lost.

Tools like COEhub can collect all of that data for you, build a timeline, and give you a guided structure for asking the right questions. This lets engineers spend their energy thinking instead of searching.

Closing Thought

5 Whys works when it is done thoroughly and honestly. It fails when it becomes a ritual.

The next time you run one, do not settle for the first cause you see. Keep digging. Look for the conditions. Look for the patterns. That is where the real learning begins.