Most postmortems are written to be filed, not read.
They satisfy a process requirement, get linked in Slack once, and then join a long list of zombie documents in Confluence. Everyone involved knows this. Senior engineers have lived through it enough times to be numb.
The failure is not effort. Teams spend hours reconstructing timelines, debating phrasing, and negotiating tone. The failure is that the document is treated as the learning system, when in reality it is only an artifact of one.
If you want people to read postmortems, and more importantly to act on them, you need to change what you optimize for.
This post is about doing that.
Most postmortem templates are designed to answer the question: "Did we fill in all the sections?"
Very few are designed to answer: "What will someone learn from this three months from now?"
As a result, they become long, defensive, and oddly sterile. They capture everything, but highlight nothing. They explain the incident, but do not make the next one less likely.
People do not skip postmortems because they are lazy. They skip them because the signal-to-noise ratio is low and the cost of reading is high.
If you want postmortems to be read, every section needs to earn its place.
Before you start writing, answer two questions. You only need two.

1. Who will read this?
2. What decision or behavior should change because they read it?

That is it.
Many templates add a third question like "What would I want to see if I were reading this later?" which sounds helpful but usually overlaps with the first two. If you know the audience and the decision, the rest follows.
A postmortem written for oncall engineers next quarter looks very different from one written for leadership assessing systemic risk. Trying to satisfy everyone is how you end up satisfying no one.
If you cannot name the reader and the outcome, do not write the document yet.
If someone reads only one thing, it should be the TLDR.
Not a teaser. Not a vague summary. A real compression of the incident and the learning.
Use this structure. Keep it short.
What happened
One or two sentences describing the failure mode and scope.
Impact
Concrete user impact, duration, and severity. Numbers if you have them.
Key contributing factors
Three to five bullets. These should already hint at patterns, not just incident-specific details.
What changes
The specific actions or decisions that will be different because of this incident.
If you cannot write a TLDR like this, the rest of the postmortem will not save you.
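One way to keep yourself honest about the structure is a mechanical check. This is a hypothetical sketch, not a real tool: the section keys and the 3–5 bullet limit mirror the template above, and the limits are illustrative assumptions.

```python
# Hypothetical sketch: check that a TLDR has the four sections above.
# Section names and limits are assumptions mirroring the template, not a standard.

REQUIRED_SECTIONS = [
    "what_happened",         # one or two sentences: failure mode and scope
    "impact",                # concrete user impact, duration, severity
    "contributing_factors",  # three to five bullets hinting at patterns
    "what_changes",          # specific actions or decisions that change
]

def tldr_problems(tldr: dict) -> list[str]:
    """Return a list of problems; an empty list means the TLDR passes."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in tldr]
    factors = tldr.get("contributing_factors", [])
    if isinstance(factors, list) and not (3 <= len(factors) <= 5):
        problems.append("contributing_factors should be 3-5 bullets")
    return problems
```

If `tldr_problems` comes back non-empty, you are not done compressing.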
Narrative is not the enemy. Humans make sense of systems through stories.
But the primary job of a postmortem is not storytelling. It is forensic clarity.
That means precision before prose.
A good timeline answers three questions quickly: what happened, when, and what we knew at the time.
Use a table format with four columns: time, event, what we knew at the time, and the decision made.
This last column matters more than most teams realize. Decisions under uncertainty are where learning lives. Not in hindsight explanations.
If you want narrative, add a short synthesis section after the timeline that explains how these signals interacted. Do not bury the facts inside the story.
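If it helps to make the table concrete, here is a minimal sketch that renders timeline entries into that four-column layout. The column names are an assumption drawn from the text above; adapt them to your own template.

```python
# Sketch: render timeline entries into the four-column table described above.
# Column names are assumptions based on the surrounding text.
COLUMNS = ["Time", "Event", "What we knew", "Decision made"]

def render_timeline(entries: list[dict]) -> str:
    """entries: dicts with keys 'time', 'event', 'knew', 'decision'."""
    header = "| " + " | ".join(COLUMNS) + " |"
    rule = "|" + "|".join("---" for _ in COLUMNS) + "|"
    rows = [
        f"| {e['time']} | {e['event']} | {e['knew']} | {e['decision']} |"
        for e in entries
    ]
    return "\n".join([header, rule, *rows])
```

The point of forcing a "decision made" cell for every row is that it makes gaps visible: if you cannot fill it in, you have found a moment worth discussing.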
Language matters.
Talking about "the real cause" subtly reinforces the idea that incidents have a single underlying failure waiting to be uncovered. In complex systems, that mental model breaks down quickly.
What you want instead is a map of contributing factors.
Drop the RCA acronym entirely if you can. It carries too much historical baggage.
Ask: "What conditions made this incident possible, harder to detect, or harder to recover from?"
Then group the answers by those three conditions: factors that made the incident possible, factors that made it harder to detect, and factors that made it harder to recover from.
This framing makes it easier to see patterns across incidents later. It also reduces the gravitational pull toward blaming individuals.
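A small sketch of what that grouping looks like as data. The category names come straight from the question above; the function is hypothetical, not part of any tool.

```python
# Sketch: group contributing factors by the three conditions named above.
# Category keys come from the question in the text; the code is illustrative.
from collections import defaultdict

CATEGORIES = ("made_possible", "harder_to_detect", "harder_to_recover")

def group_factors(factors: list[tuple[str, str]]) -> dict[str, list[str]]:
    """factors: (category, description) pairs; unknown categories raise."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for category, description in factors:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        grouped[category].append(description)
    return dict(grouped)
```

Keeping categories as a fixed vocabulary is what makes cross-incident pattern spotting possible later: two incidents with the same "harder_to_detect" factor is a signal a single narrative would hide.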
5 Whys is not the problem. Stopping early is.
The moment you land on "an engineer made a mistake," you are not done. You have reached a symptom.
Bad stopping point:
"The outage happened because an engineer misconfigured the cache."
Dig again:
Why was a risky change possible during peak traffic?
Why was there no guardrail or staging signal that would catch it?
Why did the system rely on tribal knowledge for a high-impact operation?
Now you are learning about system design, not individual performance.
Structure your 5 Whys with categories rather than a single chain. The three questions above suggest a natural grouping: process, guardrails and tooling, and knowledge.
You do not need to go five levels every time. You do need to go far enough that the answer no longer names a person.
Action items are not proof of learning. Follow-through is.
Fewer action items is usually better. Each one should be specific, owned by a named person, and tracked to completion.
If an action item does not change a future decision or constraint, question why it exists.
And if you are not going to track it, be honest.
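One lightweight way to make tracking honest is to model action items with an owner and a due date, and flag the stale ones instead of letting them quietly evaporate. A minimal sketch, with hypothetical field names:

```python
# Sketch: track action items with an owner and due date; flag stale ones.
# Field names are hypothetical, not from any particular tracker.
from dataclasses import dataclass
from datetime import date

@dataclass
class ActionItem:
    description: str
    owner: str   # a named person, not a team alias
    due: date
    done: bool = False

    def is_stale(self, today: date) -> bool:
        """Overdue and not done: surface it rather than let it evaporate."""
        return not self.done and today > self.due

def stale_items(items: list[ActionItem], today: date) -> list[ActionItem]:
    return [item for item in items if item.is_stale(today)]
```

Reviewing the output of something like `stale_items` in a recurring meeting is the follow-up loop; without that loop, the data structure changes nothing.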
Which leads to the hardest advice in this post.
A postmortem with no follow-up loop is theater.
If action items are not reviewed later, if patterns are not revisited, if ownership quietly evaporates, the document becomes worse than useless. It creates the illusion of learning.
In that case, skip the postmortem and invest the time elsewhere.
The goal is not documentation. The goal is reduced recurrence.
Humans are good at judgment, synthesis, and sense-making. They are bad at exhaustive reconstruction and long-term memory.
Where possible, let systems handle the rest: reconstructing timelines, clustering contributing factors across incidents, and tracking whether actions actually reduce recurrence.
This is the gap most teams feel but cannot quite name. The learning dies not because people stop caring, but because memory decays and context disappears.
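To make "exhaustive reconstruction" concrete: pulling a raw timeline out of an incident channel is exactly the kind of tedious work a script does better than a human. A sketch, assuming a made-up `[HH:MM] name: text` chat-export format:

```python
# Sketch: reconstruct a raw timeline from a chat export.
# The "[HH:MM] name: text" log format is a made-up example.
import re

LINE = re.compile(r"^\[(\d{2}:\d{2})\]\s+(\w+):\s+(.*)$")

def extract_timeline(chat_log: str) -> list[dict]:
    """Pull (time, author, message) entries out of a '[HH:MM] name: text' log."""
    entries = []
    for line in chat_log.splitlines():
        m = LINE.match(line.strip())
        if m:
            entries.append({"time": m.group(1), "author": m.group(2), "message": m.group(3)})
    return entries
```

The human job is then curation and synthesis, which is where judgment actually adds value.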
COEhub treats postmortems as inputs to a learning system: it automatically extracts timelines, clusters contributing factors across incidents, and tracks whether actions actually reduce recurrence, instead of letting each document die in isolation.
That is the difference.
Before publishing a postmortem, check this list:

- Can you name the reader and the decision it should change?
- Does the TLDR stand on its own?
- Do the contributing factors describe conditions rather than individuals?
- Does every action item have an owner and a date?
- Will someone actually review the action items later?
If the answer to most of these is no, the document will not be read. And even if it is, it will not matter.
The question is not whether your team writes postmortems.
It is whether your system remembers.