Incident Post-Mortem

Run a blameless post-mortem that finds the real systemic cause.

You walk away with

A blameless post-mortem with root causes and concrete preventions.

What the council debates
Help us run a blameless post-mortem on an incident and find the systemic causes, not a scapegoat.

THE INCIDENT: [what happened, the timeline, the impact on users and the business]
DETECTION & RESPONSE: [how we found out, how long to detect/mitigate/resolve]
WHAT WE THINK WENT WRONG: [the current theory]

Debate:
1. The real root cause(s) — push past the first explanation to the systemic one (the five whys).
2. Why detection took as long as it did, and what signal we were missing.
3. Why the blast radius was as large as it was, and how to contain it next time.
4. The contributing factors — process, tooling, ownership, on-call load — not just the trigger.
5. Which proposed fixes are genuine prevention and which are theatre.

FINAL SYNTHESIS:
- A blameless statement of the root cause(s).
- A timeline marking the points where the incident could have been caught or contained earlier, and what would have done it.
- A ranked list of preventions and detection improvements, each with an owner and the risk it removes.