Incident Post-Mortem

Run a blameless post-mortem that finds the real systemic cause.

You walk away with

A blameless post-mortem with root causes and concrete preventive actions.

Decidi convenes

🛰️ The Reliability Engineer 🔄 The Systems Thinker 🏛️ The Software Architect 🪦 The Pre-Mortem Analyst 🔬 The Data Skeptic 👥 The People Lead

Recommended level: Standard (proven pro models, the everyday default).

What the council debates

Help us run a blameless post-mortem on an incident and find the systemic causes, not a scapegoat.

THE INCIDENT:
[what happened, the timeline, the impact on users and the business]
DETECTION & RESPONSE: [how we found out, and how long it took to detect, mitigate, and resolve]
WHAT WE THINK WENT WRONG: [the current theory]

Debate:
1. The real root cause(s): push past the first explanation to the systemic one (for example, by asking "why?" repeatedly, as in the five whys).
2. Why detection took as long as it did, and what signal we were missing.
3. Why the blast radius was as large as it was, and how to contain it next time.
4. The contributing factors — process, tooling, ownership, on-call load — not just the trigger.
5. Which proposed fixes are genuine prevention and which are theatre.

FINAL SYNTHESIS:
- A blameless statement of the root cause(s).
- A timeline showing what would have caught or contained the incident earlier.
- A ranked list of preventive measures and detection improvements, each with an owner and the risk it removes.

Related briefs

Code Architecture Review

Pressure-test a system design before you commit to it.

Security Threat Model

Red-team a system to find how an attacker would actually break it.

Tech-Stack Selection

Choose the right stack without falling for hype or sunk cost.

Build vs Buy Decision

Decide whether to build it, buy it, or partner — with eyes open.