The AI risk library

Automation & agents: 15 ways AI gets it wrong

Wrong steps, false success messages and single errors repeated at scale. Each failure mode below is phrased as the question people actually ask, followed by what it looks like in real work and the layer of the Trust Stack that catches it.

Agreement alone is not proof

Does an AI agent execute the wrong step?

The agent runs step four when the task called for step two.

Caught by the Risk Reviewer

Why does an AI agent use the wrong tool?

A search tool is invoked for a job that needed a calculation tool.

Caught by the Risk Reviewer

Can an AI agent call tools it didn't need?

The agent fires off tool calls that add cost but no value.

Caught by the Risk Reviewer

Does an AI agent skip tools it should have used?

The agent guesses an answer instead of looking it up with an available tool.

Caught by the Risk Reviewer

Why does an AI agent act on stale context?

The agent uses an old version of a file that has since changed.

Caught by the Risk Reviewer

Can an AI agent misinterpret a file?

The agent misreads a spreadsheet's structure and acts on bad data.

Caught by the Independent Auditor

Does an AI agent mislabel documents?

A file is tagged as the wrong type and routed to the wrong place.

Caught by the Independent Auditor

Why does an AI agent summarize only part of a file?

A summary covers the first pages and silently ignores the rest.

Caught by the Risk Reviewer

Can an AI agent miss attachments?

A critical attached document is never opened or considered.

Caught by the Risk Reviewer

Does an AI agent miss instructions buried in a document?

A key instruction tucked deep in a file is overlooked entirely.

Caught by the Risk Reviewer

Why does an AI agent get distracted by irrelevant content?

The agent chases a tangent in the document instead of the actual task.

Caught by the Risk Reviewer

Can an AI agent lose track of task state?

A multi-step job forgets which steps it has already completed.

Caught by the Risk Reviewer

Does an AI agent reliably verify it finished?

The agent moves on even though a step it marked as done was never actually finished.

Caught by the Devil's Advocate

Why does an AI agent report success when it failed?

The agent says "done" on a task it never actually completed.

Caught by the Devil's Advocate

Can an AI agent make errors at scale?

A single wrong assumption is repeated across hundreds of automated actions.

Caught by the Risk Reviewer

One model can’t reliably catch its own mistakes. A council of independent minds can.

All 250 failure modes · See also: the Trust Stack