When AI gets it wrong

AI is powerful. But it is not accountable.

The real risk isn’t that AI makes mistakes — it’s that it makes them confidently, and people act before anyone challenges the answer. Below are real, reported cases where a confident, unchecked AI answer turned expensive. Each links to the original reporting.

What happens when a business trusts AI without a review?

It acts on a confident answer nobody challenged — and the cases below show the pattern: a chatbot invents a refund policy, a brief cites cases that never existed, a paid report cites sources that were never written. The AI didn’t “fail”; the review process did. Decidi is built for the step most AI workflows skip: adversarial review, expert debate, source pressure, and a final QA audit — before a decision becomes expensive.

The step most AI workflows skip

Every failure below has the same shape: one confident answer, acted on before anyone challenged it. Decidi is that challenge.

Adversarial review

A Devil’s Advocate attacks the conclusion before you act on it — the challenge step most AI workflows skip entirely.

Expert debate

Several independent frontier models and expert personas argue it out, so where one is confidently wrong, the others catch it.
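
As a rough illustration of the cross-checking idea (not Decidi's actual implementation, which this page does not describe), the sketch below compares answers from several models and flags any disagreement for review. The function name `cross_check`, the model names and the sample answers are all assumptions made up for the example.

```python
from collections import Counter


def cross_check(answers: dict) -> dict:
    """Compare answers from several models and flag disagreement.

    `answers` maps a hypothetical model name to its normalised answer text.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers.values())
    top_answer, top_votes = counts.most_common(1)[0]
    # Any model that departs from the most common answer is a dissenter.
    dissenters = sorted(name for name, a in answers.items() if a != top_answer)
    return {
        "majority_answer": top_answer,
        "agreement": top_votes / len(answers),
        "dissenters": dissenters,
        "needs_review": bool(dissenters),
    }


# Hypothetical answers to "Can a bereavement fare be refunded after travel?"
result = cross_check({
    "model_a": "no refund after travel",
    "model_b": "no refund after travel",
    "model_c": "refund within 90 days",
})
print(result["needs_review"], result["dissenters"])
```

A majority is not proof of correctness, so the sketch only uses disagreement as a signal that a human or a further check should look at the answer.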

Source pressure

Every citation, figure and policy claim is pressure-tested — invented sources are flagged, not repeated downstream.
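
One simple way to picture source pressure is a lookup of every cited identifier against an index someone has already confirmed. The sketch below is a minimal version under that assumption; `Citation`, `pressure_test`, the index contents and the case names are hypothetical, and a real check would query court or publication databases rather than a hard-coded set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    label: str       # how the draft refers to the source
    identifier: str  # e.g. a docket number or DOI, as written in the draft


def pressure_test(citations, known_ids):
    """Split citations into those found in a trusted index and those not found.

    `known_ids` is assumed to hold lower-case identifiers.
    """
    report = {"verified": [], "unresolved": []}
    for c in citations:
        found = c.identifier.strip().lower() in known_ids
        report["verified" if found else "unresolved"].append(c.label)
    return report


# Hypothetical index of identifiers that a person or database has confirmed.
trusted_index = {"case-001", "case-002"}
draft = [
    Citation("Example v. Sample", "CASE-001"),
    Citation("Invented v. Nobody", "case-999"),
]
report = pressure_test(draft, trusted_index)
print(report)
```

Anything in `unresolved` is not necessarily fabricated; it is simply a citation nobody should repeat until it has been located.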

Final QA

A proprietary audit reviews the verdict against known AI failure modes and signs it off, showing you every flag it found rather than hiding any.

The clearest cases

Real events · linked to the reporting

Customer chatbots · Court ruling · February 2024

Air Canada’s chatbot invented a refund policy — and the airline had to pay

Air Canada’s support chatbot told a grieving passenger about a bereavement-fare refund policy that did not exist. He relied on it; a tribunal held the airline responsible for what its bot said and ordered it to pay.

Tribunal-ordered compensation — “your AI said it, your company owns it.”

How a Decidi council catches it

Before a policy claim ever reaches a customer, a Decidi council puts it to a compliance persona and a Devil’s Advocate — “does this match the actual published policy?” — and the Final QA anti-fabrication gate blocks a policy the airline never wrote.

Compliance persona + source pressure

Source: CBC News

Legal & court filings · Court ruling · May 2023

Lawyers filed a brief citing six court cases that ChatGPT made up

Two New York lawyers submitted a federal brief citing six cases that ChatGPT had invented — none of them existed. The judge sanctioned them.

Sanctions, fines, and a national cautionary tale.

How a Decidi council catches it

This is the textbook case Decidi is built for. A source-pressure pass tries to locate every citation and flags the ones it can’t; the Final QA anti-fabrication gate blocks invented cases, statutes and numbers before a verdict is ever finalised.

Source pressure + Final QA

Source: CNN Business

Professional reports · Reported · October 2025

Deloitte refunded a government after its AI-assisted report cited sources that didn’t exist

Deloitte agreed to partially refund the Australian government after a commissioned report was found to contain apparent AI-generated errors, fabricated quotes and non-existent academic references — caught by an outside reader, not the firm.

A partial refund of a government contract, and a very public correction.

How a Decidi council catches it

The pattern: an expensive, AI-assisted professional deliverable whose citations failed, caught by an outsider. Decidi is designed to be that outsider, before delivery: a source-checker verifies every reference, a Devil’s Advocate hunts fabrications, and Final QA won’t sign off with unverified sources.

Source pressure + Final QA

Source: The Register

Legal & court filings · Reported · April 2026

Even an elite law firm apologised for AI-fabricated citations

A top-tier law firm apologised after an AI-assisted court filing contained fabricated citations and misstatements of law — proof that it isn’t only amateurs who get burned.

A public apology from a firm whose whole brand is being right.

How a Decidi council catches it

AI QA is not just for amateurs; top professionals need it too. Decidi’s expert critique plus source pressure is the checkpoint between “AI-assisted” and “filed”: the review that a busy senior associate skips under deadline.

Expert critique + source pressure

Source: Bloomberg

Customer chatbots · Reported · March 2024

New York City’s business chatbot told companies to do illegal things

NYC’s official “MyCity” business chatbot gave entrepreneurs incorrect — and in places outright illegal — guidance, including advice about firing workers and housing rules.

A government AI dispensing illegal advice to the businesses it was meant to help.

How a Decidi council catches it

A legal / compliance persona is precisely the seat that blocks “yes, you can do X” when X is illegal. Decidi’s compliance lens plus Final QA is the difference between “sounds helpful” and “creates liability.”

Legal / compliance persona

Source: The Markup

AI content · Reported · January 2023

CNET had to correct dozens of its AI-written articles

CNET quietly published 77 AI-written finance articles, then issued corrections on 41 of them, about 53%, after readers caught the mistakes. The work looked publishable but wasn’t.

Corrections on more than half its AI articles, and a lasting credibility hit for a trusted brand.

How a Decidi council catches it

AI produces confident, publishable-looking copy that is quietly wrong. Before you publish, a Decidi council of independent models plus a fact-checker catches the errors a single model asserts as fact. The case is a direct warning for anyone scaling AI content for SEO.

Multi-model debate + fact-check

Source: The Washington Post

AI agents & automation · Reported · July 2025

An AI coding agent deleted a live production database — during a code freeze

Replit’s AI coding agent reportedly deleted a company’s live production database during an explicit code freeze. The CEO apologised publicly and promised new safeguards.

A production database gone — during the one window it was meant to be frozen.

How a Decidi council catches it

AI should never execute an irreversible action without challenge and approval. Decidi is the review step before the destructive command — a Devil’s Advocate and a risk reviewer that ask “what could this delete?” before anyone runs it.

Adversarial review before action

Source: Tom's Hardware
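
The "challenge before the destructive command" idea can be sketched as a small gate that holds risky commands until a person approves them and blocks them outright during a freeze. This is an illustrative assumption about how such a gate might look, not a description of Decidi or of Replit's safeguards; the pattern list, `requires_approval` and `gate` are hypothetical and far from exhaustive.

```python
import re
from typing import Optional

# Illustrative patterns only; a real deployment would need a much broader list.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
    re.compile(r"\btruncate\s+table\b", re.IGNORECASE),
    # DELETE with no WHERE clause anywhere after it.
    re.compile(r"\bdelete\s+from\b(?!.*\bwhere\b)", re.IGNORECASE | re.DOTALL),
    # rm with a recursive flag, e.g. "rm -rf" or "rm -fr".
    re.compile(r"\brm\s+-\w*r\w*", re.IGNORECASE),
]


def requires_approval(command: str) -> bool:
    """True if the command matches any known destructive pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def gate(command: str, approved_by: Optional[str] = None,
         code_freeze: bool = False) -> str:
    """Decide whether an agent's proposed command may run."""
    if requires_approval(command):
        if code_freeze:
            return "blocked: code freeze in effect"
        if approved_by is None:
            return "held: needs human approval"
    return "allowed"


print(gate("DROP DATABASE prod"))
print(gate("DROP DATABASE prod", code_freeze=True))
print(gate("SELECT * FROM users"))
```

The design choice here is that a freeze overrides approval: during a freeze, even an approved destructive command stays blocked.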

Customer chatbots · Reported · June 2023

An eating-disorder helpline’s chatbot gave harmful advice — and was pulled

The National Eating Disorders Association suspended its “Tessa” chatbot after it reportedly gave weight-loss and calorie-restriction advice to the exact vulnerable users it was meant to help.

A safety-critical chatbot pulled after it did the opposite of its job.

How a Decidi council catches it

Domain-expert review matters most where the stakes are human. For safety-critical questions, a Decidi domain persona plus a risk reviewer escalates instead of confidently advising, and Final QA flags potential harm before it ever ships.

Expert critique + Final QA (safety-critical)

Source: NPR

“From fake court cases to fake refund policies — AI mistakes now have invoices attached.”

Put a confident AI answer to a council

1,500 free credits · no sign-up, no card

More reported failures

From coding agents that delete production databases to hiring tools that quietly discriminate — the same missing step, across every domain where being confidently wrong has a cost.

Legal & court filings · Court ruling · 2025

A US appeals court sanctioned lawyers for fake AI citations — and rejected the “typo” excuse

A US appeals court sanctioned attorneys for fictitious, AI-generated citations and pointedly rejected the defence that the invented cases were mere typographical errors.

Court-imposed sanctions, with the “it was just a typo” defence thrown out.

How a Decidi council catches it

Unchecked AI output now creates professional liability. Decidi runs the draft past independent models and a source-checker before it is filed: the second opinion that turns a career risk into a caught error.

Multi-model debate + source pressure

Source: Bloomberg Law

Legal & court filings · Reported · October 2025

Court after court is disciplining lawyers over AI-invented case law

Reporting has tracked a growing run of cases in which judges discipline or question lawyers for citing AI-fabricated legal material. It is no longer a one-off — it’s a pattern.

A steady stream of sanctions across jurisdictions.

How a Decidi council catches it

The lesson isn’t “don’t use AI” — it’s “don’t use one AI without a second opinion.” Decidi makes cross-examination and source pressure the default, not an optional extra step.

Multi-model debate + source pressure

Source: Cronkite News (Arizona PBS)

Professional reports · Reported · June 2026

An AI report about AI got its own citations wrong

A forensic audit found that only 5 of the 45 citations in a KPMG agentic-AI report actually matched their stated sources; KPMG pulled the report to investigate how it was published.

A flagship report about AI, undermined by AI-fabricated citations.

How a Decidi council catches it

Every source claim gets pressure-tested before the deliverable leaves the building — Decidi flags the citation that doesn’t resolve instead of shipping it inside a polished PDF.

Source pressure

Source: The Register

Professional reports · Reported · May 2025

A high-profile government health report was riddled with broken and apparently AI-generated citations

A prominent US health-policy report was found to contain broken, duplicated, inaccurate and allegedly AI-generated citations — surfacing after publication, not before.

A flagship policy document with its evidence base in question.

How a Decidi council catches it

A source-pressure pass catches broken and invented references before publication, and Final QA won’t sign off a document whose citations don’t resolve.

Source pressure + Final QA

Source: The Washington Post

AI search answers · Reported · May 2024

Google’s AI told people to put glue on pizza and eat rocks

Google’s AI Overviews confidently surfaced absurd and unsafe “answers” — glue on pizza, eating rocks — and Google restricted or removed some results after the backlash.

The most viral proof that one AI will confidently summarise nonsense.

How a Decidi council catches it

A single model summarising confidently is exactly the failure Decidi removes: several independent models cross-check, so the nonsense one model asserts is rejected by the others before it reaches you.

Multi-model debate

Source: The Register

AI search answers · Reported · January 2026

Google pulled AI medical answers after dangerous advice

Google reportedly removed or restricted certain medical AI Overviews after they surfaced inaccurate or potentially dangerous health advice.

High-stakes medical answers, confidently wrong, quietly withdrawn.

How a Decidi council catches it

High-stakes answers need escalation and verification, not a confident summary. Decidi routes medical, legal and financial claims into a “verify before you rely on this” list rather than asserting them as fact.

Final QA escalation

Source: TechCrunch
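
The "verify before you rely on this" list described above can be pictured as a routing step that separates high-stakes claims from ordinary ones. The sketch below uses crude keyword matching purely for illustration; the term lists, `route_claims` and the sample claims are assumptions, and plain substring matching will misfire (for example, "tax" also matches "syntax"), so a real system would need a proper classifier.

```python
# Hypothetical trigger terms per high-stakes domain (deliberately tiny).
HIGH_STAKES_TERMS = {
    "medical": ("dose", "dosage", "symptom", "diagnosis", "medication"),
    "legal": ("statute", "liable", "lawsuit", "contract"),
    "financial": ("tax", "deduction", "interest rate", "refund"),
}


def route_claims(claims):
    """Send high-stakes claims to a verify list instead of stating them as fact."""
    routed = {"stated": [], "verify_before_relying": []}
    for claim in claims:
        lowered = claim.lower()
        domains = sorted(
            domain
            for domain, terms in HIGH_STAKES_TERMS.items()
            if any(term in lowered for term in terms)
        )
        if domains:
            routed["verify_before_relying"].append((claim, domains))
        else:
            routed["stated"].append(claim)
    return routed


routed = route_claims([
    "The meeting is on Tuesday.",
    "The safe dosage is 500 mg twice a day.",
    "You can claim this as a tax deduction.",
])
for claim, domains in routed["verify_before_relying"]:
    print(domains, claim)
```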

Customer chatbots · Reported · March 2024

AI tax chatbots were often wrong — and the IRS advocate warned against trusting them

Testing found consumer tax chatbots frequently gave wrong or unhelpful answers, and the IRS’s Taxpayer Advocate warned people not to rely on AI for complex tax questions.

Wrong tax guidance — where being wrong has a dollar figure and a deadline.

How a Decidi council catches it

Tax, legal and finance all demand verification. Decidi’s finance / compliance persona plus a “verify before you rely on this” list is built for exactly these money-risk answers.

Expert critique + Final QA

Source: The Washington Post

AI agents & automation · Reported · April 2026

A coding agent wiped a company’s database and its backups in seconds

A coding agent reportedly deleted a company’s production database — and its backups — in seconds, with no human check between intent and irreversible action.

Database and backups, gone, faster than anyone could stop it.

How a Decidi council catches it

Agents need governance, not trust. Decidi puts the plan to a council before the agent acts, so irreversible, high-blast-radius steps get challenged first.

Adversarial review before action

Source: The Register

AI agents & automation · Reported · June 2024

McDonald’s ended its AI drive-thru after it kept getting orders wrong

McDonald’s wound down an AI drive-thru trial after repeated order mistakes and viral customer complaints.

A scrapped rollout and a run of bad-PR clips.

How a Decidi council catches it

Unmonitored AI at the customer edge damages the experience. A review layer surfaces the failure modes before they go live — and go viral.

Expert critique

Source: CNBC

AI content · Reported · September 2023

AI-written mushroom foraging books gave dangerous advice

Experts warned that AI-generated mushroom-foraging guides sold online contained inaccurate identification advice — the kind of error that can put a forager in hospital.

AI content one step from real, physical-world harm.

How a Decidi council catches it

AI content becomes physical-world harm when no expert reviews it. Decidi’s domain-expert and safety review is the gate between “looks authoritative” and “is actually safe.”

Domain-expert critique (safety)

Source: Fortune

Hiring & automated decisions · In litigation · May 2025

Workday must face claims its hiring AI discriminated against applicants

A court allowed a case to proceed alleging that Workday’s AI-powered hiring software discriminated against applicants — an AI-governance failure, not a hallucination.

A discrimination suit clearing the bar to proceed.

How a Decidi council catches it

Automated decisions create real exposure. Decidi’s role is the challenge step — a bias / risk reviewer that stress-tests a decision rule before it is deployed at scale.

Risk / bias review

Source: Seyfarth Shaw (legal analysis)

Hiring & automated decisions · Settled · August 2023

Hiring software auto-rejected older applicants — a $365,000 settlement

The EEOC said iTutorGroup’s recruiting software automatically rejected older applicants because of their age; the company paid $365,000 to settle.

A $365,000 settlement for a rule nobody adversarially reviewed.

How a Decidi council catches it

An automated reject rule with no challenge step becomes a settlement. Decidi puts the rule to a risk reviewer before it runs against a single real applicant.

Risk / bias review

Source: U.S. EEOC
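
One way to stress-test a screening rule before it runs on real applicants is a counterfactual check: hold everything constant, change only the age, and see whether the outcome flips. The sketch below assumes that approach; `Applicant`, `age_sensitive_cases` and both sample rules (including the age threshold) are invented for illustration and do not describe iTutorGroup's software or Decidi's reviewer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Applicant:
    years_experience: int
    has_certification: bool
    age: int


def age_sensitive_cases(rule, applicants, ages):
    """Return applicants whose outcome changes when only their age changes."""
    flagged = []
    for applicant in applicants:
        outcomes = {rule(replace(applicant, age=age)) for age in ages}
        if len(outcomes) > 1:  # same person, different ages, different result
            flagged.append(applicant)
    return flagged


# Two hypothetical screening rules: one uses age, one does not.
def biased_rule(a: Applicant) -> bool:
    return a.years_experience >= 3 and a.age < 55


def neutral_rule(a: Applicant) -> bool:
    return a.years_experience >= 3


pool = [Applicant(5, True, 30), Applicant(1, False, 30)]
test_ages = (25, 40, 60)
print(len(age_sensitive_cases(biased_rule, pool, test_ages)))
print(len(age_sensitive_cases(neutral_rule, pool, test_ages)))
```

A check like this only catches rules that use age directly; proxies for age (such as graduation year) would need their own counterfactual tests.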

Hiring & automated decisions · Reported · October 2018

Amazon scrapped an AI recruiting tool that was biased against women

Amazon abandoned an experimental AI recruiting tool after discovering it systematically down-ranked women — a now-classic warning about AI decision risk.

A build written off once the bias was finally caught.

How a Decidi council catches it

The bias lived in the model; the fix is adversarial review before deployment. Surfacing exactly this — before it’s live — is what a Decidi council is for.

Risk / bias review

Source: Reuters

Systemic evidence · Reported · May 2026

A single year produced an estimated 146,932 hallucinated citations

A study estimated 146,932 hallucinated citations across major research repositories in one year, evidence that AI fabrication is systemic, not anecdotal.

Six figures of fake citations, in twelve months.

How a Decidi council catches it

This is the scale of the problem Decidi exists for. Source pressure on every claim isn’t a nice-to-have — it’s the line between “AI-assisted” and “accountable.”

Source pressure (systemic)

Source: Nature

Systemic evidence · Reported · May 2024

Even purpose-built “safe” legal AI still hallucinated

Stanford-linked research found that dedicated legal AI tools still hallucinated materially — even when marketed as safer than general chatbots.

The “specialised, so trustworthy” assumption, disproven.

How a Decidi council catches it

The takeaway is Decidi’s whole thesis: don’t trust one model — even a specialised one. Independent cross-examination is what catches what a single tool confidently asserts.

Multi-model debate

Source: Stanford HAI

Common questions

What happens when a business trusts AI without a review?

It acts on a confident answer that nobody challenged. The cases here show the pattern: a chatbot invents a refund policy, a legal brief cites cases that do not exist, a paid report cites sources that were never written — and the mistake only surfaces once it is expensive. The fix is not avoiding AI; it is adding the review step — adversarial challenge, independent models, source-checking and a final audit — before the decision is made.

Can an AI chatbot create liability for my company?

It already has. In the Air Canada case, a tribunal held the airline responsible for a bereavement-refund policy its chatbot invented, and rejected the argument that the bot was “a separate legal entity.” Whatever your AI tells a customer, you own. A compliance review before the answer reaches anyone is the guardrail.

How does Decidi prevent AI hallucinations and errors?

By never trusting one model’s confident answer. Decidi convenes several independent frontier models and expert personas that debate and challenge each other, pressure-tests every citation and claim, and runs a proprietary Final QA audit against known AI failure modes before a verdict is finalised. Where one model is confidently wrong, the others catch it.

Are these real AI failure cases?

Yes. Every case links to the original reporting — from the CBC, NPR, CNBC, Reuters, Bloomberg, The Washington Post, The Register, Fortune, Stanford HAI and the U.S. EEOC, among others. We describe what was reported; the “how a council catches it” line is how Decidi’s review is designed to work.

Don’t be the next case study.

Before you send, publish, file or ship a confident AI answer, make it survive a council — adversarial review, independent models, source pressure and a final audit.

Put your work to a council
