---
id: "concept-gen-ai-hallucinations"
type: "concept"
source_timestamps: ["§ Behavioral Change"]
tags: ["ai-limitations", "quality-control", "statistical-models"]
related: ["concept-behavioral-change-gen-ai", "claim-human-over-trust-ai"]
definition: "The phenomenon where generative AI produces false or nonsensical information, fundamentally driven by bad predictions within its underlying statistical model."
sources: ["spine"]
sourceVaultSlug: "hbr-seg-spine"
originDay: 1
articleStem: "hbr-cl-95-6-disciplines-genai"
sourceUrl: "https://hbr.org/2024/07/the-6-disciplines-companies-need-to-get-the-most-out-of-gen-ai"
sourceTitle: "The 6 Disciplines Companies Need to Get the Most Out of Gen AI"
---
# Gen AI Hallucinations as Statistical Bad Predictions

The authors deliberately **reframe** the common, anthropomorphic term *hallucinations* as simply "**bad predictions by these statistical models**" (see the source quote [[quote-hallucinations-bad-predictions]]). LLMs are probabilistic next-token predictors, not reasoning engines; understanding this requires the background in [[prereq-llm-mechanics-d1]].
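
The "bad prediction" framing can be illustrated with a toy next-token sampler. This is a minimal sketch, not how any production LLM is implemented: the prompt, the three candidate tokens, and their logits in `TOY_LOGITS` are invented for illustration, and the helper names (`softmax`, `sample_next_token`, `error_rate`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up logits for the prompt "The capital of Australia is".
# Assumption: the values are illustrative, not taken from a real model.
TOY_LOGITS = {"Canberra": 2.0, "Sydney": 1.5, "Melbourne": 0.5}


def error_rate(logits, correct, n_samples=10_000, seed=0):
    """Estimate how often sampling returns a token other than `correct`."""
    rng = random.Random(seed)
    probs = softmax(logits)
    wrong = sum(
        sample_next_token(probs, rng) != correct for _ in range(n_samples)
    )
    return wrong / n_samples
```

Calling `error_rate(TOY_LOGITS, "Canberra")` estimates how often the sampler emits a wrong city. With these made-up logits the correct answer is the single most likely token, yet a sizable share of draws still land on a fluent, plausible, false alternative. Nothing in the sampler checks truth; it only follows the probabilities.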

Because such bad predictions are inherent to how LLMs generate text, humans in every role must rigorously review AI-generated output. This **universal behavioral requirement** is a core part of [[concept-behavioral-change-gen-ai]].

However, this requirement **contradicts natural human inclination**: users show a strong tendency to accept AI output without editing, which makes enforcing review processes a critical organizational discipline. The supporting evidence is [[claim-human-over-trust-ai]]: an MIT study found that 68% of participants chose not to edit an LLM's output.

Enrichment nuance: technical literature from OpenAI and Anthropic, along with academic work on truthfulness and faithfulness, characterizes hallucinations as failures of next-token prediction rooted in training data and architecture, which is consistent with the "bad predictions" framing. **Counter-perspective:** some researchers argue that hallucinations are not *just* bad predictions but a *structural* phenomenon tied to objective misalignment: the model is trained to predict fluent text, not factual truth. Under that view, human review is necessary but insufficient, and technical alignment work is also central.
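
The objective-misalignment argument can be made concrete with the standard maximum-likelihood objective for next-token prediction. This is the textbook general form, given here as an illustration; it is not taken from the source article, and real training pipelines add further stages that it does not capture. For a token sequence $x_1, \dots, x_T$ and model parameters $\theta$:

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
$$

The loss rewards assigning high probability to whatever token actually followed in the training text. It contains no term for whether the resulting statement is true, which is the gap the counter-perspective points to.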
