---
id: "claim-piracy-financial-risk"
type: "claim"
source_timestamps: ["§ Lessons for Gen AI Companies", "¶14"]
tags: ["financial-risk", "statutory-damages", "copyright-law"]
related: ["concept-piracy-caveat", "concept-shadow-libraries", "prereq-statutory-damages", "entity-anthropic"]
confidence: "high"
testable: true
sources: ["tail2"]
sourceVaultSlug: "hbr-seg-tail2"
originDay: 2
articleStem: "hbr-tail-126-genai-copyright"
sourceUrl: "https://hbr.org/2025/07/can-gen-ai-and-copyright-coexist"
sourceTitle: "Can Gen AI and Copyright Coexist?"
---
# Pirated Training Data Creates Trillion-Dollar Liability

**Claim (confidence: HIGH on the mechanism; the trillion-dollar figure is a theoretical maximum, not a prediction).**

Under 17 U.S.C. §504, copyright holders may recover statutory damages of up to **$30,000 per infringed work**, and up to **$150,000 per work** if the infringement is willful — without proving actual loss (see [[prereq-statutory-damages]]). Because [[entity-judge-william-alsup]] held that obtaining data via piracy is "irredeemably infringing" regardless of fair use (see [[concept-piracy-caveat]]), AI companies face existential exposure. The article's headline arithmetic: [[entity-anthropic-d2]]'s use of 7 million pirated books (see [[concept-shadow-libraries]]) could *theoretically* imply **$1.05 trillion** in statutory damages (7,000,000 × $150,000).

**Enrichment calibration — critical for any downstream answer:** The statutory ranges and the per-work nature of damages are correct. But the trillion-dollar figure is a **pure worst-case upper bound**, not a realistic or observed outcome:
- Courts almost never award the maximum per-work amount across every work and exercise broad equitable discretion.
- Class actions and settlements produce far lower per-work payouts — one estimate of the Anthropic settlement fund implies roughly **~$3,100 per work**; other commentary put theoretical exposure in the **tens of billions** (one figure: >$70 billion), still orders of magnitude below $1.05T.

So: the *directional* point (shadow-library reliance is an unsustainable, potentially existential risk that pushes firms toward licensing) is sound; the *specific* trillion-dollar number should always be labeled a theoretical maximum.
