---
id: "quote-eleuther-performance"
type: "quote"
source_timestamps: ["§ Lessons for Gen AI Companies", "¶17"]
tags: ["training-data", "ai-performance"]
related: ["claim-unlicensed-data-performance", "entity-eleuther-ai"]
speaker: "EleutherAI"
speakers: ["EleutherAI"]
sources: ["tail2"]
sourceVaultSlug: "hbr-seg-tail2"
originDay: 2
articleStem: "hbr-tail-126-genai-copyright"
sourceUrl: "https://hbr.org/2025/07/can-gen-ai-and-copyright-coexist"
sourceTitle: "Can Gen AI and Copyright Coexist?"
---
# EleutherAI on Unlicensed Data Necessity

> "the common idea that unlicensed text drives performance is unjustified."

— [[entity-eleuther-ai]] (¶17)

This quote is the evidentiary anchor for [[claim-unlicensed-data-performance]] and the contrarian thesis [[contrarian-unlicensed-data-unnecessary]]. It is tied to the 8 TB license-clean Common Pile v0.1 dataset. Treat it as a strong-form hypothesis pending the independent benchmarking called for in [[question-unlicensed-data-necessity]].
