---
id: "entity-eleuther-ai"
type: "entity"
source_timestamps: ["§ Lessons for Gen AI Companies", "¶17"]
tags: ["ai-research", "open-source"]
related: ["claim-unlicensed-data-performance", "contrarian-unlicensed-data-unnecessary", "quote-eleuther-performance", "question-unlicensed-data-necessity"]
entityType: "organization"
canonicalName: "EleutherAI"
aliases: ["Eleuther.ai", "Eleuther AI"]
canonical_url: "eleuther.ai"
speakers: ["EleutherAI"]
sources: ["tail2"]
isSpeakerEntity: true
---
## Segment 2 — tail2

## Article 126 — a126

# EleutherAI

An open-source AI research collective, known for The Pile dataset and the GPT-Neo and GPT-NeoX models, and one of the voices the source cites.

EleutherAI released **Common Pile v0.1**, an 8 TB dataset composed entirely of openly licensed or public-domain text, and reported that models trained on it performed comparably to those trained on unlicensed copyrighted data, challenging the industry consensus that such data is required (see [[claim-unlicensed-data-performance]], [[quote-eleuther-performance]]). These findings are the empirical basis for the contrarian argument in [[contrarian-unlicensed-data-unnecessary]] and feed the open question [[question-unlicensed-data-necessity]].

**Role in the source:** cited as an authority for the claim that license-clean data can rival unlicensed corpora. **Enrichment note:** the license-clean purpose of Common Pile is well documented, but the performance-parity result is preliminary, largely self-reported, and has yet to be confirmed by independent peer-reviewed benchmarking. **Attributed quote:** [[quote-eleuther-performance]].