---
id: "contrarian-data-removal-possible"
type: "contrarian-insight"
source_timestamps: ["§ Lessons for Rightsholders", "¶9"]
tags: ["contrarian-insight", "technical-misconception", "data-management"]
related: ["concept-model-retraining-removal", "action-demand-retrain-removal"]
challenges: "The widespread assumption that once data is ingested into an LLM's neural network it is permanently 'baked in' and impossible to extract or unlearn."
sources: ["tail2"]
sourceVaultSlug: "hbr-seg-tail2"
originDay: 2
articleStem: "hbr-tail-126-genai-copyright"
sourceUrl: "https://hbr.org/2025/07/can-gen-ai-and-copyright-coexist"
sourceTitle: "Can Gen AI and Copyright Coexist?"
---
# Contrarian: LLM Data Removal Is Technically Possible

**Contrarian insight.** This note challenges the widespread assumption that once data is absorbed into an LLM's weights it is permanently "baked in" and impossible to extract or unlearn.

The counter-argument rests on the LLM training lifecycle (see [[prereq-llm-training-lifecycle]]): major model generations are typically **retrained from scratch** on a freshly assembled corpus rather than incrementally patched. That means a specific rightsholder's works can be *excluded* from the next generation's corpus — the mechanism detailed in [[concept-model-retraining-removal]] and operationalized in [[action-demand-retrain-removal]].
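The corpus-level mechanism can be pictured as a filtering pass that runs before the next from-scratch training job. The sketch below is hypothetical: `Document`, `build_retrain_corpus`, and the `rightsholder` provenance field are illustrative names, not any lab's actual pipeline. It assumes the developer has usable provenance metadata and copies of the works to exclude. It matches only exact text after light normalization, so excerpts, translations, and paraphrases would slip through without near-duplicate detection (for example MinHash), which is omitted here. It also says nothing about removing a work's influence from a model that has already been trained.

```python
import hashlib
import re
import unicodedata
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Document:
    doc_id: str
    rightsholder: str  # hypothetical provenance field; real scraped corpora often lack reliable metadata
    text: str


def normalize(text: str) -> str:
    """Fold Unicode form, case, and whitespace so trivially reformatted copies hash identically."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """SHA-256 of the normalized text; catches verbatim copies only."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def build_retrain_corpus(
    docs: Iterable[Document],
    excluded_holders: set[str],
    excluded_texts: Iterable[str],
) -> tuple[list[Document], list[str]]:
    """Split the corpus into documents kept for the next from-scratch run and dropped IDs.

    A document is dropped if its rightsholder opted out, or if its normalized
    text exactly matches a work supplied for exclusion (even when the
    rightsholder label is missing or wrong).
    """
    blocked = {fingerprint(t) for t in excluded_texts}
    kept: list[Document] = []
    dropped_ids: list[str] = []
    for doc in docs:
        if doc.rightsholder in excluded_holders or fingerprint(doc.text) in blocked:
            dropped_ids.append(doc.doc_id)
        else:
            kept.append(doc)
    return kept, dropped_ids


if __name__ == "__main__":
    corpus = [
        Document("a1", "Acme Press", "Chapter 1.  It was a dark night."),
        Document("b2", "Unknown", "chapter 1. it was a   dark night."),  # mislabeled copy
        Document("c3", "Other Co", "An unrelated essay."),
    ]
    kept, dropped = build_retrain_corpus(
        corpus, {"Acme Press"}, ["Chapter 1. It was a dark night."]
    )
    print([d.doc_id for d in kept], dropped)  # ['c3'] ['a1', 'b2']
```

The second document shows why the sketch checks content as well as metadata: a copy with a missing or wrong rightsholder label is still caught when its text matches the supplied work. Because the next model is trained only on `kept`, exclusion here is a property of the training data rather than an after-the-fact edit to weights.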

**Balancing view (from enrichment):** ML researchers note that neural representations are highly entangled, so removing a specific work's *influence* from an already-trained model is hard; the machine-unlearning literature offers partial rather than guaranteed solutions. The defensible form of this contrarian claim is therefore: *removal is feasible at the corpus level during a from-scratch retrain*, which is materially different from *unlearning a work from an existing deployed model*. Remedies requiring the destruction of pirated libraries and derivative datasets (as in the Anthropic settlement) give the corpus-level strategy real legal grounding.
