---
id: "prereq-llm-training-lifecycle"
type: "prereq"
source_timestamps: ["¶9"]
tags: ["machine-learning", "technical-knowledge"]
related: ["concept-model-retraining-removal"]
reason: "Knowing the difference between fine-tuning an existing model and training a base model from scratch is necessary to understand why data removal is only feasible during major version updates."
sources: ["tail2"]
sourceVaultSlug: "hbr-seg-tail2"
originDay: 2
articleStem: "hbr-tail-126-genai-copyright"
sourceUrl: "https://hbr.org/2025/07/can-gen-ai-and-copyright-coexist"
sourceTitle: "Can Gen AI and Copyright Coexist?"
---
# LLM Training Lifecycle

**Prerequisite knowledge.** LLM development distinguishes three stages: *pretraining* a base model from scratch on a large corpus, *fine-tuning* an existing model on additional data, and *inference*, where the trained model is run without changing its weights. Major model generations are typically pretrained anew on a freshly assembled corpus rather than incrementally patched.

**Why it matters here:** fine-tuning adds to what a pretrained model has already learned and does not reliably erase it, so removing a work from the training corpus only takes effect when the next base model is pretrained without it. This is what makes corpus-level data removal feasible only at major-version boundaries, the mechanism in [[concept-model-retraining-removal]] and the basis for [[action-demand-retrain-removal]]. The adjacent field of **machine unlearning** studies removing a specific example's influence *without* full retraining, but offers only partial guarantees at scale.
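
As a rough illustration of why removal lines up with corpus assembly, the sketch below filters a candidate corpus against a list of removal requests before a new pretraining run. Everything in it is hypothetical: the `Document` type, the `assemble_corpus` helper, and the document IDs are made up for this note and do not describe any vendor's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one item in a candidate pretraining corpus."""
    doc_id: str
    text: str


def assemble_corpus(candidates, removal_requests):
    """Return the documents to pretrain on, skipping any with a removal request.

    Assumption: removal is keyed on a stable document ID. Real pipelines would
    also need to catch duplicates and excerpts of the same work.
    """
    blocked = set(removal_requests)
    return [doc for doc in candidates if doc.doc_id not in blocked]


# Illustrative data only.
candidates = [
    Document("news-001", "..."),
    Document("novel-042", "..."),
    Document("blog-007", "..."),
]

# The filter runs once, when the corpus for the next base model is assembled.
next_corpus = assemble_corpus(candidates, removal_requests=["novel-042"])
kept_ids = [doc.doc_id for doc in next_corpus]
```

The filter only affects a model pretrained on `next_corpus`. A model already trained on the unfiltered candidates keeps whatever it learned from `novel-042`, which is the gap machine unlearning tries to close.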
