---
id: "prereq-rag-architecture"
type: "prereq"
source_timestamps: ["00:07:45"]
tags: ["architecture", "data-systems"]
related: ["concept-model-driven-retrieval"]
reason: "Understanding how traditional RAG hardcodes retrieval logic is necessary to grasp why Model-Driven Retrieval is a paradigm shift."
sources: ["s44-claude-mythos"]
sourceVaultSlug: "s44-claude-mythos"
originDay: 44
---
# Retrieval-Augmented Generation (RAG)

## Why this is a prerequisite

Understanding traditional RAG architecture is necessary to grasp why [[concept-model-driven-retrieval]] is described as a paradigm shift.

## What you should already know

Traditional RAG architecture (see the code sketch after this list):

1. **Documents** are chunked into passages.
2. **Embeddings** are generated for each chunk and stored in a vector database (Pinecone, Weaviate, pgvector, etc.).
3. **At query time:**
   - User query is embedded
   - Top-k semantically similar chunks are retrieved via cosine/dot-product similarity
   - Retrieved chunks are injected into the LLM prompt as context
4. **The LLM** generates an answer grounded in the retrieved context.
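
A minimal, self-contained sketch of steps 1–4 in plain Python. Everything here is illustrative: the `embed` function is a hashing stand-in for a real embedding model (an API call or a local sentence-transformer in practice), and the chunks, query, and top-k value are made up. Note that on unit-normalized vectors, cosine similarity reduces to a dot product: cos(q, d) = (q · d) / (‖q‖ ‖d‖).

```python
import hashlib

import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a real embedding model. Hashes the text
    into a seed so the sketch is deterministic and runs with no external
    dependencies beyond numpy."""
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize: dot product == cosine


# Steps 1-2: chunk documents (pre-chunked here) and index their embeddings.
chunks = [
    "RAG retrieves passages before the model generates an answer.",
    "Chunk embeddings are stored in a vector database.",
    "The LLM grounds its answer in the retrieved context.",
]
index = np.stack([embed(c) for c in chunks])  # shape: (num_chunks, dim)

# Step 3: embed the query, score every chunk, keep the top-k.
query = "How does RAG ground its answers?"
scores = index @ embed(query)  # cosine similarity on unit vectors
top_k = 2
best = np.argsort(scores)[::-1][:top_k]

# Step 4: inject the retrieved chunks into the LLM prompt as context.
context = "\n".join(chunks[i] for i in best)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```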

Key engineering decisions humans make in this pipeline (the sketch below illustrates the first):
- Chunking strategy (size, overlap, semantic boundaries)
- Embedding model selection
- Top-k value
- Re-ranking algorithms
- Filtering / metadata logic
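
To make the first of these decisions concrete, here is a hedged sketch of one common chunking strategy: fixed-size character windows with overlap. The function name and default values are assumptions chosen for illustration, not recommendations.

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character chunking with overlap. Defaults are
    illustrative; real pipelines often split on semantic boundaries
    (sentences, headings) instead."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    # Overlap ensures text that straddles a chunk boundary still
    # appears whole in at least one chunk.
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Every knob in the list above (size, overlap, embedding model, k, re-ranker) is a choice like this one: fixed at build time by a human rather than decided by the model.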

## Why this matters for the source

The speaker's argument in [[concept-model-driven-retrieval]] is that all of these human-engineered decisions become *liabilities* with sufficiently capable models. For a reader who hasn't internalized the traditional pipeline, that criticism lands flat.

## Suggested background

- OpenAI Cookbook: prompt engineering & embeddings
- LangChain documentation
- Lewis et al., 2020, "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (the original RAG paper)
- Lilian Weng's prompt engineering blog post
