---
id: "question-backend-hygiene"
type: "open-question"
source_timestamps: ["00:15:05", "00:15:16"]
tags: ["data-engineering", "model-limitations"]
related: ["concept-production-trust", "framework-data-migration-pipeline", "action-implement-human-validation"]
resolutionPath: "Observing whether future models (GPT-6) natively handle enum normalization, or whether the industry settles on LLM-generated deterministic code for these steps."
sources: ["s26-gpt55-claude-gemini"]
sourceVaultSlug: "s26-gpt55-claude-gemini"
originDay: 26
---
# How will models solve backend data hygiene?

## Question
How will frontier models eventually solve **backend data hygiene** — the boring, structural work of enum normalization, service code preservation, and canonical job grouping?

## Context
[[entity-gpt-5-5|GPT-5.5]] caught **semantically obvious traps** (Mickey Mouse customers, a $25,000 fake payment — see [[claim-gpt-5-5-caught-traps]]) but **still failed at boring backend hygiene**. This is a structural blind spot, not a one-off miss.

## Two Resolution Paths
1. **Native model improvements** — GPT-6 / future Claude generations may natively handle enum normalization through better training signals or specialized fine-tuning.
2. **Deterministic code generation** — the industry may settle on having LLMs *write deterministic Python/SQL* that performs these steps, rather than having the model apply the transformations in-context (see the sketch after this list). This is the operationally safer path and aligns with [[action-implement-human-validation]].
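
To make path 2 concrete, here is a minimal sketch of the kind of deterministic code an LLM could emit once and the pipeline could rerun verbatim on every batch. All names here (`CANONICAL_STATUS`, `normalize_status`, `UnmappedEnumError`) are hypothetical, not from the source; the point is that the mapping is explicit, auditable, and fails loudly on unmapped values instead of guessing.

```python
# Hypothetical sketch of resolution path 2: the LLM writes this mapping
# once, and the migration pipeline reruns it deterministically per batch.

CANONICAL_STATUS = {
    "done": "COMPLETED",
    "complete": "COMPLETED",
    "completed": "COMPLETED",
    "in progress": "IN_PROGRESS",
    "in-progress": "IN_PROGRESS",
    "wip": "IN_PROGRESS",
    "cancelled": "CANCELED",
    "canceled": "CANCELED",
}


class UnmappedEnumError(ValueError):
    """Raised instead of guessing, so a human can extend the mapping."""


def normalize_status(raw: str) -> str:
    """Map a free-form status string to its canonical enum value."""
    key = raw.strip().lower()
    if key not in CANONICAL_STATUS:
        raise UnmappedEnumError(f"no canonical mapping for status: {raw!r}")
    return CANONICAL_STATUS[key]


if __name__ == "__main__":
    assert normalize_status(" Done ") == "COMPLETED"
    assert normalize_status("WIP") == "IN_PROGRESS"
```

The fail-loud branch is the operational argument for this path: an unmapped value halts the step and surfaces to a human, rather than being silently coerced by the model.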

## Why It Matters
Until resolved, [[concept-production-trust]] requires human-in-the-loop validation around every data migration ([[framework-data-migration-pipeline]] step 5: Audit UI).
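
As a companion sketch (reusing the hypothetical names above), a human-in-the-loop gate around a migration batch might look like the following: normalization failures are collected and routed to a reviewer before anything commits, which is the shape of the Audit UI step.

```python
def migrate_batch(rows, normalize, approve):
    """Normalize each row; route failures to a human reviewer before commit.

    `normalize` is expected to raise ValueError (e.g. UnmappedEnumError
    from the sketch above) on rows it cannot handle deterministically;
    `approve` is the human sign-off callback.
    """
    clean, flagged = [], []
    for row in rows:
        try:
            clean.append(normalize(row))
        except ValueError as err:
            flagged.append((row, err))
    # Nothing commits until a human signs off on every flagged row.
    if flagged and not approve(flagged):
        raise RuntimeError(f"{len(flagged)} rows held for review; batch not committed")
    return clean
```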
