---
id: "open-question-digital-twin-training"
type: "open-question"
source_timestamps: ["§ The Road Ahead"]
tags: ["synthetic-data", "predictive-accuracy"]
related: ["concept-synthetic-personas", "entity-columbia-business-school", "entity-gbk-collective", "entity-twinloop"]
resolutionPath: "The newly launched study by GBK Collective, Columbia Business School, and Twinloop aims to rigorously validate the link between specific data inputs/modalities and the predictive accuracy of the resulting digital twins."
sources: ["commercial"]
sourceVaultSlug: "hbr-seg-commercial"
originDay: 5
articleStem: "hbr-new-30-ai-scale-customer-research"
sourceUrl: "https://hbr.org/2026/04/how-ai-helps-scale-qualitative-customer-research"
sourceTitle: "How AI Helps Scale Qualitative Customer Research"
---
# Optimal Training Data for Digital Twins

**Open question.** Deep qualitative data is known to be critical for training the next generation of digital twins, but it is still unknown exactly *what* mix of training data and *which* survey and interview modalities yield the most **empirically accurate** twins.

Central to [[concept-synthetic-personas]].

**Resolution path.** The newly launched study by [[entity-gbk-collective]], [[entity-columbia-business-school]], and [[entity-twinloop]] aims to rigorously validate the link between specific data inputs/modalities and the predictive accuracy of resulting digital twins.

**Enrichment framing.** A domain expert would stress that this validation must include **backtesting against real behavior** and must guard against two failure modes: **preference drift** (twins frozen on stale data) and **feedback loops** (decisions driven by synthetic personas reshaping the market). Digital twins should be treated as **experimental decision-support**, not authoritative proxies, and long-lived, sensitive qualitative datasets need explicit consent and governance.
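
As a rough illustration of what backtesting against real behavior could look like, the sketch below scores twin predictions against observed choices and then splits the hit rate by the age of the twin's training data, which is one simple way to surface preference drift. It is a minimal sketch under stated assumptions, not the study's method: the record fields (`predicted`, `observed`, `data_age_months`), the function names, the six-month buckets, and the sample records are all hypothetical.

```python
from collections import defaultdict


def backtest_accuracy(records):
    """Share of records where the twin's prediction matched observed behavior."""
    if not records:
        raise ValueError("no records to backtest")
    hits = sum(1 for r in records if r["predicted"] == r["observed"])
    return hits / len(records)


def accuracy_by_data_age(records, bucket_months=6):
    """Hit rate per bucket of training-data age (hypothetical drift check)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["data_age_months"] // bucket_months].append(r)
    return {
        f"{b * bucket_months}-{(b + 1) * bucket_months - 1} months": backtest_accuracy(rs)
        for b, rs in sorted(buckets.items())
    }


# Made-up illustrative records, not data from any real study.
records = [
    {"predicted": "buy", "observed": "buy", "data_age_months": 1},
    {"predicted": "skip", "observed": "skip", "data_age_months": 3},
    {"predicted": "buy", "observed": "buy", "data_age_months": 5},
    {"predicted": "buy", "observed": "skip", "data_age_months": 8},
    {"predicted": "skip", "observed": "skip", "data_age_months": 10},
    {"predicted": "buy", "observed": "skip", "data_age_months": 14},
]

print(backtest_accuracy(records))
print(accuracy_by_data_age(records))
```

In this toy data the hit rate falls as the training data ages, which is the pattern a drift check is meant to expose. A real validation would need far larger samples, a proper holdout design, and behavior that the twins' own recommendations have not already influenced.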
