---
id: "concept-unstructured-data-management"
type: "concept"
source_timestamps: ["§ Data Management"]
tags: ["data-infrastructure", "data-curation", "unstructured-data"]
related: ["entity-epic", "entity-microsoft-nuance"]
part_of: "framework-6-disciplines-gen-ai"
definition: "The organizational capability to collect, store, and curate unstructured data (text, images, voice) to fuel and train generative AI applications."
sources: ["spine"]
sourceVaultSlug: "hbr-seg-spine"
originDay: 1
articleStem: "hbr-cl-95-6-disciplines-genai"
sourceUrl: "https://hbr.org/2024/07/the-6-disciplines-companies-need-to-get-the-most-out-of-gen-ai"
sourceTitle: "The 6 Disciplines Companies Need to Get the Most Out of Gen AI"
---
# Unstructured Data Management for Gen AI

Discipline #4 of the [[framework-6-disciplines-gen-ai|six disciplines]]. Where traditional analytics relied on **structured, numerical** data, generative AI **thrives on and creates unstructured data** — text, images, voice. Most organizations lack the processes to collect, store, and curate this content.

Mastering this discipline requires:
- **Augmenting work environments to capture new data streams** — e.g., outfitting examining rooms to capture clinical notes via voice.
- **Data curation** — evaluating unstructured content for **importance, uniqueness, and currency**.
- Where needed, **training content providers to curate their own data** or **forming new partnerships** to gather previously discarded information.
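
The curation criteria above (importance, uniqueness, currency) can be made concrete with a small filter. This is a minimal sketch, not the article's method: the `Document` fields, the owner-supplied `importance` score, the exact-duplicate hashing, and the 365-day and 0.5 thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
import hashlib


@dataclass
class Document:
    doc_id: str
    text: str
    last_updated: date
    importance: float  # hypothetical 0..1 rating supplied by a content owner


def fingerprint(text: str) -> str:
    """Hash of case- and whitespace-normalized text, used to spot exact duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def curate(docs, today, max_age_days=365, min_importance=0.5):
    """Keep documents that are unique, current, and important (thresholds are assumptions)."""
    seen = set()
    kept = []
    # Visit newest first so the most recent qualifying copy of a duplicate is retained.
    for doc in sorted(docs, key=lambda d: d.last_updated, reverse=True):
        fp = fingerprint(doc.text)
        if fp in seen:
            continue  # uniqueness: an identical document was already kept
        if (today - doc.last_updated).days > max_age_days:
            continue  # currency: too old to keep
        if doc.importance < min_importance:
            continue  # importance: rated below the cutoff
        seen.add(fp)
        kept.append(doc)
    return kept


docs = [
    Document("note-1", "Patient reports mild headache.", date(2024, 6, 1), 0.9),
    Document("note-2", "patient reports  mild headache.", date(2024, 5, 1), 0.9),  # duplicate
    Document("note-3", "Old cafeteria menu.", date(2019, 1, 1), 0.8),  # stale
    Document("note-4", "Parking reminder.", date(2024, 6, 10), 0.1),  # low importance
]
kept = curate(docs, today=date(2024, 7, 1))
print([d.doc_id for d in kept])  # prints ['note-1']
```

In practice, uniqueness would likely need near-duplicate detection rather than exact hashing, and importance would come from the content providers themselves, which is where training them to curate their own data comes in.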

The source article's example: [[entity-epic]] (electronic health records) partnered with [[entity-microsoft-nuance]] to add Gen AI capabilities for capturing and summarizing clinical notes, a canonical unstructured-data-management play. This data foundation is a prerequisite for the strategic redesign called for in [[concept-systems-thinking-ai]].

Enrichment nuance: enterprise AI references (Snowflake, Databricks, major cloud providers) frame unstructured-data pipelines, vector stores, and governance as core Gen AI enablers, and ambient clinical intelligence (voice-to-text-to-record) is well documented. **Counter-perspective:** the gap is not universal, since some organizations (media companies, research institutions) already have strong unstructured-data practices. Over-zealous data capture without governance also creates privacy, compliance, and security risks.


## Related across articles
- [[concept-data-flywheels]]
