---
id: "concept-human-formatted-data"
type: "concept"
source_timestamps: ["§ Information Systems"]
tags: ["data-architecture", "knowledge-management"]
related: ["action-convert-to-markdown", "concept-agent-first-rewiring"]
definition: "Data stored in visual or formatted mediums (PDFs, slide decks, complex websites) that are optimized for human eyes but create friction for AI agents."
sources: ["agentic"]
sourceVaultSlug: "hbr-seg-agentic"
originDay: 6
articleStem: "hbr-ext-17-workplace-set-up-for-agents"
sourceUrl: "https://hbr.org/2026/01/is-your-workplace-set-up-for-ai-agents"
sourceTitle: "Is Your Workplace Set Up for AI Agents?"
---
# Human-Formatted Data Silos

For decades, organizations have encoded knowledge in formats optimized for human visual consumption: websites with complex layouts, PDFs with formatted tables, slide decks with charts, and documents with headers and bullet points. Humans navigate these formats easily, but for machines they are severe friction points. When data is siloed across SharePoint folders, HR portals, and PDF repositories, an AI agent struggles to synthesize it.

Ju's rule of thumb: PDFs and formatted documents should be treated strictly as outputs for human reading, not as the source of truth or the storage medium for organizational knowledge (see [[quote-pdfs-are-outputs]]). The prescribed fix is [[action-convert-to-markdown|converting institutional knowledge to plain-text markdown]] stored in searchable directories, which the author frames as the [[claim-markdown-highest-leverage|highest-leverage immediate change]] most organizations can make. This is the data pillar of [[concept-agent-first-rewiring|agent-first rewiring]] and the target of the contrarian stance that [[contrarian-pdfs-are-harmful|PDFs and slide decks are harmful storage formats]].
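
To make "plain-text markdown stored in searchable directories" concrete, here is a minimal sketch of what that enables. It is not taken from the article: the function name `search_markdown`, the directory layout, and the case-insensitive substring match are all illustrative assumptions. The point is only that once knowledge lives in `.md` files, an agent (or a ten-line script) can read it directly, with no parsing or OCR step.

```python
from pathlib import Path


def search_markdown(root, term):
    """Return (relative path, line number, line text) for each line containing term.

    Hypothetical helper: a plain case-insensitive substring match over every
    .md file under root. Real deployments might use an index or embeddings.
    """
    root = Path(root)
    needle = term.lower()
    hits = []
    # sorted() keeps the result order stable across runs and platforms
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Calling `search_markdown("knowledge", "parental leave")` on a hypothetical `knowledge/` directory would return every matching line with its file and line number. Files in other formats under the same directory, such as a PDF copy of the same policy, are simply not visited, which is the friction the note describes.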

**Enrichment nuance:** AI systems typically ingest PDFs and slides through parsing or OCR, which can introduce errors that clean text avoids; research on retrieval-augmented generation (RAG) indicates that well-curated text corpora make LLM output more reliable than ad hoc PDF parsing does. A balanced expert view is that these formats are not inherently harmful if backed by machine-readable mirrors: they are poor *canonical* sources, not poison.
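
One way to act on the "machine-readable mirror" idea is to audit a repository for human-formatted files that have no markdown counterpart. The sketch below assumes a convention the source does not specify: a mirror is a `.md` file with the same name in the same folder (for example `q3-review.pdf` next to `q3-review.md`). The name `find_unmirrored` and the extension list in `HUMAN_FORMATS` are likewise hypothetical.

```python
from pathlib import Path

# Assumed set of "human-formatted" extensions; adjust to the repository at hand.
HUMAN_FORMATS = {".pdf", ".pptx", ".docx"}


def find_unmirrored(root):
    """List human-formatted files under root that lack a sibling .md mirror.

    Assumption: the mirror shares the file's folder and base name and differs
    only in its extension. Returned paths are relative to root.
    """
    root = Path(root)
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in HUMAN_FORMATS:
            # with_suffix swaps only the final extension: report.pdf -> report.md
            if not path.with_suffix(".md").is_file():
                missing.append(path.relative_to(root).as_posix())
    return missing
```

A check like this only confirms that a mirror exists, not that its content matches the formatted original; keeping the two in sync is easier when the markdown is the canonical source and the PDF or deck is generated from it.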


## Related across articles
- [[concept-brand-code]]
- [[concept-llms-txt]]
- [[concept-documented-organization]]
