---
id: "question-measuring-cognitive-friction"
type: "open-question"
source_timestamps: ["§ The Business Consequences of Non-Diversity in Agentic Teams"]
tags: ["metrics", "evaluation"]
related: ["concept-cognitive-friction", "claim-diversity-improves-performance"]
resolutionPath: "Development of new AI evaluation metrics that score multi-agent interactions on creativity, problem-solving speed, and reduction of correlated errors."
sources: ["agentic"]
sourceVaultSlug: "hbr-seg-agentic"
originDay: 6
articleStem: "hbr-new-28-agent-teams-different-models"
sourceUrl: "https://hbr.org/2026/06/the-strongest-teams-of-ai-agents-will-be-built-using-different-models"
sourceTitle: "The Strongest Teams of AI Agents Will Be Built Using Different Models"
---
# How can enterprises quantitatively measure 'cognitive friction' in AI teams?

**Open question.** The article cites studies showing that diverse agent teams perform ~25% better (see [[claim-diversity-improves-performance]]), but it provides **no framework** a typical enterprise could use to measure whether its *specific* mix of models is generating productive [[concept-cognitive-friction]] or merely producing conflicting, unusable outputs.

**Resolution path:** Development of new AI evaluation metrics that score multi-agent interactions on **creativity, problem-solving speed, and reduction of correlated errors**.

**Enrichment note:** This gap maps onto active work in agent evaluation. Galileo, IBM, AWS, and ML Mastery describe metrics frameworks (task completion, tool usage, golden datasets, LLM-as-judge, human review), but none yet offers a standardized 'cognitive-friction' score. Without rigorous evaluation, structural diversity may add complexity with no guaranteed benefit, which is why this metric gap is consequential.
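
**Illustrative sketch (assumption, not from the article):** One component of the resolution path, reduction of correlated errors, can at least be approximated with data an enterprise may already have. The Python sketch below assumes each agent has been run independently on the same golden dataset and scored pass/fail per task. It computes the pairwise correlation of error indicators (the phi coefficient) and the share of tasks on which every agent fails. The function names, the agent labels, and the sample data are hypothetical, and the result is not a standardized 'cognitive-friction' score; it covers neither creativity nor problem-solving speed.

```python
from itertools import combinations
from math import sqrt


def error_correlation(a, b):
    """Phi coefficient between two agents' error indicators (1 = wrong, 0 = right)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("error vectors must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # For 0/1 data the population variance is p * (1 - p).
    var_a = mean_a * (1 - mean_a)
    var_b = mean_b * (1 - mean_b)
    if var_a == 0 or var_b == 0:
        # An agent that is always right or always wrong gives no correlation signal.
        return 0.0
    return cov / sqrt(var_a * var_b)


def mean_pairwise_error_correlation(errors_by_agent):
    """Average phi coefficient over all agent pairs; lower suggests more diverse failures."""
    pairs = list(combinations(errors_by_agent, 2))
    if not pairs:
        raise ValueError("need at least two agents")
    total = sum(
        error_correlation(errors_by_agent[p], errors_by_agent[q]) for p, q in pairs
    )
    return total / len(pairs)


def joint_failure_rate(errors_by_agent):
    """Share of tasks on which every agent is wrong (no teammate can catch the error)."""
    columns = list(zip(*errors_by_agent.values()))
    if not columns:
        raise ValueError("need at least one task")
    return sum(all(col) for col in columns) / len(columns)


# Hypothetical pass/fail results on a six-task golden dataset (1 = agent was wrong).
sample_errors = {
    "agent_a": [1, 1, 0, 0, 1, 0],
    "agent_b": [1, 1, 0, 0, 1, 0],  # fails on exactly the same tasks as agent_a
    "agent_c": [0, 0, 1, 0, 0, 1],  # fails on different tasks
}

print("mean pairwise error correlation:", mean_pairwise_error_correlation(sample_errors))
print("joint failure rate:", joint_failure_rate(sample_errors))
```

Read together, a low mean pairwise correlation and a low joint failure rate would suggest that the agents fail on different tasks, which is a precondition for productive friction rather than evidence of it. Whether the team actually resolves its disagreements into a better final answer would still need a separate end-to-end comparison against a single-model baseline.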