---
id: "concept-adversarial-prompts"
type: "concept"
source_timestamps: ["§ New Risks Executives Must Address"]
tags: ["threat-vectors", "prompt-engineering", "compliance"]
related: ["concept-data-poisoning", "concept-zero-click-ai-exploits"]
definition: "Malicious inputs crafted to trick AI models into bypassing their safety boundaries, resulting in data leaks or harmful outputs."
source_title: "Research: Conventional Cybersecurity Won't Protect Your AI"
source_url: "https://hbr.org/2026/01/ts-research-conventional-cybersecurity-wont-protect-your-ai"
sources: ["tail2"]
sourceVaultSlug: "hbr-seg-tail2"
originDay: 2
articleStem: "hbr-tail-128-cybersecurity-wont-protect-ai"
sourceUrl: "https://hbr.org/2026/01/ts-research-conventional-cybersecurity-wont-protect-your-ai"
sourceTitle: "Research: Conventional Cybersecurity Won’t Protect Your AI"
---
# Adversarial Prompts

Adversarial prompts are inputs designed to trick AI models into violating their own safety boundaries, guardrails, or operational constraints. They can force a model to leak confidential information — Huang's example is a legal AI revealing sensitive case details — or to generate malicious, harmful outputs. Crucially, the impact is not merely technical: a successful adversarial-prompt attack triggers immediate **reputational damage** and severe **compliance crises** for the enterprise running the compromised model. This is a sibling risk to [[concept-data-poisoning]].

**Enrichment grounding.** The definition matches how *prompt injection / adversarial prompting* is described in contemporary LLM-security literature. [[concept-echoleak|EchoLeak]] is itself a canonical real-world case of *indirect* adversarial prompting — hidden instructions in ingested content that cause the LLM to violate intended constraints — which also connects this concept to [[concept-zero-click-ai-exploits]].