---
id: "concept-tool-call-offloading"
type: "concept"
source_timestamps: ["§ Battling Context Rot"]
tags: ["context-management", "tool-use"]
related: ["concept-context-rot", "concept-filesystem-primitive", "action-tool-call-offloading"]
definition: "A context management technique that retains only the head and tail tokens of large tool outputs in context, saving the full output to the filesystem."
---
# Tool Call Offloading

## The Problem

Large tool outputs (long command stdout, full file dumps, paginated API responses) can **clutter the context window** with tokens that carry little useful information for the model. This accelerates [[concept-context-rot|context rot]] and consumes tokens that could otherwise go to reasoning.

## The Technique

**Tool call offloading** is a harness middleware pattern:

- When a tool output exceeds a configured size threshold, only its **head and tail tokens** are retained in the active context window.
- The **full output is written to the filesystem** ([[concept-filesystem-primitive]]).
- The model is given a reference (path / handle) so it can read the complete data on demand if needed.

This ensures the model can still access the complete data without sacrificing immediate reasoning space.
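
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`offload_tool_output`, `read_offloaded`), the threshold and slice sizes, and the file naming scheme are all hypothetical, and the sketch counts characters as a rough stand-in for tokens, which a real harness would measure with its model's tokenizer.

```python
import hashlib
from pathlib import Path

# Hypothetical defaults: the technique only says "a configured size threshold".
MAX_CHARS = 4000   # outputs longer than this are offloaded
KEEP_HEAD = 1000   # characters kept from the start
KEEP_TAIL = 1000   # characters kept from the end


def offload_tool_output(output: str, store_dir: Path,
                        max_chars: int = MAX_CHARS,
                        keep_head: int = KEEP_HEAD,
                        keep_tail: int = KEEP_TAIL) -> str:
    """Return the text to place in context for one tool result."""
    # Small outputs (or ones the head and tail would cover anyway) pass through.
    if len(output) <= max(max_chars, keep_head + keep_tail):
        return output

    # Write the full output to the filesystem under a content-derived name.
    store_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    path = store_dir / f"tool-output-{digest}.txt"
    path.write_text(output, encoding="utf-8")

    # Keep only the head and tail, joined by a marker that names the file.
    head = output[:keep_head]
    tail = output[-keep_tail:] if keep_tail else ""
    omitted = len(output) - keep_head - keep_tail
    marker = f"[... {omitted} characters omitted; full output saved to {path} ...]"
    return f"{head}\n{marker}\n{tail}"


def read_offloaded(path: Path, offset: int = 0, limit: int = 2000) -> str:
    """Read one slice of an offloaded output on demand (hypothetical read tool)."""
    text = path.read_text(encoding="utf-8")
    return text[offset:offset + limit]
```

In this sketch the marker line doubles as the reference handed to the model: it states how much was dropped and where the full output lives, so the model can request a slice through a read tool such as `read_offloaded` instead of pulling the whole file back into context.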

## Operational Recommendation

The action item [[action-tool-call-offloading]] codifies the practice. Pair it with [[concept-compaction]] and [[concept-progressive-disclosure]] for a fuller anti-rot stack.