---
id: "entity-turboquant"
type: "entity"
entityType: "other"
canonicalName: "Turboquant (Google Research paper, ICLR 2026)"
aliases: ["TurboQuant"]
source_timestamps: ["00:00:00"]
tags: ["publication", "algorithm"]
related: ["concept-turboquant", "entity-google"]
canonicalUrl: "https://arxiv.org/ (search 'TurboQuant Google ICLR 2026')"
sources: ["s49-killed-ram-limits"]
sourceVaultSlug: "s49-killed-ram-limits"
originDay: 49
---
# Turboquant (paper)

Turboquant is the **research paper** published by [[entity-google-d49]] (Google Research, ICLR 2026) detailing a novel, lossless compression algorithm for LLM KV caches.

This entity note refers to the **publication itself**. The algorithm described in the paper is documented as the concept [[concept-turboquant]], and the two-step methodology is captured in [[framework-turboquant-process]].

**Key results in the paper**:
- 6x memory reduction and 8x speedup while remaining lossless
- Effective bit precisions as low as 2.5 bits via outlier channel allocation
- Validated on QA, code generation, and 100k-token needle-in-a-haystack retrieval
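A back-of-envelope sketch of how the first two results relate: at an effective 2.5 bits per value versus fp16 (16 bits), the KV cache shrinks by 16 / 2.5 = 6.4x, consistent with the ~6x claim. The model shape below (layers, KV heads, head dimension) is an assumed 7B-class configuration for illustration only, not a figure from the paper.

```python
# Back-of-envelope KV-cache sizing. All model dimensions are
# illustrative assumptions, not values taken from the paper.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: float) -> float:
    """Bytes for K and V tensors across all layers at a given precision."""
    values = 2 * layers * kv_heads * head_dim * seq_len  # 2 = K and V
    return values * bits_per_value / 8

# Assumed 7B-class shape; seq_len matches the 100k-token retrieval setting.
layers, kv_heads, head_dim, seq_len = 32, 32, 128, 100_000

fp16_size = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, 16)
q25_size = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, 2.5)

print(f"fp16 cache:    {fp16_size / 2**30:.1f} GiB")
print(f"2.5-bit cache: {q25_size / 2**30:.1f} GiB")
print(f"reduction:     {fp16_size / q25_size:.1f}x")  # 16 / 2.5 = 6.4x
```

The reduction ratio depends only on the bit widths, so the 6.4x figure holds regardless of the assumed model shape; the absolute GiB numbers are what the shape assumptions affect.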

**Search reference**: 'TurboQuant Google ICLR 2026' on arXiv.

**Pending**: real-world integration via open-source toolchains (vLLM, TensorRT-LLM, etc.).
