---
id: "concept-turboquant"
type: "concept"
source_timestamps: ["00:00:00", "01:20:00", "04:05:00", "06:10:00"]
tags: ["algorithm", "compression", "google"]
related: ["concept-kv-cache", "concept-polar-quantization", "concept-qjl", "claim-turboquant-performance", "framework-memory-optimization-landscape", "framework-turboquant-process", "entity-google", "entity-turboquant"]
definition: "A near-lossless compression algorithm by Google that reduces LLM KV cache memory usage by 6x and increases speed by 8x using polar coordinates and QJL-based unbiased quantization."
sources: ["s49-killed-ram-limits"]
sourceVaultSlug: "s49-killed-ram-limits"
originDay: 49
---
# Turboquant

Turboquant is a breakthrough compression algorithm published by [[entity-google-d49]] designed to drastically reduce the memory footprint of Large Language Models during inference. It targets the [[concept-kv-cache]], the working memory mechanism that dominates inference cost as context windows grow.

Unlike traditional compression methods that add retrieval overhead, Turboquant achieves up to a **6x reduction in memory usage** and an **8x on-chip speedup** with effectively no loss in model quality — see [[claim-turboquant-performance]]. It compresses the KV cache representation from 32 bits down to as few as 3 bits per stored value (or even 2.5 bits with outlier channel allocation, per the paper).
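The bit-width figures above translate into a simple back-of-the-envelope memory calculation. The sketch below uses an illustrative 7B-class model shape (not from the source); note that the raw 32-to-3-bit ratio is the cache-level ceiling of roughly 10.7x, while the quoted end-to-end 6x figure presumably also accounts for quantization scales and other metadata.

```python
def kv_cache_bytes(n_layers: int, n_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: float) -> float:
    """Bytes needed to cache keys and values for one sequence."""
    # One key vector and one value vector per layer, per head, per token.
    n_values = 2 * n_layers * n_heads * head_dim * seq_len
    return n_values * bits_per_value / 8

# Illustrative shape only (assumption, not from the source):
# 32 layers, 32 heads, head_dim 128, 32k-token context.
fp32_cache = kv_cache_bytes(32, 32, 128, 32_768, 32)
q3_cache   = kv_cache_bytes(32, 32, 128, 32_768, 3)

print(f"fp32 cache: {fp32_cache / 2**30:.1f} GiB")
print(f"3-bit cache: {q3_cache / 2**30:.1f} GiB")
print(f"raw ratio: {fp32_cache / q3_cache:.1f}x")
```

The raw ratio is exactly 32/3; real savings land below that once per-block scales and any outlier channels are stored alongside the quantized values.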

It accomplishes this by abandoning traditional [[concept-vector-quantization]] in favor of a two-step mathematical process — see [[framework-turboquant-process]]:

1. **[[concept-polar-quantization]]** — rotate data into a polar coordinate system so the structure becomes highly predictable.
2. **[[concept-qjl]]** — apply a Quantized Johnson-Lindenstrauss step that keeps a single sign bit per random projection, yielding an unbiased estimate that cancels residual rounding bias.
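The single-bit idea in step 2 can be sketched numerically. This is a toy illustration of the QJL estimator from the Johnson-Lindenstrauss literature, not Turboquant's actual pipeline (which adds the polar step, per-channel scales, and outlier handling): each key is stored as its norm plus one sign bit per Gaussian projection, and scaling by sqrt(pi/2) makes the recovered query-key inner product unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

def qjl_encode(k: np.ndarray, S: np.ndarray):
    """Compress a key to its norm plus one sign bit per projection row of S."""
    return np.linalg.norm(k), np.sign(S @ k)

def qjl_inner_product(q: np.ndarray, code, S: np.ndarray) -> float:
    """Unbiased estimate of <q, k> from the sign bits.

    For Gaussian rows s_i, E[sign(s_i.k) * (s_i.q)] = sqrt(2/pi) * <q, k/||k||>,
    so multiplying by sqrt(pi/2) removes the bias from the 1-bit rounding.
    """
    norm_k, signs = code
    return norm_k * np.sqrt(np.pi / 2) * float(np.mean(signs * (S @ q)))

d, m = 64, 4096                     # head dim and projection count (illustrative)
S = rng.standard_normal((m, d))     # shared, data-independent projection matrix
k, q = rng.standard_normal(d), rng.standard_normal(d)

est = qjl_inner_product(q, qjl_encode(k, S), S)
print(est, q @ k)                   # estimate tracks the true inner product
```

Because `S` is drawn independently of the data, the same matrix works for any key distribution — a small concrete instance of the data-oblivious property discussed below.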

Crucially, Turboquant is a **[[concept-data-oblivious-algorithm]]**, meaning it works universally across different datasets and model architectures without bespoke tuning. It is published as the [[entity-turboquant]] paper from Google Research (ICLR 2026).

Turboquant is positioned within a broader landscape of memory-optimization vectors documented in [[framework-memory-optimization-landscape]], and is the most aggressive published example of pure quantization-based compression.

See also: [[quote-turboquant-lossless]] and the strategic implication captured in [[claim-google-compounding-advantage]].


## Related across days
- [[concept-kv-cache]]
- [[concept-sovereign-memory]]
- [[concept-multi-head-latent-attention]]
- [[concept-ai-memory-crisis]]
