---
id: "entity-h2o"
type: "entity"
entityType: "organization"
canonicalName: "H2O"
aliases: ["H2O.ai"]
source_timestamps: ["16:49:00"]
tags: ["organization"]
related: ["framework-memory-optimization-landscape"]
canonicalUrl: "https://h2o.ai/"
sources: ["s49-killed-ram-limits"]
sourceVaultSlug: "s49-killed-ram-limits"
originDay: 49
---
# H2O

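H2O (Heavy-Hitter Oracle) is cited as an example of the **eviction and sparsity** approach to memory optimization, bucket #2 of [[framework-memory-optimization-landscape]]. The "heavy hitters" are the small set of tokens that receive most of the attention mass.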

**Approach**: Keep only the tokens with the highest accumulated attention scores in the [[concept-kv-cache]], typically alongside a short window of the most recent tokens, and evict the rest. The premise: most context tokens contribute negligibly to any given output, so they can be dropped without meaningfully degrading generation quality.
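The eviction rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: the function names, the `budget` and `recent_window` parameters, and the toy attention rows are all hypothetical, and it assumes scores are summed across decoding steps and that the newest positions are always retained.

```python
def accumulate_scores(attn_rows, num_positions):
    """Sum the attention weight each cache position received across steps."""
    scores = [0.0] * num_positions
    for row in attn_rows:
        for pos, weight in enumerate(row):
            scores[pos] += weight
    return scores


def select_kept_positions(scores, budget, recent_window):
    """Return the sorted cache positions to keep under a fixed budget.

    Assumption of this sketch: the last `recent_window` positions are
    always kept, and the remaining budget goes to the older positions
    with the highest accumulated attention (the "heavy hitters").
    """
    n = len(scores)
    if budget >= n:
        return list(range(n))  # nothing needs to be evicted
    recent_window = min(recent_window, budget)
    recent = list(range(n - recent_window, n))
    heavy_budget = budget - recent_window
    older = range(n - recent_window)
    heavy = sorted(older, key=lambda p: scores[p], reverse=True)[:heavy_budget]
    return sorted(heavy + recent)


# Toy example: two decoding steps attending over six cached positions.
attn_rows = [
    [0.5, 0.1, 0.1, 0.2, 0.1, 0.0],
    [0.4, 0.05, 0.05, 0.3, 0.1, 0.1],
]
scores = accumulate_scores(attn_rows, num_positions=6)
kept = select_kept_positions(scores, budget=4, recent_window=2)
print(kept)  # [0, 3, 4, 5]
```

Positions 1 and 2 are evicted because they received little attention; positions 4 and 5 survive only because they are recent. In a real model this selection would run per layer and per head, which the sketch omits.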

H2O's approach is complementary to:
- [[concept-turboquant]] (quantization)
- [[concept-multi-head-latent-attention]] (architectural)
- ShadowKV/FlexGen (tiering)
- Flash Attention (memory access optimization)

A production stack can use eviction alongside any of these.
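To see why the techniques compose, a back-of-the-envelope calculation helps. The sketch below estimates KV cache size when eviction (fewer tokens kept) is combined with quantization (fewer bits per value). The model dimensions, the 25% keep fraction, and the 4-bit width are hypothetical values chosen for illustration, and the estimate ignores quantization metadata and eviction bookkeeping.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, keep_fraction=1.0):
    """Rough KV cache size in bytes for one sequence.

    keep_fraction models eviction (share of tokens retained);
    bits_per_value models quantization. Overheads such as scales,
    zero points, and score tracking are ignored in this sketch.
    """
    kept_tokens = int(seq_len * keep_fraction)
    # Factor of 2: one tensor for keys, one for values.
    num_values = 2 * num_layers * num_kv_heads * head_dim * kept_tokens
    return num_values * bits_per_value // 8


# Hypothetical configuration: 32 layers, 8 KV heads, head_dim 128, 32K context.
baseline = kv_cache_bytes(32, 8, 128, 32768, bits_per_value=16)
combined = kv_cache_bytes(32, 8, 128, 32768, bits_per_value=4,
                          keep_fraction=0.25)

print(baseline // 2**20, "MiB")  # 4096 MiB
print(combined // 2**20, "MiB")  # 256 MiB
print(baseline // combined)      # 16
```

Under these assumptions the savings multiply: keeping a quarter of the tokens gives 4x, storing them at a quarter of the bit width gives another 4x, for 16x overall.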

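**Canonical URL**: https://h2o.ai/ (unverified; this domain belongs to H2O.ai, the machine-learning platform company, which may be unrelated to the eviction method described here)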
