K2V2: Optimizing KV Cache Memory Management via Channel-Specific Mixed-Precision Quantization

Published in MLSys 2026, 2025

K2V2 is a KV cache quantization strategy that reduces memory use through channel-specific mixed-precision quantization. It combines sink32 with a 25% K-channel precision boost to recover accuracy close to FP16 while significantly reducing the KV cache memory footprint. Evaluation pipelines implemented with vLLM and SGLang demonstrate its effectiveness for efficient inference.
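To make the idea of channel-specific mixed precision concrete, here is a minimal pure-Python sketch, not the paper's implementation. Several things in it are assumptions that this page does not confirm: that "sink32" means the first 32 tokens are left at full precision, that the boosted 25% of K channels are the ones with the largest value range, and that the low and high bit widths are 2 and 4. All function and parameter names are hypothetical.

```python
import random


def quantize_channel(values, bits):
    """Uniform asymmetric quantize-then-dequantize of one channel to `bits` bits."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return list(values)  # constant channel: nothing to quantize
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def mixed_precision_quantize_keys(keys, low_bits=2, high_bits=4,
                                  boost_fraction=0.25, num_sink_tokens=32):
    """Simulate mixed-precision quantization of a key cache.

    keys: list of tokens, each a list of per-channel floats.
    Returns (dequantized_keys, boosted_channel_indices).
    """
    num_tokens, num_channels = len(keys), len(keys[0])
    sink = min(num_sink_tokens, num_tokens)
    out = [list(row) for row in keys]  # ASSUMPTION: sink tokens stay unquantized
    if sink == num_tokens:
        return out, set()

    # Gather each channel's values over the non-sink tokens.
    columns = [[keys[t][c] for t in range(sink, num_tokens)]
               for c in range(num_channels)]

    # ASSUMPTION: boost the channels with the widest value range.
    ranges = [max(col) - min(col) for col in columns]
    num_boosted = int(num_channels * boost_fraction)
    by_range = sorted(range(num_channels), key=lambda c: ranges[c], reverse=True)
    boosted = set(by_range[:num_boosted])

    for c, col in enumerate(columns):
        bits = high_bits if c in boosted else low_bits
        for i, v in enumerate(quantize_channel(col, bits)):
            out[sink + i][c] = v
    return out, boosted


# Toy key cache: 64 tokens, 8 channels, with channels 0 and 4 given a wider spread.
random.seed(0)
keys = [[random.gauss(0, 5.0 if c % 4 == 0 else 1.0) for c in range(8)]
        for _ in range(64)]
deq, boosted = mixed_precision_quantize_keys(keys)
```

The sketch only simulates the accuracy effect (quantize, then dequantize); a real KV cache would store packed integer codes plus per-channel scales, which is where the memory saving comes from.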

Download paper here

Recommended citation: Li, J., et al. (2025). "K2V2: Optimizing KV Cache Memory Management via Channel-Specific Mixed-Precision Quantization." MLSys 2026.


Jisen Li