K2V2: Optimizing KV Cache Memory Management via Channel-Specific Mixed-Precision Quantization
Published in MLSys 2026, 2025
K2V2 is a KV cache quantization strategy that reduces memory use through channel-specific mixed-precision quantization. It combines sink32 with a 25% K-channel precision boost to recover accuracy close to FP16 while significantly reducing the KV cache memory footprint. Evaluation pipelines implemented with vLLM and SGLang demonstrate its effectiveness for efficient inference.
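The abstract does not spell out the mechanism, so the following is only an illustrative sketch of how channel-specific mixed-precision quantization of a K cache might look. It rests on assumptions the source does not confirm: that "sink32" means the first 32 tokens are left unquantized, that the boosted 25% of K channels are those with the largest dynamic range, and that the base and boosted bit widths are 2 and 4. The function names `quantize_channel` and `mixed_precision_k` are hypothetical and are not taken from the paper, vLLM, or SGLang.

```python
from typing import List


def quantize_channel(values: List[float], bits: int) -> List[float]:
    """Uniform asymmetric quantize-then-dequantize of one channel."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant channel: nothing to quantize
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def mixed_precision_k(
    keys: List[List[float]],
    sink_tokens: int = 32,        # assumption: "sink32" = 32 unquantized leading tokens
    boost_fraction: float = 0.25,  # 25% of K channels get more bits
    low_bits: int = 2,             # assumption: base precision
    high_bits: int = 4,            # assumption: boosted precision
) -> List[List[float]]:
    """Return a dequantized copy of `keys` (rows = tokens, columns = channels)."""
    num_tokens, num_channels = len(keys), len(keys[0])
    sink = min(sink_tokens, num_tokens)
    out = [list(row) for row in keys]  # sink rows are copied unchanged
    if sink == num_tokens:
        return out

    # Gather each channel's values over the non-sink tokens.
    columns = [[keys[t][c] for t in range(sink, num_tokens)]
               for c in range(num_channels)]

    # Assumption: boost the channels with the largest dynamic range.
    ranges = [max(col) - min(col) for col in columns]
    num_boost = int(num_channels * boost_fraction)
    ranked = sorted(range(num_channels), key=lambda c: ranges[c], reverse=True)
    boosted = set(ranked[:num_boost])

    for c, col in enumerate(columns):
        bits = high_bits if c in boosted else low_bits
        for i, v in enumerate(quantize_channel(col, bits)):
            out[sink + i][c] = v
    return out
```

Under these assumptions, a call such as `mixed_precision_k(keys)` leaves the first 32 token rows untouched and quantizes the rest per channel, so the rounding error in every channel is bounded by half of that channel's quantization step.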
Recommended citation: Li, J., et al. (2025). "K2V2: Optimizing KV Cache Memory Management via Channel-Specific Mixed-Precision Quantization." MLSys 2026.
