OpenCatalog, curated by FLOSSK
AI & Machine Learning

KVPress

A research-oriented toolkit from NVIDIA for compressing the LLM KV cache, stretching usable context length within a fixed VRAM budget.
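To make the one-line description concrete: KV-cache compression evicts less-important key/value entries so a fixed memory budget covers a longer context. The sketch below is a toy illustration of one published heuristic (keeping keys with the smallest L2 norm, which have been observed to attract higher attention); the function name `compress_kv_cache` and the specific scoring rule are illustrative assumptions, not KVPress's actual API.

```python
import numpy as np

def compress_kv_cache(keys, values, compression_ratio):
    """Illustrative KV-cache eviction (not KVPress's API).

    Drops the fraction `compression_ratio` of cached positions,
    keeping those whose keys have the smallest L2 norm.
    keys, values: arrays of shape (seq_len, head_dim).
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(round(seq_len * (1 - compression_ratio))))
    norms = np.linalg.norm(keys, axis=-1)
    # Indices of the n_keep lowest-norm keys, re-sorted to preserve
    # the original token order in the cache.
    keep = np.sort(np.argsort(norms)[:n_keep])
    return keys[keep], values[keep]

# Halve a 128-token cache while keeping per-head dimensions intact.
keys = np.random.randn(128, 64)
values = np.random.randn(128, 64)
k2, v2 = compress_kv_cache(keys, values, compression_ratio=0.5)
print(k2.shape, v2.shape)  # (64, 64) (64, 64)
```

Real methods in this space (attention-score-based, windowed, or quantized eviction) differ in the scoring rule, which is why the card's limitation about validating quality loss per method and model matters.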

Why it is included

Surfaced via TAAFT’s #llm repository tag as an Apache-2.0-licensed KV-cache compression project.

Best for

Experimenters reducing the memory footprint of long-context Transformer inference.

Strengths

  • Focused problem
  • Composable with Hugging Face-style stacks

Limitations

  • Research-grade; validate quality loss per method and model

Good alternatives

PagedAttention tuning · Quantized KV · Sliding-window models