Skip to content
OpenCatalogcurated by FLOSSK
AI & Machine Learning

vLLM

High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.

Why it is included

Production-grade open stack for serving Hugging Face–style models with strong throughput defaults.

Best for

Teams self-hosting chat/completions APIs on NVIDIA (and growing) accelerators.

Strengths

  • PagedAttention
  • OpenAI API shape
  • Broad model support

Limitations

  • GPU-centric; ops complexity at scale

Good alternatives

SGLang · TensorRT-LLM · llama.cpp

Related tools