vLLM
High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.
Why it is included
Production-grade open stack for serving Hugging Face–style models with strong throughput defaults.
Best for
Teams self-hosting chat/completions APIs on NVIDIA GPUs, with growing support for other accelerators.
Strengths
- PagedAttention for efficient KV-cache memory use
- OpenAI-compatible API surface
- Broad model support
Limitations
- GPU-centric; operational complexity grows at multi-node scale
Good alternatives
SGLang · TensorRT-LLM · llama.cpp
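Because vLLM speaks the OpenAI API shape, existing clients can target it by pointing at the server's base URL (port 8000 by default when launched with `vllm serve`). A minimal sketch of building a `/v1/chat/completions` request; the model id here is illustrative, swap in whatever model your server is running:

```python
import json

# Assumed local vLLM endpoint; the port is configurable at launch.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model, messages, max_tokens=128, temperature=0.7):
    """Build a request body in the OpenAI chat-completions shape."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct",  # example model id
    [{"role": "user", "content": "Summarize PagedAttention in one sentence."}],
)
# POST this JSON to f"{BASE_URL}/chat/completions" with any HTTP client.
print(json.dumps(payload, indent=2))
```

The official `openai` Python client also works against such a server: set `base_url` to the server address and pass a placeholder `api_key` if no auth is configured.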
Related tools
AI & Machine Learning
llama.cpp
Plain C/C++ inference for LLaMA-class models with broad community backends.
Ollama
Local LLM runner and model library with simple CLI and API for workstation inference.
Hugging Face Transformers
State-of-the-art pretrained models for PyTorch, TensorFlow, and JAX.
SGLang
Structured generation language for fast serving: RadixAttention, constrained decoding, and multi-turn batching for frontier-class workloads.
rtp-llm
Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.
TensorRT-LLM
NVIDIA TensorRT–based library for optimized LLM inference on GPUs with multi-GPU and speculative decoding features.
