OpenCatalog (curated by FLOSSK)

Browse & filter

Filter by platform, license, maturity, maintenance cadence, and editorial tags such as privacy-focused or self-hosted. Search matches names, summaries, tags, and use cases.


vLLM: high-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.

llm, inference, serving, gpu, api
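Because the server speaks the OpenAI wire format, any OpenAI-style client can call it. A minimal sketch, assuming a vLLM server on localhost:8000 and a hypothetical model name (adjust both to whatever you actually launched):

```python
import json
import urllib.request

# Minimal sketch: vLLM exposes an OpenAI-compatible /v1/chat/completions
# endpoint. The host, port, and model name below are assumptions; adjust
# them to match the server you started.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model name
    "messages": [
        {"role": "user", "content": "Summarize PagedAttention in one line."}
    ],
    "max_tokens": 64,
    "temperature": 0.2,
}

def post_chat(url: str = BASE_URL) -> dict:
    """POST the payload to a running vLLM server and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Calling post_chat() requires a running server; building the payload does not.
print(sorted(payload.keys()))  # → ['max_tokens', 'messages', 'model', 'temperature']
```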

SGLang: structured generation language for fast serving, with RadixAttention, constrained decoding, and multi-turn batching for frontier-class workloads.

llm, inference, serving, gpu, structured-output
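The core idea behind constrained decoding can be shown without the library: at every step, mask the model's choices so the partial output stays a prefix of some allowed completion. A toy sketch (not SGLang's API; the scorer stands in for real logits):

```python
# Toy sketch of constrained decoding, the technique behind structured
# output: the decoder may only emit a character that keeps the partial
# output a prefix of at least one allowed string.

ALLOWED = ["yes", "no", "maybe"]  # assumed target set, e.g. from a schema

def allowed_next_chars(prefix: str) -> set:
    """Characters that extend `prefix` toward at least one allowed string."""
    return {s[len(prefix)] for s in ALLOWED
            if s.startswith(prefix) and len(s) > len(prefix)}

def constrained_greedy(score) -> str:
    """Greedy decode, masking out characters the constraint forbids."""
    out = ""
    while out not in ALLOWED:
        valid = allowed_next_chars(out)
        # Pick the highest-scoring character among the constraint-valid ones.
        out += max(valid, key=lambda ch: score(out, ch))
    return out

# A dummy scorer that prefers 'm' at the first step, so decoding is forced
# down the only branch starting with 'm' and yields "maybe".
result = constrained_greedy(lambda prefix, ch: 1.0 if ch == "m" else 0.5)
print(result)  # → maybe
```

Real engines apply the same masking to token logits, compiled from a regex or JSON schema, so outputs are valid by construction rather than by retry.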

Ray: distributed compute framework for Python that scales data loading, training, hyperparameter search, and online serving (Ray Serve).

distributed, python, serving, training

BentoML: unified model serving and deployment toolkit that packages models as APIs, ships them to Kubernetes, and manages runtimes.

serving, deployment, mlops, api

Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.

llm, inference, serving, gpu, taaft-repositories

TensorFlow Serving: flexible, high-performance serving system for TensorFlow (and related) models, with model versioning, request batching, and gRPC/REST endpoints.

serving, tensorflow, inference, grpc, taaft-repositories
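The REST side of TensorFlow Serving accepts a POST to `/v1/models/<name>:predict` with a JSON body listing instances. A minimal sketch of building such a request, assuming the server's default REST port (8501) and a hypothetical model name:

```python
import json

# TensorFlow Serving's REST predict API: POST to
# /v1/models/<name>:predict with a JSON body in "row" format, i.e. a list
# of input instances. Model name and inputs below are assumptions for
# illustration; match them to the model you actually serve.
MODEL_NAME = "half_plus_two"  # hypothetical served model
url = f"http://localhost:8501/v1/models/{MODEL_NAME}:predict"

body = {"instances": [1.0, 2.0, 5.0]}  # row format: one scalar per instance
request_json = json.dumps(body)

# A specific model version can be pinned in the URL instead, e.g.:
versioned_url = f"http://localhost:8501/v1/models/{MODEL_NAME}/versions/2:predict"

print(request_json)  # → {"instances": [1.0, 2.0, 5.0]}
```

The response mirrors the request shape, returning a `"predictions"` list with one entry per instance.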