Browse & filter

Filter by platform, license text, maturity, maintenance cadence, and editorial tags like privacy-focused or self-hosted. Search matches names, summaries, tags, and use cases.

Ollama

Top pick

Local LLM runner and model library with simple CLI and API for workstation inference.

llmlocalinference

AI & Machine Learning

llama.cpp

Top pick

Plain C/C++ inference for LLaMA-class models with broad community backends.

llminferencec++local

AI & Machine Learning

MLX LM

Also strong

Apple MLX-based LLM inference and training on Apple silicon: efficient Metal-backed transformers and examples for local chat models.

llmapple-siliconinferencemetallocal

AI & Machine Learning

llamafile

Honorable mention

Single-file distributable LLM weights + llama.cpp runtime: run large models from one executable with broad OS CPU/GPU support.

llmlocalinferenceportable

AI & Machine Learning

ExLlamaV2

Honorable mention

Memory-efficient CUDA inference kernels for quantized Llama-class models—popular in consumer GPU chat UIs.

llminferencecudaquantizationlocal