llama.cpp
Plain C/C++ inference for LLaMA-class models with broad community backends.
Why it is included
Reference-quality local inference stack powering countless GUIs and servers.
Best for
Embedding LLMs into apps, edge devices, and research sandboxes.
If you use Windows, Mac, or paid tools
A local alternative to the OpenAI, Anthropic, and Google cloud APIs: run models on your own hardware instead of calling hosted endpoints.
Strengths
- Performance-focused C/C++ core with SIMD and GPU offload
- Rich GGUF quantization ecosystem for fitting models into limited memory
- Broad hardware support: CPUs plus CUDA, Metal, Vulkan, and other backends
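A typical build-quantize-run workflow looks roughly like the sketch below. This is a hedged example: the model paths and filenames are placeholders, and binary names or flags may differ across llama.cpp releases.

```shell
# Build from source (CMake is the project's build system)
cmake -B build
cmake --build build --config Release

# Quantize an f16 GGUF to 4-bit to cut memory use (Q4_K_M is a common choice;
# ./models/model-f16.gguf is a placeholder path)
./build/bin/llama-quantize ./models/model-f16.gguf ./models/model-q4_k_m.gguf Q4_K_M

# Run a one-shot prompt locally with the quantized model
./build/bin/llama-cli -m ./models/model-q4_k_m.gguf -p "Explain quantization briefly." -n 128
```

Quantization is what makes the "edge devices" use case practical: a 4-bit quantized model needs roughly a quarter of the memory of its f16 original, at a modest quality cost.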
Limitations
- Ships no model weights; you must comply with each model's license
Good alternatives
Ollama · MLC
Related tools
AI & Machine Learning
PyTorch
Deep learning framework with strong research-to-production paths.
Ollama
Local LLM runner and model library with simple CLI and API for workstation inference.
MLX LM
Apple MLX-based LLM inference and training on Apple silicon: efficient Metal-backed transformers and examples for local chat models.
llamafile
Single-file distributable LLM weights + llama.cpp runtime: run large models from one executable with broad OS CPU/GPU support.
ExLlamaV2
Memory-efficient CUDA inference kernels for quantized Llama-class models—popular in consumer GPU chat UIs.
vLLM
High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.
