Universal deployment stack compiling models to Vulkan, Metal, CUDA, and WebGPU via TVM/Unity for phones, browsers, and servers.
Browse & filter
Filter by platform, license text, maturity, maintenance cadence, and editorial tags like privacy-focused or self-hosted. Search matches names, summaries, tags, and use cases.
8 tools match your filters
Google’s smaller open **weights** Gemma line (Gemma 2/3, etc.) with Gemma license terms, plus `gemma.cpp` for lightweight CPU inference.
Small language model family (Phi-3/4 lineage) emphasizing strong quality per parameter; weights on Hugging Face under Microsoft licenses per release.
1.1B-parameter Llama-architecture model trained on ~3T tokens—Apache-2.0 weights for fast experiments and teaching.
Hugging Face TB small LM family (135M–1.7B) with Apache-2.0 weights aimed at on-device and edge quality per size.
Intel toolkit to optimize and deploy deep learning on Intel CPUs, GPUs, and NPUs with model conversion and runtime APIs.
Google’s cross-platform pipelines for perception: face/hand pose, segmentation, and on-device ML graphs for mobile and desktop.
Alibaba’s lightweight inference engine for mobile and edge—used for on-device LLMs and classic CV models with aggressive optimization.
