MNN
Alibaba’s lightweight inference engine for mobile and edge devices, used to run on-device LLMs and classic CV models with aggressive optimization.
Why it is included
Listed on TAAFT’s machine-learning / LLM repository feeds as a major Apache-2.0 on-device stack.
Best for
Teams shipping neural networks on phones, embedded Linux, and constrained ARM SoCs.
Strengths
- Edge performance
- Small footprint
- LLM paths on device
Limitations
- Smaller ecosystem than ONNX Runtime for generic server-side inference
Good alternatives
ONNX Runtime · TensorFlow Lite · ExecuTorch
Related tools
AI & Machine Learning
ONNX Runtime
Cross-platform inference accelerator for ONNX models: CPU, GPU, and mobile execution providers with graph optimizations.
MediaPipe
Google’s cross-platform pipelines for perception: face, hand, and body-pose tracking, segmentation, and on-device ML graphs for mobile and desktop.
MLC LLM
Universal deployment stack compiling models to Vulkan, Metal, CUDA, and WebGPU via TVM/Unity for phones, browsers, and servers.
rtp-llm
Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.
KVPress
NVIDIA’s research-oriented toolkit for LLM KV-cache compression, stretching context windows within fixed VRAM budgets.
Ollama
Local LLM runner and model library with a simple CLI and REST API for workstation inference.
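As a concrete taste of that API, here is a minimal sketch of a request to Ollama’s documented local `/api/generate` endpoint, using only the Python standard library. The model name `llama3` and the default port 11434 are assumptions for illustration; the network call is left commented out so the snippet stands alone without a running server.

```python
import json
import urllib.request

# Build a request for Ollama's local REST API (default port 11434).
# Field names follow the documented /api/generate route; the model
# name "llama3" is an illustrative assumption.
payload = {
    "model": "llama3",
    "prompt": "Why is MNN well suited to mobile inference?",
    "stream": False,  # ask for one JSON object instead of a token stream
}
request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment when an Ollama server is running locally:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["response"])
```

With `"stream": True` (the default), the same endpoint instead returns one JSON object per generated token, which suits interactive CLIs.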
