Alibaba’s lightweight inference engine for mobile and edge devices, used for on-device LLMs and classic CV models with aggressive optimization.
Alibaba’s high-performance LLM inference engine (CUDA-focused) for production serving of diverse decoder architectures.
Physics-ML / scientific deep learning framework: neural operators, PINNs, and domain-parallel training on GPUs.
NVIDIA library for FP8/FP4 and fused kernels on Hopper/Ada-class GPUs to accelerate Transformer training and inference.
NVIDIA research-oriented toolkit for LLM KV-cache compression to stretch context within fixed VRAM budgets.
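The core idea behind KV-cache compression can be sketched in a few lines: score each cached key/value entry (here by an accumulated attention weight, a common heuristic) and evict the lowest-scoring entries until the cache fits a fixed budget. This is an illustrative sketch under those assumptions, not the toolkit's actual API.

```python
# Illustrative sketch of score-based KV-cache eviction. The scores stand
# in for accumulated attention weights a serving engine would track per
# cached token; keys/values are placeholders, not real tensors.

def compress_kv_cache(cache, budget):
    """Keep the `budget` highest-scoring entries, preserving token order.

    cache: list of (token_index, score, key, value) tuples
    budget: maximum number of entries to retain
    """
    if len(cache) <= budget:
        return cache
    # Pick the entries with the largest accumulated attention scores...
    keep = sorted(cache, key=lambda e: e[1], reverse=True)[:budget]
    # ...then restore original token order so positions stay consistent.
    return sorted(keep, key=lambda e: e[0])

# Toy example: 5 cached tokens, budget of 3.
cache = [(0, 0.90, "k0", "v0"), (1, 0.10, "k1", "v1"),
         (2, 0.50, "k2", "v2"), (3, 0.05, "k3", "v3"),
         (4, 0.70, "k4", "v4")]
compressed = compress_kv_cache(cache, budget=3)
# Tokens 0, 2, 4 survive (highest scores), in original order.
```

Real implementations apply this per attention head and layer and use richer scoring policies, but the budget-then-evict loop is the shared skeleton.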
Flexible, high-performance serving system for TensorFlow (and related) models with versioning, batching, and gRPC/REST.
Retargetable MLIR-based compiler and runtime to lower ML graphs to CPUs, GPUs, and accelerators from multiple frontends.
AutoTrain Advanced: low-code training flows for classification, LLM fine-tunes, and diffusion tasks tied to the Hub.
Official Python client for the Hugging Face Hub: upload and download models and datasets, and manage tokens and repos.
TypeScript/JavaScript libraries to call the Inference API, manage Hub assets, and build AI features in the browser or Node.
Rust-based high-throughput server for sentence-transformers–class embedding models with GPU/CPU backends.
Open-source Svelte/TypeScript app that powers HuggingChat—multi-model chat, tools, and self-hostable UI patterns.
Curated recipes and code for aligning language models (preference optimization, DPO-style flows) on open stacks.
Rust LSP server that plugs LLM-backed completions into editors—designed to pair with local or API models.
Contrastive vision–language pretraining reference implementation: map images and text to a shared embedding space.
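The shared-embedding-space objective can be sketched with toy numbers: given L2-normalized image and text embeddings for N matched pairs, a symmetric cross-entropy over the pairwise similarity matrix pulls matched pairs together and pushes mismatched ones apart. The dimensions and temperature below are illustrative, not the reference implementation's values.

```python
import math

# Minimal sketch of a CLIP-style contrastive loss: the correct "label"
# for image i is text i, scored in both directions and averaged.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def clip_loss(image_embs, text_embs, temperature=0.07):
    # Cosine-similarity logits (embeddings assumed L2-normalized).
    logits = [[dot(i, t) / temperature for t in text_embs]
              for i in image_embs]

    def cross_entropy(rows):
        # Correct label for row i is column i (the matched pair).
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_z - row[i]
        return loss / len(rows)

    # Average the image->text and text->image directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))

# Toy batch of two matched pairs in a 2-D embedding space.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_loss(images, texts)         # near zero: pairs match
shuffled = clip_loss(images, texts[::-1])  # large: pairs mismatched
```

The low temperature sharpens the softmax, so even small similarity gaps between the matched and mismatched pairs dominate the loss.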
Google Research pretrained time-series foundation model for forecasting with open Apache-2.0 code and checkpoints.
Google library to extract structured fields from unstructured text with LLMs, source grounding, and visualization helpers.
ByteDance open agent harness for long-horizon research, coding, and creation with tools, memory, and subagents.
OpenAI’s MIT-licensed Python kit for multi-agent workflows, handoffs, guardrails, and tracing with the Responses API.
DeepSeek Janus series: unified multimodal understanding and generation models with MIT-licensed research code.
Open-source TypeScript ‘AI coworker’ framework with memory, tool use, and agent workflows for product integration.
Apple’s Python utilities to convert, compress, and validate models for Core ML deployment on Apple devices.
