faster-whisper
A reimplementation of OpenAI's Whisper on the CTranslate2 inference engine, delivering faster CPU/GPU transcription and lower memory use than the reference PyTorch implementation.
Why it is included
Production default for many self-hosted transcription pipelines using Whisper weights.
Best for
Serving or batch ASR where throughput and RAM matter more than research flexibility.
Strengths
- Speed: substantially faster inference than the reference implementation on the same hardware
- Quantization: int8 and float16 model weights cut memory use and speed up CPU/GPU inference
- Drop-in style API: closely mirrors the original Whisper transcription interface
Limitations
- Tracks upstream Whisper releases, so feature parity can lag or vary between versions
Good alternatives
Whisper · Whisper.cpp · mlx-whisper
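The drop-in style API noted above can be sketched as follows; the model size ("small"), device, compute type, and the "audio.mp3" file are illustrative assumptions, and the snippet guards against the library or audio being absent:

```python
# Minimal sketch of faster-whisper's drop-in style API. Assumes
# `pip install faster-whisper` and a local "audio.mp3"; the model size,
# device, and compute_type shown here are illustrative choices.
try:
    from faster_whisper import WhisperModel

    # int8 quantization keeps memory use low on CPU-only hosts.
    model = WhisperModel("small", device="cpu", compute_type="int8")

    # transcribe() returns a lazy segment generator plus detected-language info.
    segments, info = model.transcribe("audio.mp3", beam_size=5)
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
    ran = True
except Exception:  # the library or the audio file may be missing in this sketch
    ran = False
```

Because segments are generated lazily, transcription only proceeds as the loop consumes them, which keeps memory bounded on long recordings.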
Related tools
AI & Machine Learning
Whisper
OpenAI’s open-source speech recognition model family with multilingual transcription and translation checkpoints.
ONNX Runtime
Cross-platform inference accelerator for ONNX models: CPU, GPU, and mobile execution providers with graph optimizations.
OpenVINO
Intel toolkit to optimize and deploy deep learning on Intel CPUs, GPUs, and NPUs with model conversion and runtime APIs.
Ollama
Local LLM runner and model library with simple CLI and API for workstation inference.
llama.cpp
Plain C/C++ inference engine for LLaMA-class models with broad community hardware backends.
vLLM
High-throughput LLM serving with PagedAttention, continuous batching, and OpenAI-compatible APIs for GPU clusters.
