TensorRT-LLM: NVIDIA TensorRT-based library for optimized LLM inference on GPUs, with multi-GPU and speculative decoding features.
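A minimal sketch of the high-level Python API, assuming a recent tensorrt_llm release; the model id and prompt are illustrative, not prescribed by the library:

```python
# Minimal TensorRT-LLM sketch (assumes a recent tensorrt_llm release).
# The model id and prompt are illustrative placeholders.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds an optimized engine for the checkpoint on first use;
    # passing tensor_parallel_size enables multi-GPU inference.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    for out in llm.generate(["What is speculative decoding?"], params):
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```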
Nemotron 3: NVIDIA's open model checkpoints (dense and MoE) on Hugging Face for reasoning, coding, and agentic workloads at scale.
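A hedged sketch of pulling a checkpoint with Hugging Face transformers; the repo id below is a placeholder, not a confirmed model name, so substitute a real repo from the NVIDIA org:

```python
# Loading sketch via Hugging Face transformers; the repo id is a
# placeholder and must be replaced with an actual Nemotron checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nvidia/<nemotron-checkpoint>"  # placeholder, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Write a binary search in Python.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```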
Triton Inference Server: multi-framework inference server with TensorRT, ONNX, PyTorch, and Python backends, plus dynamic batching, model ensembles, and GPU sharing.
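A client-side sketch using the tritonclient package against a server on localhost; the model name and tensor names (my_model, INPUT0, OUTPUT0) are assumptions that must match the deployed model's config.pbtxt:

```python
# Client sketch for a Triton server on localhost:8000 (HTTP endpoint).
# Model and tensor names are assumptions; they must match config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 16).astype(np.float32)
inp = httpclient.InferInput("INPUT0", data.shape, "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```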
Transformer Engine: NVIDIA library with FP8/FP4 precision support and fused kernels to accelerate Transformer training and inference on Hopper-class and newer GPUs.
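A minimal FP8 forward-pass sketch with Transformer Engine's PyTorch API, assuming an FP8-capable GPU; the layer dimensions are arbitrary:

```python
# FP8 forward pass sketch with Transformer Engine's PyTorch API.
# Assumes an FP8-capable GPU; dimensions are arbitrary.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Default delayed-scaling recipe; HYBRID uses E4M3 forward, E5M2 backward.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(32, 1024, device="cuda")

# The layer's GEMM runs in FP8 inside this context; activations
# entering and leaving the layer stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
print(y.shape)
```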