TensorRT-LLM: NVIDIA TensorRT-based library for optimized LLM inference on GPUs, with multi-GPU and speculative decoding features.
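A minimal sketch of the high-level Python API, assuming a recent tensorrt_llm release; the model id and prompt are illustrative, not prescribed by the library:

```python
# Minimal TensorRT-LLM sketch (assumes a recent tensorrt_llm release).
# The model id and prompt are illustrative placeholders.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds an optimized engine for the checkpoint on first use;
    # passing tensor_parallel_size enables multi-GPU inference.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    for out in llm.generate(["What is speculative decoding?"], params):
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```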
Nemotron 3: NVIDIA's open model checkpoints (dense and MoE) on Hugging Face for reasoning, coding, and agentic workloads at scale.
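A hedged sketch of pulling a checkpoint with Hugging Face transformers; the repo id below is a placeholder, not a confirmed model name, so substitute a real repo from the NVIDIA org:

```python
# Loading sketch via Hugging Face transformers; the repo id is a
# placeholder and must be replaced with an actual Nemotron checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nvidia/<nemotron-checkpoint>"  # placeholder, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Write a binary search in Python.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```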
Triton Inference Server: multi-framework inference server with TensorRT, ONNX, PyTorch, and Python backends, plus dynamic batching, model ensembles, and GPU sharing.
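A client-side sketch using the tritonclient package against a server on localhost; the model name and tensor names (my_model, INPUT0, OUTPUT0) are assumptions that must match the deployed model's config.pbtxt:

```python
# Client sketch for a Triton server on localhost:8000 (HTTP endpoint).
# Model and tensor names are assumptions; they must match config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 16).astype(np.float32)
inp = httpclient.InferInput("INPUT0", data.shape, "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```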
Transformer Engine: NVIDIA library with FP8/FP4 precision support and fused kernels to accelerate Transformer training and inference on Hopper-class and newer GPUs.
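A minimal FP8 forward-pass sketch with Transformer Engine's PyTorch API, assuming an FP8-capable GPU; the layer dimensions are arbitrary:

```python
# FP8 forward pass sketch with Transformer Engine's PyTorch API.
# Assumes an FP8-capable GPU; dimensions are arbitrary.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Default delayed-scaling recipe; HYBRID uses E4M3 forward, E5M2 backward.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(32, 1024, device="cuda")

# The layer's GEMM runs in FP8 inside this context; activations
# entering and leaving the layer stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
print(y.shape)
```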