OpenCatalog · curated by FLOSSK
AI & Machine Learning

Transformer Engine

NVIDIA library providing low-precision (FP8/FP4) paths and fused kernels on Hopper-class and newer GPUs to accelerate Transformer training and inference.

Why it is included

Listed on TAAFT among NVIDIA repositories tagged machine-learning / LLM acceleration.

Best for

Training and serving frontier Transformers where FP8/FP4 kernels unlock throughput.

Strengths

  • FP8/FP4 paths
  • Tight PyTorch/JAX integration options
  • NVIDIA-optimized
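The FP8 path rests on one core idea: scale each tensor so its largest recent value fits the narrow range of an 8-bit float (E4M3 has a maximum finite value of 448), cast, then unscale. A minimal pure-Python sketch of that "delayed scaling" idea, assuming nothing about Transformer Engine's actual API (all names here are illustrative, and mantissa rounding is elided):

```python
# Illustrative sketch of FP8 "delayed scaling": pick a scale from a
# history of observed absolute maxima, then clamp into the E4M3 range.
# Function names are hypothetical; Transformer Engine does this in CUDA.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def choose_scale(amax_history):
    """Pick a scale so the largest recent value maps near E4M3_MAX."""
    amax = max(amax_history) or 1.0  # guard against an all-zero history
    return E4M3_MAX / amax

def fake_quantize_fp8(values, scale):
    """Scale, clamp to the FP8 range, unscale (mantissa rounding elided)."""
    out = []
    for v in values:
        s = max(-E4M3_MAX, min(E4M3_MAX, v * scale))
        out.append(s / scale)
    return out

history = [100.0, 250.0, 900.0]   # amaxes recorded on previous steps
scale = choose_scale(history)     # about 448 / 900
q = fake_quantize_fp8([1.0, -500.0, 900.0], scale)
# values within the recent amax survive; anything larger is clamped
```

The "delayed" part is the design choice worth noting: the scale comes from *previous* steps' maxima, so quantization needs no extra pass over the current tensor.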

Limitations

  • Gains are tied to recent NVIDIA hardware; not portable to other accelerators

Good alternatives

FlashAttention · DeepSpeed · plain PyTorch AMP
