OpenCatalog, curated by FLOSSK
AI & Machine Learning

olmOCR

Toolkit for linearizing PDFs for LLM datasets/training

Why it is included

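olmOCR (allenai/olmocr) is an open-source toolkit from the Allen Institute for AI for linearizing PDFs into plain text suitable for building LLM datasets and training corpora.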

Best for

Users looking for vetted FOSS tools for information processing, in particular for turning PDFs into text for LLM datasets.

Strengths

  • ~17,100 GitHub stars (per upstream list)
  • Open source

Limitations

  • Verify license, platform support, and security posture for your environment.
