OpenCatalog, curated by FLOSSK
AI & Machine Learning

DVC

Data version control for ML: version datasets and models with Git, cloud storage, and reproducible pipelines.

Why it is included

Fills the gap between Git and terabyte-scale artifacts, which makes it essential for reproducible ML beyond notebook-only workflows.

Best for

Teams that share data and model lineage across researchers and CI without copying giant blobs into Git.

Strengths

  • Git-native mental model
  • Remote storage
  • Pipelines
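
To make the "Git-native mental model" concrete, here is a minimal Python sketch of the general idea: a large artifact is stored in a content-addressed cache keyed by its hash, and only a tiny pointer file is committed to Git. This illustrates the concept and is not DVC's actual implementation. The function names `track` and `restore`, the `.pointer.json` suffix, and the cache layout are all hypothetical choices made for this example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Hypothetical layout: first two hex characters form a subdirectory."""
    return cache_dir / md5[:2] / md5[2:]


def track(artifact: Path, cache_dir: Path) -> Path:
    """Copy the artifact into the cache and write a small pointer file.

    The pointer file is what would be committed to Git; the artifact
    itself would be ignored by Git and synced to remote storage instead.
    """
    md5 = file_md5(artifact)
    cached = cache_path(cache_dir, md5)
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(artifact, cached)
    pointer = artifact.with_name(artifact.name + ".pointer.json")
    meta = {"path": artifact.name, "md5": md5, "size": artifact.stat().st_size}
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the artifact next to its pointer file from the cache."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_path(cache_dir, meta["md5"]), target)
    return target
```

In DVC itself, the corresponding workflow uses `dvc add` to start tracking a file, `git add` to commit the generated `.dvc` file, and `dvc push` to upload the cached data to the configured remote.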

Limitations

  • Requires discipline on remote cache layout and access control

Good alternatives

LakeFS · Git LFS alone · MLflow artifacts

Related tools