Data Versioning, Data Pipelines, and Data Lineage
At Pachyderm, we're building an open-source enterprise-grade data science platform that lets you deploy and manage multi-stage, language-agnostic data pipelines while maintaining complete reproducibility and provenance. If you want to learn more about our grand vision, read what has become our "manifesto." Our system, developed with open source roots, shifts the paradigm of data science workflows by providing reproducibility, data provenance, and opportunity for true collaboration. Pachyderm utilizes modern technologies like Docker and Kubernetes to build an entirely new method of analyzing data. Offered both as an in-house solution as well as hosted-service, Pachyderm brings together version-control for data with the tools to build scalable end-to-end ML/AI pipelines while empowering users to use any language, framework, or tool they want.