Pachyderm

Pachyderm Inc.Machine Learning end-to-end platform

Pachyderm is a tool for version-controlled, automated, end-to-end data pipelines for data science. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, while ensuring the traceability and provenance of your data, Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you’re looking for a way to “productionize” them, Pachyderm can make this easy for you.

Features

Containerized: Pachyderm is built on Docker and Kubernetes. Whatever languages or libraries your pipeline needs, they can run on Pachyderm which can easily be deployed on any cloud provider or on prem.
Version Control: Pachyderm version controls your data as it’s processed. You can always ask the system how data has changed, see a diff, and, if something doesn’t look right, revert.
Provenance (aka data lineage): Pachyderm tracks where data comes from. Pachyderm keeps track of all the code and data that created a result.
Parallelization: Pachyderm can efficiently schedule massively parallel workloads.
Incremental Processing: Pachyderm understands how your data has changed and is smart enough to only process the new data.

Official website

Link

Tutorial and documentation

Click here to view

Montreal

1275 Av. des Canadiens-de-Montréal,

Montréal, QC H3B 0G4

Canada

Los Angeles

312 Arizona Ave,

Santa Monica, CA 90401,

USA

Dubai

Gate Avenue Zone D at DIFC – Sheikh Zayed Road

Dubai, United Arab Emirates

Doha

1 Al Corniche St, Burj Doha, level 21,

Doha, Qatar