Kedro

GitHub Support CommunityData Processing

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.

Features

Project Template: A standard, modifiable and easy-to-use project template based on Cookiecutter Data Science.
Data Catalog: A series of lightweight data connectors used to save and load data across many different file formats and file systems, including local and network file systems, cloud object stores, and HDFS. The Data Catalog also includes data and model versioning for file-based systems.
Pipeline Abstraction: Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using Kedro-Viz.
Coding Standards: Test-driven development using pytest, produce well-documented code using Sphinx, create linted code with support for flake8, isort and black and make use of the standard Python logging library.
Flexible Deployment: Deployment strategies that include single or distributed-machine deployment as well as additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch and Databricks.

Official website

Link

Tutorial and documentation

Click here to view

Montreal

1275 Av. des Canadiens-de-Montréal,

Montréal, QC H3B 0G4

Canada

Los Angeles

312 Arizona Ave,

Santa Monica, CA 90401,

USA

Dubai

Gate Avenue Zone D at DIFC – Sheikh Zayed Road

Dubai, United Arab Emirates

Doha

1 Al Corniche St, Burj Doha, level 21,

Doha, Qatar