Kedro

GitHub Support CommunityData Processing

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.

Features

Project Template: A standard, modifiable and easy-to-use project template based on Cookiecutter Data Science.
Data Catalog: A series of lightweight data connectors used to save and load data across many different file formats and file systems, including local and network file systems, cloud object stores, and HDFS. The Data Catalog also includes data and model versioning for file-based systems.
Pipeline Abstraction: Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using Kedro-Viz.
Coding Standards: Test-driven development using pytest, produce well-documented code using Sphinx, create linted code with support for flake8, isort and black and make use of the standard Python logging library.
Flexible Deployment: Deployment strategies that include single or distributed-machine deployment as well as additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch and Databricks.

Official website

Tutorial and documentation

Enter your contact information to continue reading