Gokart


Gokart is a wrapper around the data pipeline library luigi. It addresses reproducibility, task dependencies, constraints that encourage good code, and ease of use for machine learning pipelines.

Features

For each Task, the following data is stored separately in a pkl file tagged with a hash value:
* task output data
* versions of all imported modules
* task processing time
* random seed used in the task
* displayed log output
* all parameters set as class variables in the task

If a parameter of a Task changes, the Task is rerun automatically.
* The files above are generated again with a different hash value
* The hash value of any dependent task also changes, so both tasks are rerun
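
The hash-based rerun behavior described above can be illustrated with a minimal sketch. This is not Gokart's actual implementation: the function names `task_hash` and `output_path`, the use of MD5 over a JSON payload, and the file naming scheme are all assumptions chosen only to show why changing one parameter invalidates both a task and everything downstream of it.

```python
import hashlib
import json


def task_hash(name, params, upstream_hashes=()):
    """Return a hex digest that changes when the task name, any parameter,
    or any upstream task's hash changes (hypothetical scheme)."""
    payload = json.dumps(
        {"name": name, "params": params, "upstream": sorted(upstream_hashes)},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def output_path(name, digest):
    # Hypothetical naming scheme: one pkl file per (task, hash) pair.
    return f"{name}_{digest}.pkl"


# First run: TrainModel depends on LoadData.
load_v1 = task_hash("LoadData", {"sample_rate": 0.1})
train_v1 = task_hash("TrainModel", {"epochs": 5}, [load_v1])

# Second run: only the upstream parameter changes.
load_v2 = task_hash("LoadData", {"sample_rate": 0.2})
train_v2 = task_hash("TrainModel", {"epochs": 5}, [load_v2])

# Both tasks now point at new files, so neither cached result is reused.
print(output_path("LoadData", load_v1), "->", output_path("LoadData", load_v2))
print(output_path("TrainModel", train_v1), "->", output_path("TrainModel", train_v2))
```

Because the downstream digest includes the upstream digest, a cached file can be trusted whenever its path already exists: an unchanged path implies unchanged parameters along the whole dependency chain.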

* Supports GCS or S3
* The output above is exchanged between tasks as intermediate files, which is memory-friendly
* pandas.DataFrame types and columns are checked during I/O
* The directory structure of saved files is determined automatically from the structure of the script
* Seeds for numpy and random are fixed automatically
* Code can adhere to the SOLID principles as far as possible
* Tasks are locked via redis even when they run in parallel
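
To show what automatic seed fixing buys in practice, here is a small sketch using only the standard library. How Gokart derives its seed is not stated here, so deriving it from a task identifier string is an assumption, and `seed_from_task_id` and `run_with_fixed_seed` are hypothetical helpers rather than Gokart functions.

```python
import hashlib
import random


def seed_from_task_id(task_id):
    """Map a task identifier string to a 32-bit integer seed (assumed scheme)."""
    digest = hashlib.md5(task_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2**32)


def run_with_fixed_seed(task_id, n=3):
    """Seed the stdlib generator before the task body runs, then draw n numbers."""
    random.seed(seed_from_task_id(task_id))
    return [random.random() for _ in range(n)]


first = run_with_fixed_seed("TrainModel_abc123")
second = run_with_fixed_seed("TrainModel_abc123")
print(first == second)
```

The same seed value could be passed to `numpy.random.seed` when numpy is in use, since it also accepts integers below 2**32. Seeding once per task means that rerunning a task with identical parameters draws the same random numbers.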

Official website

Link

Tutorial and documentation

Click here to view
