
Gokart is a wrapper around the data pipeline library luigi. It addresses “reproducibility”, “task dependencies”, “constraints of good code”, and “ease of use” for machine learning pipelines.


For each Task, the following data is stored separately in a pkl file whose name includes a hash value:
* task output data
* versions of all imported modules
* task processing time
* random seed in task
* displayed log
* all parameters set as class variables in the task
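
The idea of storing each task's output in a pkl file keyed by a hash can be sketched with the standard library alone. This is a minimal illustration, not gokart's actual implementation: the helper names `param_hash` and `save_output`, the file naming scheme, and the choice of MD5 over a JSON dump of the parameters are all assumptions made for the example.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def param_hash(task_name: str, params: dict) -> str:
    """Hash the task name and its parameters into a stable hex string (hypothetical scheme)."""
    payload = json.dumps({"task": task_name, "params": params}, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def save_output(out_dir: Path, task_name: str, params: dict, data) -> Path:
    """Pickle the task output to a file whose name carries the parameter hash."""
    path = out_dir / f"{task_name}_{param_hash(task_name, params)}.pkl"
    with path.open("wb") as f:
        pickle.dump(data, f)
    return path


# Two runs of the same task with different parameters land in different files.
out_dir = Path(tempfile.mkdtemp())
path_a = save_output(out_dir, "Train", {"lr": 0.1}, {"score": 0.9})
path_b = save_output(out_dir, "Train", {"lr": 0.2}, {"score": 0.8})
```

Because the hash is part of the file name, results produced with different parameters never overwrite each other, and an existing file can be reused when the parameters are unchanged.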

If a parameter of a Task changes, the Task is rerun automatically.
* The files above are generated with a different hash value
* The hash value of any dependent task also changes, so both tasks are rerun
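
The rerun behaviour can be sketched as follows. This is a simplified model under stated assumptions, not gokart's code: the `Task` class, its `task_id` method, and the in-memory `cache` dictionary are hypothetical stand-ins for a task, its hash value, and the saved pkl files.

```python
import hashlib
import json


class Task:
    """Hypothetical task whose ID depends on its own parameters and on upstream IDs."""

    def __init__(self, name, params, requires=()):
        self.name = name
        self.params = params
        self.requires = list(requires)

    def task_id(self) -> str:
        # Including upstream IDs means a change upstream changes this ID too.
        payload = json.dumps(
            {
                "name": self.name,
                "params": self.params,
                "upstream": [t.task_id() for t in self.requires],
            },
            sort_keys=True,
        )
        return hashlib.md5(payload.encode("utf-8")).hexdigest()


def run(task, cache, executed):
    """Run `task` and its dependencies, skipping any whose ID is already cached."""
    for dep in task.requires:
        run(dep, cache, executed)
    key = task.task_id()
    if key not in cache:
        cache[key] = f"output of {task.name}"
        executed.append(task.name)


cache, executed = {}, []
load = Task("Load", {"path": "data.csv"})
train = Task("Train", {"lr": 0.1}, requires=[load])
run(train, cache, executed)  # first run executes both tasks
run(train, cache, executed)  # nothing changed, so nothing is rerun

# Changing only the upstream parameter changes both IDs, so both tasks rerun.
load2 = Task("Load", {"path": "other.csv"})
train2 = Task("Train", {"lr": 0.1}, requires=[load2])
run(train2, cache, executed)
```

Folding the upstream IDs into each task's own ID is what makes a parameter change propagate downstream without any explicit invalidation step.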

> Supports GCS or S3
> The output above is exchanged between tasks as intermediate files, which is memory-friendly
> pandas.DataFrame type and column checking during I/O
> The directory structure of saved files is determined automatically from the structure of the script
> Seeds for numpy and random are fixed automatically
> Code can adhere to the SOLID principles as much as possible
> Tasks are locked via redis even if they run in parallel
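
The effect of fixing seeds per task can be illustrated with the standard `random` module. How gokart actually derives its seed is not described here, so the `fix_seed` helper and the use of a CRC32 of the task name are assumptions made for this sketch; a numpy-based task would seed `numpy.random` in the same way.

```python
import random
import zlib


def fix_seed(task_name: str) -> int:
    """Derive a deterministic seed from the task name and apply it to `random` (hypothetical scheme)."""
    seed = zlib.crc32(task_name.encode("utf-8"))
    random.seed(seed)
    return seed


# Seeding before each run makes the task's random draws repeatable.
fix_seed("Train")
first = [random.random() for _ in range(3)]
fix_seed("Train")
second = [random.random() for _ in range(3)]
```

With the seed fixed before the task body runs, two executions of the same task draw the same random numbers, which is what makes the stored output reproducible.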

Official website

Tutorial and documentation
