Basin

Basin is a visual programming tool for extract, transform, load (ETL) pipelines that can run Spark jobs in any environment. Create and debug pipelines from your browser and export them as pure Python code!

Features:
- Up and running as simply as docker pull
- Create complex pipelines and flows using drag and drop
- Debug and preview step by step
- Integrated dataview grid viewer
- […]

Azkaban

Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the execution ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.

Features:
- Compatible with any version of Hadoop
- Easy-to-use web UI
- Simple web and HTTP workflow uploads
- Project workspaces
- Scheduling of […]

Argo Workflows

Argo Workflows is an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).

Features:
- Workflow: a Kubernetes resource defining the execution of one or more templates. Workflows are named.
- Template: a step, steps, or a DAG.
- Step: a single step of a workflow, typically […]

Apache NiFi

Put simply, NiFi was built to automate the flow of data between systems. While the term "dataflow" is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where […]

Airflow

Airflow is a platform that lets you build and run workflows. A workflow is represented as a DAG (a Directed Acyclic Graph) and contains individual pieces of work called Tasks, arranged with dependencies and data flows taken into account.

Features:
1. Pure Python
2. Useful UI
3. Robust integrations
4. Easy to use
5. Open source

Official website […]
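To make the DAG idea concrete: a scheduler like Airflow (or Azkaban) derives a valid execution order from task dependencies. A minimal pure-Python sketch of that ordering step, using the standard library rather than Airflow itself (the task names here are made up for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical ETL DAG: extract feeds two transforms, which both feed load.
# Each key maps a task to the set of tasks it depends on (its upstreams).
dag = {
    "extract": set(),                        # no upstream dependencies
    "transform_a": {"extract"},              # runs after extract
    "transform_b": {"extract"},              # runs after extract
    "load": {"transform_a", "transform_b"},  # runs after both transforms
}

# static_order() yields the tasks in an order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())
print(order)  # extract first, load last
```

In a real Airflow DAG file you declare the same dependencies with operators and the `>>` operator, and the scheduler performs this kind of ordering (plus scheduling, retries, and parallelism) for you.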
