Luigi

The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long-running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or …

Kedro

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning. Features include a Project Template: a standard, modifiable and easy-to-use project template based on Cookiecutter Data Science. …

Informatica Power Center

PowerCenter is a scalable, high-performance foundation for on-premises data integration initiatives, including analytics, data warehousing, and app migration. Features:
- Universal connectivity
- Role-based tools and agile processes
- Scalability and zero downtime
- Advanced data transformation
- Reusability and automation
- Rapid prototyping and profiling
Tutorial and documentation are available on the official website. …

Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to …
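The "simple programming models" are MapReduce. The classic word-count job can be sketched in pure Python in the style of Hadoop Streaming, which pipes input lines through a mapper and, after a sort by key, through a reducer:

```python
import sys
from itertools import groupby

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce phase;
    # sorted() + groupby mimics that shuffle, then counts are summed.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # In a real Streaming job this script would read from sys.stdin on
    # each cluster node; here both phases run in a single process.
    for word, count in reducer(mapper(sys.stdin)):
        print(f"{word}\t{count}")
```

On a cluster, Hadoop runs many copies of the mapper (one per input split) and partitions the keys across reducers, which is where the distributed scale-out comes from.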

Gokart 

Gokart solves "reproducibility", "task dependencies", "constraints of good code", and "ease of use" for Machine Learning Pipelines. Gokart is a wrapper of the data pipeline library luigi. Features: the following data for each Task …
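Because gokart wraps luigi, tasks look like luigi tasks but persist and reload their outputs through `dump()`/`load()`; a minimal sketch (task names and values are illustrative):

```python
import gokart

class LoadNumbers(gokart.TaskOnKart):
    def run(self):
        # dump() stores the result keyed by this task's parameter hash,
        # which is what gives gokart its reproducibility/caching.
        self.dump([1, 2, 3, 4])

class SumNumbers(gokart.TaskOnKart):
    def requires(self):
        return LoadNumbers()

    def run(self):
        numbers = self.load()  # loads the upstream task's dumped output
        self.dump(sum(numbers))

if __name__ == "__main__":
    print(gokart.build(SumNumbers()))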

Genie

GenieAnalytics provides deep and powerful big-data analytic ability that delivers immediate operational insights for your business. Through multi-dimensional traffic analysis reports and rich visualizations, users are able to gain total control over their network infrastructure and plan what’s right for their business operations accordingly. Based on state-of-the-art big data technology, GenieAnalytics …

Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale. Features:
- Executing distributed data pipelines/workflows
- Reusing tasks across projects, users, and workflows
- Making it easy to stitch together workflows from different teams and domain experts
- Backtracing to a specified workflow
- Comparing results of training workflows over time and …

Dagster

Dagster is a data orchestrator. It lets you define pipelines (DAGs) in terms of the data flow between logical components called solids. These pipelines can be developed locally and run anywhere. Features: Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on …

Couler 

Couler aims to provide a unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow. Couler is included in the CNCF Cloud Native Landscape and the LF AI Landscape. Features include Simplicity: a unified interface and imperative programming style for defining workflows with automatic construction …

Bonobo 

Bonobo is a lightweight Extract-Transform-Load (ETL) framework for Python 3.5+. It provides tools for building data transformation pipelines using plain Python primitives, and executing them in parallel. Features: Bonobo aims to be minimalistic yet featureful. All basic formats and operations are included within the main library. Optional dependencies are bundled as …
