Amazon Kinesis

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With …

Talend Open Studio for Data Integration

Talend Open Studio for Data Integration is free-to-download software to kickstart your first data integration and ETL projects. Features: free and open source under the Apache license; RDBMS connectors: Oracle, Teradata, Microsoft SQL Server; SaaS connectors: Marketo, Salesforce, NetSuite; packaged apps: SAP, Microsoft Dynamics, Sugar CRM.

Spark

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph …

Snakemake

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, …

SETL

SETL (pronounced “settle”) is a Scala ETL framework powered by Apache Spark that helps you structure your Spark ETL projects, modularize your data transformation logic and speed up your development. Features: With SETL, an ETL application can be represented by a Pipeline. A Pipeline contains multiple Stages. In each stage, we could find one or …

Prefect Core

The prefect Python library includes everything you need to design, build, test, and run powerful data applications. Instantly upgrade your existing code with workflow best practices, and use the Prefect UI to orchestrate and monitor everything. Features: A proper automation framework has three critical components: workflow definition, workflow engine, and workflow state. Prefect Core includes all three, and the …
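To make the three components named above concrete, here is a minimal plain-Python sketch. It does not use the Prefect API; the `Workflow` class, the `run` function, and the state labels are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Workflow:
    """Workflow definition: an ordered list of (name, function) tasks."""
    tasks: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def add(self, name: str, fn: Callable[[Any], Any]) -> "Workflow":
        self.tasks.append((name, fn))
        return self


def run(workflow: Workflow) -> Tuple[Dict[str, str], Any]:
    """Workflow engine: runs tasks in order, passing each result to the next.

    Returns the workflow state (one label per task) and the last value produced.
    """
    state = {name: "Pending" for name, _ in workflow.tasks}
    value = None
    for name, fn in workflow.tasks:
        try:
            value = fn(value)
            state[name] = "Success"
        except Exception:
            # Stop at the first failure; later tasks stay "Pending".
            state[name] = "Failed"
            break
    return state, value


# Hypothetical two-task workflow: produce a list, then double each element.
flow = (
    Workflow()
    .add("extract", lambda _: [1, 2, 3])
    .add("double", lambda xs: [x * 2 for x in xs])
)
state, result = run(flow)
```

Keeping the definition separate from the engine means the same workflow can be run again, or by a different runner, while the state records what happened on each run.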

PipelineX

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more. Features: HatchDict (Python in YAML/JSON); Flex-Kedro (Kedro plugin for flexible config); MLflow-on-Kedro (Kedro plugin for MLflow users); Kedro-Extras (Kedro plugin to use various Python packages).

Oozie

Oozie v3 is a server-based Bundle Engine that provides a higher-level Oozie abstraction for batching a set of coordinator applications. The user can start, stop, suspend, resume, or rerun a set of coordinator jobs at the bundle level, resulting in better and easier operational control. Oozie v2 is a server-based Coordinator Engine specialized in running …

Neuraxle

Neuraxle is a Machine Learning (ML) library for building machine learning pipelines. Features: Component-Based: Build encapsulated steps, then compose them to build complex pipelines. Evolving State: Each pipeline step can fit, and evolve through the learning process. Hyperparameter Tuning: Optimize your pipelines using AutoML, where each pipeline step has its own hyperparameter space. Compatible: Use your favorite machine …
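The component-based idea (encapsulated steps that each fit and are then composed into a pipeline) can be sketched in plain Python. This is not the Neuraxle API; `MeanCenter`, `Scale`, and `Pipeline` are hypothetical classes written for illustration only.

```python
from typing import List


class MeanCenter:
    """Step with learned state: remembers the mean seen during fit."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, xs: List[float]) -> "MeanCenter":
        self.mean = sum(xs) / len(xs)
        return self

    def transform(self, xs: List[float]) -> List[float]:
        return [x - self.mean for x in xs]


class Scale:
    """Stateless step: multiplies every value by a fixed factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def fit(self, xs: List[float]) -> "Scale":
        return self  # nothing to learn

    def transform(self, xs: List[float]) -> List[float]:
        return [x * self.factor for x in xs]


class Pipeline:
    """Composes steps: each one is fit on, then applied to, the previous output."""

    def __init__(self, steps: list) -> None:
        self.steps = steps

    def fit_transform(self, xs: List[float]) -> List[float]:
        for step in self.steps:
            xs = step.fit(xs).transform(xs)
        return xs


pipeline = Pipeline([MeanCenter(), Scale(2.0)])
output = pipeline.fit_transform([1.0, 2.0, 3.0])
```

Because every step exposes the same `fit` and `transform` methods, steps can be reordered or swapped without changing the pipeline code.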

Metaflow

Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning. Features: Model with your favorite tools. Build with Metaflow. Powered by …
