Spark Streaming 

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can …
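To make the map, reduce, and window operations mentioned above concrete, here is a minimal sketch in plain Python. It does not use the Spark API; the function name `windowed_word_counts` and the sample batches are hypothetical, and a list of lists stands in for a live stream that has been cut into micro-batches.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List


def windowed_word_counts(
    batches: Iterable[List[str]], window_size: int
) -> Iterator[Counter]:
    """Yield word counts over the last `window_size` micro-batches."""
    # The deque drops the oldest batch automatically once it is full.
    window: Deque[Counter] = deque(maxlen=window_size)
    for lines in batches:
        # "map" step: split every line of the batch into words.
        words = [word for line in lines for word in line.split()]
        # "reduce" step: count the words of this batch.
        window.append(Counter(words))
        # "window" step: merge the counts of the batches still in the window.
        total: Counter = Counter()
        for batch_counts in window:
            total.update(batch_counts)
        yield total


# Hypothetical input: three micro-batches of text lines.
batches = [["a b a"], ["b c"], ["c c"]]
results = list(windowed_word_counts(batches, window_size=2))
```

Each yielded counter covers only the two most recent batches, so the words of the first batch no longer contribute to the third result.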

Kafka Streams

Kafka Streams is the stream processing library of Apache Kafka, an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Features:
- Highly scalable, elastic, distributed, and fault-tolerant applications.
- Stateful and stateless processing.
- Event-time processing with windowing, joins, and aggregations.
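Event-time windowing, named among the features above, can be illustrated with a small sketch in plain Python. This is not the Kafka Streams API (which is a Java library); the `Event` tuple shape, the function name `tumbling_window_sums`, and the sample events are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (key, event_time_ms, value).
Event = Tuple[str, int, int]


def tumbling_window_sums(
    events: Iterable[Event], window_ms: int
) -> Dict[Tuple[str, int], int]:
    """Sum values per key and per event-time window of `window_ms` ms."""
    sums: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, event_time_ms, value in events:
        # Assign the event to a window by the timestamp it carries,
        # not by the moment it happens to arrive.
        window_start = event_time_ms - (event_time_ms % window_ms)
        sums[(key, window_start)] += value
    return dict(sums)


# The last event arrives late but still belongs to the first window.
events = [
    ("user-1", 1_000, 5),
    ("user-2", 1_500, 1),
    ("user-1", 61_000, 2),
    ("user-1", 59_999, 3),
]
sums = tumbling_window_sums(events, window_ms=60_000)
```

Because the window is derived from the event's own timestamp, the out-of-order event at 59,999 ms is added to the window starting at 0 rather than to the later one.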

IBM Streaming Analytics

IBM® Streaming Analytics for IBM Cloud is powered by IBM® Streams, an advanced analytic platform that you can use to ingest, analyze, and correlate information as it arrives from different types of data sources in real time. When you create an instance of the Streaming Analytics service, you get your own instance of IBM® Streams …

Google Cloud Dataflow

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.

Features:
- Autoscaling of resources and dynamic work rebalancing
- Flexible scheduling and pricing for batch processing
- Ready-to-use real-time AI patterns


Faust

Faust is a stream processing library that ports the ideas of Kafka Streams to Python. It is used at Robinhood to build high-performance distributed systems and real-time data pipelines that process billions of events every day.

Features:
- Simple
- Highly available
- Distributed
- Fast
- Flexible


Brooklin

Brooklin is a distributed system for streaming data between heterogeneous source and destination systems with high reliability and throughput at scale. Designed for multitenancy, Brooklin can power hundreds of data pipelines across different systems simultaneously and can easily be extended to support new sources and destinations.

Features:
- Extensible for any source and destination
- Scalable
- Easy …

Azure Stream Analytics

Azure Stream Analytics is a real-time analytics and complex event-processing engine that is designed to analyze and process high volumes of fast streaming data from multiple sources simultaneously. Patterns and relationships can be identified in information extracted from a number of input sources including devices, sensors, clickstreams, social media feeds, and applications. Features You can …

Apache Samza

Apache Samza is a scalable data processing engine that allows you to process and analyze your data in real time.

Features:
- High performance: Samza provides extremely low latencies and high throughput to analyze your data instantly.
- Horizontally scalable
- Easy to operate
- Powerful APIs
- Write once, run anywhere
- Pluggable architecture

Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.

Features:
- It has a streaming processor, which can run both batch and stream programs.
- It can process data …
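The idea of a stateful computation over bounded and unbounded streams can be sketched in plain Python with a generator. This is not Flink's API; the function name `running_average`, the sensor keys, and the readings are hypothetical. The same function handles a finite list and an endless generator, because it keeps only a small per-key state and emits one result per input.

```python
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # hypothetical shape: (sensor key, measured value)


def running_average(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Emit the running average per key after every incoming reading."""
    # Per-key state: (number of readings seen, sum of their values).
    state: Dict[str, Tuple[int, float]] = {}
    for key, value in readings:
        seen, total = state.get(key, (0, 0.0))
        seen, total = seen + 1, total + value
        state[key] = (seen, total)
        yield key, total / seen


# Bounded stream: a finite list of readings.
bounded = [("s1", 10.0), ("s2", 4.0), ("s1", 20.0)]
bounded_result = list(running_average(bounded))

# Unbounded stream: an endless generator; only the first results are taken.
unbounded = (("s1", float(n)) for n in count(1))
first_three = list(islice(running_average(unbounded), 3))
```

A real engine would also checkpoint this state so that it survives failures; the sketch leaves that out.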
