Spark Streaming 

The Apache Software Foundation · Data Stream Processing

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards.
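To make the high-level functions concrete, here is a minimal sketch of the classic word count over a TCP socket stream, adapted from the Spark Streaming programming guide. It assumes the spark-streaming artifact is on the classpath; the local[2] master, hostname, and port are placeholders for illustration.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Local StreamingContext with two worker threads and a 1-second batch interval
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // DStream of text lines received over a TCP socket (host and port are placeholders)
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words in each batch with map- and reduce-style transformations
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

    // Push results to an output sink; here, print the first counts of each batch
    wordCounts.print()

    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for it to be stopped
  }
}

For a quick local test, a netcat session such as nc -lk 9999 can serve as the socket source, as in the official guide.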

Features

Fast recovery from failures and stragglers.
Better load balancing and resource usage.
Combining streaming data with static datasets and interactive queries (see the sketch after this list).
Native integration with advanced processing libraries (SQL, machine learning, graph processing).
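One common way to combine a live stream with a static dataset is to join each micro-batch against a pre-loaded RDD using the transform operation. The sketch below illustrates this under assumptions not stated in the original: the object name, the severityByTag lookup table, and the host/port are all placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamStaticJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamStaticJoin")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Static reference data loaded once as a regular RDD (illustrative severity lookup)
    val severityByTag = ssc.sparkContext.parallelize(Seq(("error", "high"), ("warn", "medium")))

    // Live stream keyed by its first token (host and port are placeholders)
    val taggedEvents = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(" ")(0), line))

    // transform() exposes each micro-batch as an RDD, so it can be joined with the static RDD
    val enriched = taggedEvents.transform(batch => batch.join(severityByTag))
    enriched.print()

    ssc.start()
    ssc.awaitTermination()
  }
}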

Official website: https://spark.apache.org/streaming/

Tutorial and documentation: https://spark.apache.org/docs/latest/streaming-programming-guide.html
