## Overview
Test stateful streaming and stream joins across various technologies.
## Use case
We have two streams:

- user_info is a relatively high-velocity stream of generated user events (UserInfo(userId: String, timestamp: Long, booleanFlag: Boolean, subCategory: String, someValue: Float, intValue: Int))
- user_update is a relatively low-velocity stream carrying additional information about users (UserCategoryUpdate(userId: String, category: String, timestamp: Long))

Our goal is to join these two streams, maintain a shared state that stores the current category for each user, and produce a new stream of enriched user profiles.
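As a minimal, engine-agnostic sketch of these join semantics (the EnrichedUserInfo output type and the in-memory map are illustrative assumptions; each platform below manages this state in its own way):

```scala
// Event types as described above; EnrichedUserInfo is a hypothetical output shape.
case class UserInfo(userId: String, timestamp: Long, booleanFlag: Boolean,
                    subCategory: String, someValue: Float, intValue: Int)
case class UserCategoryUpdate(userId: String, category: String, timestamp: Long)
case class EnrichedUserInfo(info: UserInfo, category: Option[String])

object JoinSemantics {
  // Shared state: the current category per user, maintained from the slow stream.
  private var categories = Map.empty[String, String]

  // user_update events only mutate the state.
  def onUpdate(u: UserCategoryUpdate): Unit =
    categories += (u.userId -> u.category)

  // user_info events are enriched with the latest known category, if any.
  def onInfo(i: UserInfo): EnrichedUserInfo =
    EnrichedUserInfo(i, categories.get(i.userId))
}
```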
## Currently supported
- spark 1.6.2
- flink 1.3.1
- kafka-streams 3.1.2
- ksql 3.3.0
## Todo
- spark 2.x
- storm
- heron
- apex
- samza
## Requirements
- Confluent Platform 3.1.2 (it should work with newer versions, but this has not been tested)
## How to run
- follow http://docs.confluent.io/3.1.2/quickstart.html and start
  - zookeeper
  - kafka
  - schema-registry
- create the topics (a sample command sketch follows this list)
  - user_info
  - user_update
  - enriched_user
- run the generator
  - sbt generator/run
- run one of the streaming platforms
  - for spark_1_6
    - sbt spark16/run
  - for flink
    - download flink 1.3.1
    - run ${flink_directory}/bin/start-local.sh
    - go to the project root directory
    - build the fat jar using sbt assembly
    - run ${flink_directory}/bin/flink run flink/target/scala-2.11/flink-assembly-0.1.0-SNAPSHOT.jar
  - for kafka-streams
    - sbt kafkaStreams/run
  - for ksql
    - check out the guide at kafka_streams/README.md
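The Confluent quickstart covers the setup steps in detail; as a rough sketch, starting the services and creating the three topics with the Confluent Platform 3.1.2 CLI tools looks like this (assuming a local single-broker setup with default ports and config paths):

```sh
# From the Confluent Platform installation directory (default config paths assumed):
./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
./bin/kafka-server-start ./etc/kafka/server.properties
./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

# Create the three topics (single partition / replication factor 1 for local testing):
for topic in user_info user_update enriched_user; do
  ./bin/kafka-topics --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic "$topic"
done
```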
## Testcase 1: Joining Streams
| Platform | Smoke test 1) (seconds) | Pregenerated data 2) (seconds) | Data generator 3) (seconds) |
|---|---|---|---|
| Spark Streaming (1.6) | 10.759 | | |
| Flink (1.3) | 89.988 | | |
| Kafka Streams (3.3.0) | 32.207 | | |
1) Processing 11.6 million pregenerated messages from a Kafka topic.
2) Processing XY million pregenerated messages from a Kafka topic - same as the smoke test but
3) Processing XY million generated messages; messages are generated on the fly.