Overview

Tests stateful streaming and stream joins across several stream-processing technologies.

Use case

We have 2 streams:

  • user_info is a stream with relatively high velocity, contains generated events for users (UserInfo(userId: String, timestamp: Long, booleanFlag: Boolean, subCategory: String, someValue: Float, intValue: Int))

  • user_update is a stream with relatively low velocity, and it contains some additional information about users (UserCategoryUpdate(userId: String, category: String, timestamp: Long))

Our goal is to join those 2 streams, keep a shared state that stores the current category for each user, and produce a new stream with the enriched user profile.
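The join described above can be sketched in plain Scala, independent of any streaming engine. This is only an illustration under stated assumptions: `EnrichedUser`, `EnrichmentSketch`, `step`, and `run` are hypothetical names that do not come from this repository, and the two streams are assumed to have already been merged into a single ordered sequence of events.

```scala
// Event types as listed in the use case above.
case class UserInfo(userId: String, timestamp: Long, booleanFlag: Boolean,
                    subCategory: String, someValue: Float, intValue: Int)
case class UserCategoryUpdate(userId: String, category: String, timestamp: Long)

// Hypothetical output type: the original event plus the category known at that moment.
case class EnrichedUser(info: UserInfo, category: Option[String])

object EnrichmentSketch {
  // Shared state: the latest known category per userId.
  type State = Map[String, String]

  // Process one event. An update changes the state and emits nothing;
  // a user_info event is enriched from the state and leaves it unchanged.
  def step(state: State,
           event: Either[UserCategoryUpdate, UserInfo]): (State, Option[EnrichedUser]) =
    event match {
      case Left(update) => (state + (update.userId -> update.category), None)
      case Right(info)  => (state, Some(EnrichedUser(info, state.get(info.userId))))
    }

  // Fold over an already interleaved sequence of events (an assumption of this sketch).
  def run(events: Seq[Either[UserCategoryUpdate, UserInfo]]): (State, Vector[EnrichedUser]) =
    events.foldLeft((Map.empty[String, String], Vector.empty[EnrichedUser])) {
      case ((state, out), event) =>
        val (next, enriched) = step(state, event)
        (next, out ++ enriched.toList)
    }
}
```

A `user_info` event that arrives before any update for the same user is emitted with `category = None` in this sketch. A real implementation might instead buffer or drop such events, and each platform keeps the state in its own state store rather than in an in-memory `Map`.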

Currently supporting

  • spark 1.6.2
  • flink 1.3.1
  • kafka-streams 3.1.2
  • ksql 3.3.0

Todo:

  • spark 2.x
  • storm
  • heron
  • apex
  • samza

Requirements

  • Confluent platform 3.1.2 (newer versions should work but have not been tested)

How to run

  • follow http://docs.confluent.io/3.1.2/quickstart.html and start

    • zookeeper
    • kafka
    • schema-registry
  • create topics

    • user_info
    • user_update
    • enriched_user
  • run the generator

    • sbt generator/run
  • run one of the streaming platforms

    • for spark_1_6

      • sbt spark16/run
    • for flink

      • download flink 1.3.1
      • run ${flink_directory}/bin/start-local.sh
      • go to project root directory
      • build fat jar using sbt assembly
      • run ${flink_directory}/bin/flink run flink/target/scala-2.11/flink-assembly-0.1.0-SNAPSHOT.jar
    • for kafka-streams

      • sbt kafkaStreams/run
    • for ksql

      • check out the guide at kafka_streams/README.md

Testcase 1: Joining Streams

| Platform | Smoke test 1) (seconds) | Pregenerated Data 2) (seconds) | Using Data Generator 3) (seconds) |
| --- | --- | --- | --- |
| Spark Streaming (1.6) | 10.759 | | |
| Flink (1.3) | 89.988 | | |
| Kafka Streams (3.3.0) | 32.207 | | |

1) Processing 11.6 million pregenerated messages from a Kafka topic

2) Processing XY million pregenerated messages from a Kafka topic - same as smoke test but

3) Processing XY million generated messages. Messages are generated on the fly.
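To compare the platforms by throughput rather than elapsed time, the smoke test timings can be converted to messages per second. The sketch below assumes that each reported time belongs to the smoke test column and covers all 11.6 million messages from footnote 1. `ThroughputSketch` and `messagesPerSecond` are hypothetical names used only for illustration.

```scala
object ThroughputSketch {
  // Messages processed in the smoke test, as stated in footnote 1.
  val smokeTestMessages: Double = 11.6e6

  // Elapsed seconds per platform, copied from the results above
  // (assumed to be smoke test timings).
  val elapsedSeconds: Seq[(String, Double)] = Seq(
    "Spark Streaming (1.6)" -> 10.759,
    "Flink (1.3)"           -> 89.988,
    "Kafka Streams (3.3.0)" -> 32.207
  )

  // Throughput is the message count divided by the elapsed time.
  def messagesPerSecond(messages: Double, seconds: Double): Double =
    messages / seconds

  def main(args: Array[String]): Unit =
    elapsedSeconds.foreach { case (platform, seconds) =>
      val rate = messagesPerSecond(smokeTestMessages, seconds)
      println(f"$platform%-24s $rate%.0f msg/s")
    }
}
```

Because the message count is the same for every platform, the throughput ranking is simply the inverse of the elapsed-time ranking.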
