GitHub - endrit-b/demo-kafka-spark-app: A Spark Streaming processing application that integrates with Apache Kafka

Kafka-Spark end-to-end Demo App

About the Assignment:

Download the Confluent Platform Quick Start and set it up
Create three topics: users, pageviews and top_pages
Use the Kafka Connect DataGenConnector to produce data into these topics, using the users_extended and pageview quickstarts
Use a stream processing engine other than KSQL to develop a processor that:
- Joins the messages in these two topics on the user id field
- Uses a 1 minute hopping window with 10 second advances to compute the 10 most viewed pages by viewtime for every value of gender
- Once per minute produces a message into the top_pages topic that contains the gender, pageid, sum of view time in the latest window and distinct count of user ids in the latest window

Prerequisites

To be able to run this app, make sure you have installed the following:

Scala v2.11.11
Sbt v1.3.13 (Scala build tool)
Kafka v0.10+
Spark 2.4.x
Use Kafka Connect DataGenConnector to produce data using the users and pageview quickstarts

Also make sure the output top_pages Avro schema looks like below:

  {
    "name": "value_top_pages",
    "namespace": "com.mycorp.mynamespace",
    "type": "record",
    "fields": [
      {
        "default": null,
        "name": "gender",
        "type": ["null", "string"]
      },
      {
        "default": null,
        "name": "pageid",
        "type": ["null", "string"]
      },
      {
        "default": null,
        "name": "viewtime_sum",
        "type": ["null", "long"]
      },
      {
        "default": 0,
        "name": "users_count",
        "type": "long"
      }
    ]
  }

Installation and Usage

Once you install the prerequisites, just go ahead and clone the repository, then run commands below (on the project root):

# Prepare the fat jar for spark-submit
sbt assembly

After that run command:

 spark-submit --class io.endrit.sample.Main target/scala-2.11/demo-kafka-spark-app-assembly-0.0.1.jar

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.idea		.idea
project		project
src/main/scala/io/endrit/sample		src/main/scala/io/endrit/sample
target		target
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kafka-Spark end-to-end Demo App

Prerequisites

Installation and Usage

About

Releases

Packages

Languages

License

endrit-b/demo-kafka-spark-app

Folders and files

Latest commit

History

Repository files navigation

Kafka-Spark end-to-end Demo App

Prerequisites

Installation and Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages