A simple app to test out Spark Streaming from Kafka.
It's assumed that both `docker` and `docker-compose` are already installed on your machine to run this PoC. Java, Python 3, Spark, and `kafkacat` (optional but recommended) will also be used. Anything that needs to be installed is most likely easiest to install with Homebrew (such as `kafkacat`).
- Jake Mason: for creating the model code.
- wurstmeister: for his Kafka Docker setup in his kafka-docker repo.
- Kafka docker image
- Run Kafka using docker
- Kafka 0.10.0 example producer
- Kafkacat git repo
- Kafkacat Confluent docs
- Spark streaming + Kafka integration guide
- Kafka-python
After cloning this repo, clone the repo below to get some Kafka docker-compose files:

```shell
cd simple-pyspark-streaming-example
git clone https://github.com/wurstmeister/kafka-docker.git
```
In the file `kafka-docker/docker-compose-single-broker.yml`, change the `KAFKA_ADVERTISED_HOST_NAME` environment variable to use `localhost`.
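For reference, the relevant part of the compose file after the change might look like the excerpt below. Only the `KAFKA_ADVERTISED_HOST_NAME` line is the edit this README asks for; the surrounding keys are from wurstmeister's repo and may differ in your copy.

```yaml
# kafka-docker/docker-compose-single-broker.yml (excerpt; other keys unchanged)
services:
  kafka:
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost   # changed from the default hard-coded IP
```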
Start a single-node cluster with the broker at `localhost:9092`:

```shell
docker-compose -f kafka-docker/docker-compose-single-broker.yml up -d
```
To verify the cluster was created successfully, you can use a program like `kafkacat` to consume from and produce to a topic.

In a new terminal, use `kafkacat` to connect a consumer to the broker on topic `test`:

```shell
kafkacat -b localhost:9092 -C -t test
```

Add `-d broker` for debugging:

```shell
kafkacat -d broker -b localhost:9092 -C -t test
```
In another new terminal, use `kafkacat` to connect a producer to the broker on topic `test`:

```shell
kafkacat -b localhost:9092 -P -t test
```

Type a message into the producer terminal and press Enter; you should see the message appear in the `kafkacat` consumer terminal.
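If you'd rather verify the broker from Python, the same round trip can be sketched with `kafka-python` (listed in the references above). This is a minimal sketch, not part of this repo's model code: `encode_message` and `send_test_message` are hypothetical helpers, and the broker address `localhost:9092` and topic `test` simply match the setup earlier in this README.

```python
import json


def encode_message(payload):
    """Serialize a dict to UTF-8 JSON bytes, the message value format used here."""
    return json.dumps(payload).encode("utf-8")


def send_test_message(payload, broker="localhost:9092", topic="test"):
    """Produce a single message to the broker started by docker-compose above.

    Requires `pip install kafka-python` and a running broker; `broker` and
    `topic` match the single-broker setup earlier in this README.
    """
    # Imported inside the function so the serializer above works without Kafka.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=broker)
    producer.send(topic, encode_message(payload))
    producer.flush()
    producer.close()


# With the cluster and the kafkacat consumer from above running:
# send_test_message({"msg": "hello from kafka-python"})
# The JSON should then show up in the consumer terminal.
```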
TODO