A basic Apache Spark Streaming application
This project aims to help developers and researchers get started with Apache Spark Streaming and Apache Kafka.
1) Ensure Hadoop is set up, i.e. ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set
2) ${HADOOP_HOME}/bin/winutils.exe must exist; otherwise you will get the error _Failed to locate the winutils binary in the hadoop binary path_
3) Ensure Kafka is set up
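
The Hadoop prerequisites above can be verified before launching anything. The following is a minimal sketch, not part of this project: the function name `check_hadoop_env` is hypothetical, and it checks for `winutils.exe` unconditionally because that is the requirement as stated here.

```python
import os
from pathlib import Path


def check_hadoop_env(env=None):
    """Return a list of problems found; an empty list means the checks passed.

    `env` is a mapping of environment variables. It defaults to os.environ
    and is a parameter only so the function can be exercised with fake values.
    """
    env = os.environ if env is None else env
    problems = []

    # Both variables are listed as prerequisites.
    for name in ("HADOOP_CONF_DIR", "HADOOP_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # winutils.exe is expected under ${HADOOP_HOME}/bin.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"{winutils} does not exist")

    return problems


if __name__ == "__main__":
    issues = check_hadoop_env()
    for issue in issues:
        print("PROBLEM:", issue)
    if not issues:
        print("Hadoop prerequisites look fine")
```

Running the script in the same shell (or IDE run configuration) that will start the application shows whether that environment actually sees the variables.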
Instructions to run:
1) Kafka requires ZooKeeper, so start an instance of it first: ${kafka_dir}/bin/ ${kafka_dir}/config/
2) Start Kafka. In this case we will run a single node: ${kafka_dir}/bin/ ${kafka_dir}/config/ (to run multiple brokers/nodes, run with unique i.e. unique and log.dirs)
3) Run the spark-kafka-streaming application, either from an IDE or from a new shell
4) In a new shell, open the Kafka console producer: ${kafka_dir}/bin/ --broker-list [ip/localhost]:[port-default_is_9092] --topic [topic_name]
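
If the application or the console producer cannot connect, it can help to confirm that ZooKeeper and Kafka are actually listening. The sketch below only tests TCP reachability using the Python standard library; `is_port_open` is a hypothetical helper, 9092 is the Kafka default mentioned above, and 2181 is ZooKeeper's usual client port, which is an assumption about your configuration.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Ports are assumptions: adjust them to match your own config files.
    for name, port in (("ZooKeeper", 2181), ("Kafka", 9092)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```

A reachable port only shows that something is listening there; it does not prove the broker is healthy or that the topic exists.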