Propose a reliable data pipeline solution to capture a high-velocity stream of patient vitals such as body temperature, heart rate, and blood pressure (BP) coming from IoT devices, and send an instant email notification in case of abnormal vitals.
The detailed use case, including the project objective and the tools to be used, is provided in CapstoneProject.docx. The expected deliverables are listed below; illustrative code sketches for the scripted items follow the list.
- A script to start the Kafka server and create a topic (kafka.pdf)
- A producer application script that reads patient vitals from RDS, pushes each record into the topic in the required message format, and lists the messages in the topic (kafka_produce_patient_vitals.py)
- A PySpark application script that reads all messages from the Kafka topic and writes them to HDFS as Parquet files via the Structured Streaming API, e.g. in_df2.writeStream.format(...) (kafka_spark_patient_vitals.py)
- A script to build an external Hive table for the threshold data and view that data (hive1.pdf)
- A script to create an HBase table with three column families (attr, limit, msg) and insert 12 records into it (hbase.pdf)
- A script to create an external Hive table for patients' vital information and view the data (hive2.pdf)
- A script to extract patient information into a Hive table using Sqoop (sqoop.pdf)
- A Spark streaming application script that reads the vitals data from HDFS and compares it with the HBase thresholds to generate alerts (kafka_spark_generate_alerts.py)
- A consumer application script that sends an email notification for each alert (kafka_consume_alerts.py)
- A screenshot of a successful SNS configuration (sns.pdf)
- A document describing your overall code logic, including the commands needed to run all of the scripts listed above (code_logic.pdf)
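
The sketches below are illustrative only: broker addresses, topic names, schemas, HDFS paths, table names, and credentials are assumptions to adapt to your environment, not the graded solution.

For kafka.pdf: starting the broker itself is a shell step (bin/zookeeper-server-start.sh and bin/kafka-server-start.sh from the Kafka distribution); topic creation can also be done from Python with the kafka-python admin client, as in this sketch with a hypothetical topic name.

```python
# Minimal sketch: create the vitals topic with kafka-python.
# Broker address and topic name are assumptions.
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP = "localhost:9092"   # assumption: local single-broker setup
TOPIC = "patient-vitals"       # hypothetical topic name

admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
admin.create_topics([NewTopic(name=TOPIC, num_partitions=1, replication_factor=1)])
admin.close()
```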
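For kafka_produce_patient_vitals.py: a minimal sketch assuming a MySQL-backed RDS instance, the kafka-python client, and a hypothetical patients_vitals source table. It pushes one JSON message per row and then lists the topic's messages; the actual message format should follow the project document.

```python
# Hedged sketch: endpoint, credentials, and table name are placeholders.
import json
import mysql.connector
from kafka import KafkaProducer, KafkaConsumer

conn = mysql.connector.connect(
    host="your-rds-endpoint.rds.amazonaws.com",  # placeholder RDS endpoint
    user="admin", password="secret", database="vitals_db",
)
cursor = conn.cursor(dictionary=True)            # rows come back as dicts
cursor.execute("SELECT * FROM patients_vitals")  # hypothetical table name

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # default=str serializes datetimes/decimals that MySQL may return
    value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
)
for row in cursor:
    producer.send("patient-vitals", value=row)   # one JSON message per record
producer.flush()
cursor.close()
conn.close()

# List the messages currently in the topic, then stop after a 5 s idle timeout.
consumer = KafkaConsumer("patient-vitals", bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest", consumer_timeout_ms=5000)
for msg in consumer:
    print(msg.value.decode("utf-8"))
```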
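For kafka_spark_patient_vitals.py: a sketch of the writeStream step the brief hints at (in_df2.writeStream...). The message schema is an assumption based on the listed vitals; run with spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:&lt;your Spark version&gt; so the Kafka source is available.

```python
# Hedged sketch: topic, paths, and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, IntegerType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("kafka_spark_patient_vitals").getOrCreate()

# Assumed message schema; align with whatever the producer actually emits.
schema = StructType([
    StructField("customerId", IntegerType()),
    StructField("heartrate", IntegerType()),
    StructField("bp", IntegerType()),
    StructField("temperature", DoubleType()),
    StructField("message_time", TimestampType()),
])

in_df = (spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "patient-vitals")
         .option("startingOffsets", "earliest")
         .load())

# Kafka values arrive as bytes; parse the JSON payload into typed columns.
in_df2 = (in_df.select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))

query = (in_df2.writeStream.format("parquet")
         .option("path", "hdfs:///user/hadoop/patient_vitals")          # placeholder
         .option("checkpointLocation", "hdfs:///user/hadoop/checkpoints/vitals")
         .outputMode("append")
         .start())
query.awaitTermination()
```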
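For hive1.pdf: the deliverable is normally a HiveQL script; the same DDL can be issued from Python through spark.sql with Hive support enabled, as sketched here. Column names and the HDFS location are assumptions.

```python
# Hedged sketch: external Hive table over delimited threshold files.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive_threshold_table")
         .enableHiveSupport().getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS threshold_ref (
        attribute      STRING,   -- e.g. 'heartrate', 'bp' (hypothetical)
        low_threshold  DOUBLE,
        high_threshold DOUBLE,
        alert_message  STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///user/hadoop/threshold_data'   -- placeholder path
""")

# View the threshold data.
spark.sql("SELECT * FROM threshold_ref").show()
```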
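For hbase.pdf: table creation and puts are usually done in the HBase shell; the happybase Python client (which talks to the HBase Thrift server) gives an equivalent sketch. The table name, row key, and cell values are illustrative; the actual script inserts the required 12 records.

```python
# Hedged sketch: assumes an HBase Thrift server on localhost:9090.
import happybase

conn = happybase.Connection("localhost")
conn.create_table(
    "patients_vital_thresholds",           # hypothetical table name
    {"attr": {}, "limit": {}, "msg": {}},  # the three required column families
)
table = conn.table("patients_vital_thresholds")

# One illustrative row per vital; the graded script inserts 12 such records.
table.put(b"heartrate", {
    b"attr:name":  b"heartrate",
    b"limit:low":  b"60",
    b"limit:high": b"100",
    b"msg:text":   b"Heart rate outside normal range",
})

# Verify the inserted data.
for key, data in table.scan():
    print(key, data)
conn.close()
```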
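For hive2.pdf: again sketched via spark.sql rather than a .hql file. This version points the external table at the Parquet files written by the streaming job above; the schema repeats the same assumptions.

```python
# Hedged sketch: external Hive table over the streamed Parquet output.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive_vitals_table")
         .enableHiveSupport().getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS patients_vital_info (
        customerId   INT,        -- hypothetical column names
        heartrate    INT,
        bp           INT,
        temperature  DOUBLE,
        message_time TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///user/hadoop/patient_vitals'   -- path used by the stream
""")

# View the data.
spark.sql("SELECT * FROM patients_vital_info LIMIT 10").show()
```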
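For sqoop.pdf: the deliverable is normally a shell one-liner; to stay in one language, the same Sqoop import is launched here from Python. Connection details and table names are placeholders; the flags themselves (--connect, --hive-import, etc.) are standard Sqoop options.

```python
# Hedged sketch: run a Sqoop import of patient info into Hive.
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://your-rds-endpoint.rds.amazonaws.com/vitals_db",
    "--username", "admin",
    "--password", "secret",
    "--table", "patients_info",        # hypothetical source table
    "--hive-import",
    "--hive-table", "patients_info",   # target Hive table
    "-m", "1",                         # single mapper for a small table
], check=True)
```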
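For kafka_spark_generate_alerts.py: a sketch under two assumptions: the HBase table holds one row per vital with limit:low/limit:high cells (as in the HBase sketch above), and a reading is abnormal when it falls outside its [low, high] band. The small threshold set is read once on the driver; alert rows are streamed to a hypothetical "alerts" Kafka topic (the same --packages flag as above applies).

```python
# Hedged sketch: stream Parquet vitals from HDFS, flag out-of-range readings.
import happybase
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_json, struct
from pyspark.sql.types import (StructType, StructField, IntegerType,
                               DoubleType, TimestampType)

# Load the (small) threshold reference data from HBase on the driver.
hb = happybase.Connection("localhost")
thresholds = {}
for _, data in hb.table("patients_vital_thresholds").scan():
    name = data[b"attr:name"].decode()
    thresholds[name] = (float(data[b"limit:low"]), float(data[b"limit:high"]))
hb.close()

spark = SparkSession.builder.appName("kafka_spark_generate_alerts").getOrCreate()

schema = StructType([  # must match the Parquet files written earlier
    StructField("customerId", IntegerType()),
    StructField("heartrate", IntegerType()),
    StructField("bp", IntegerType()),
    StructField("temperature", DoubleType()),
    StructField("message_time", TimestampType()),
])

vitals = spark.readStream.schema(schema).parquet("hdfs:///user/hadoop/patient_vitals")

hr_low, hr_high = thresholds["heartrate"]
bp_low, bp_high = thresholds["bp"]

# A reading is abnormal when it falls outside its [low, high] band.
alerts = vitals.where(
    (col("heartrate") < hr_low) | (col("heartrate") > hr_high) |
    (col("bp") < bp_low) | (col("bp") > bp_high)
)

query = (alerts.select(to_json(struct(*alerts.columns)).alias("value"))
         .writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "alerts")                  # hypothetical alerts topic
         .option("checkpointLocation", "hdfs:///user/hadoop/checkpoints/alerts")
         .start())
query.awaitTermination()
```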
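For kafka_consume_alerts.py: a sketch that consumes alert messages and relays each one through an SNS topic, whose email subscription delivers the notification. The region and topic ARN are placeholders.

```python
# Hedged sketch: relay Kafka alerts to email via SNS.
import boto3
from kafka import KafkaConsumer

sns = boto3.client("sns", region_name="us-east-1")  # assumption: us-east-1
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:patient-alerts"  # placeholder

consumer = KafkaConsumer(
    "alerts",                          # hypothetical alerts topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)
for msg in consumer:
    alert = msg.value.decode("utf-8")
    # Every email subscriber of the SNS topic receives this notification.
    sns.publish(TopicArn=TOPIC_ARN, Subject="Abnormal patient vitals",
                Message=alert)
```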
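For sns.pdf: the deliverable itself is a console screenshot; for reference, the equivalent configuration via boto3 is sketched below. The topic name and email address are placeholders, and the subscriber still has to click the confirmation link AWS emails them.

```python
# Hedged sketch: create an SNS topic and subscribe an email endpoint.
import boto3

sns = boto3.client("sns", region_name="us-east-1")
topic = sns.create_topic(Name="patient-alerts")      # hypothetical topic name
print("Topic ARN:", topic["TopicArn"])

sns.subscribe(TopicArn=topic["TopicArn"], Protocol="email",
              Endpoint="oncall@example.com")         # placeholder address
```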