Kinesis Data Analytics Lab

Processing real-time data via Kinesis Data Analytics (Apache Flink)

YouTube videos

  1. Send Data to Kinesis from a Python Script
  2. Optional - Send Data to Kinesis from a KDA Notebook
  3. Create a Kinesis Data Analytics Studio and Upload a Notebook
  4. Running the Interactive Flink Zeppelin Notebook
  5. Deploy a Kinesis Data Analytics Studio Notebook

Data Producer

Note: if you want to get started without setting up a Kinesis Data Stream, loading data into it, or running a data simulator, use the sql_1.13_DataGen.zpln notebook. This Zeppelin notebook uses the Flink DataGen connector to generate data within the notebook itself, without needing a connection to Kinesis or Kafka.
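For orientation, here is a minimal sketch of what a DataGen-backed table definition can look like. It is a hypothetical helper, not code taken from sql_1.13_DataGen.zpln. It builds a Flink SQL CREATE TABLE statement that you could print and paste into a %flink.ssql paragraph. The table and column names are illustrative assumptions. The 'connector', 'rows-per-second' and per-field 'fields.<name>.min'/'max' keys are standard DataGen connector options.

```python
# Hypothetical helper: build a Flink SQL DDL statement backed by the DataGen
# connector, so rows are generated inside Flink with no Kinesis or Kafka source.
# Table/column names are illustrative, not taken from sql_1.13_DataGen.zpln.

def datagen_ddl(table: str, rows_per_second: int = 10) -> str:
    """Return a CREATE TABLE statement that uses the 'datagen' connector."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    options = {
        "connector": "datagen",
        "rows-per-second": str(rows_per_second),
        # Fields default to random values; min/max bound the numeric columns.
        "fields.vendor_id.min": "1",
        "fields.vendor_id.max": "2",
        "fields.trip_distance.min": "0",
        "fields.trip_distance.max": "30",
    }
    opts = ",\n".join(f"  '{k}' = '{v}'" for k, v in options.items())
    return (
        f"CREATE TABLE {table} (\n"
        "  vendor_id INT,\n"
        "  trip_distance DOUBLE,\n"
        "  proc_time AS PROCTIME()\n"  # computed processing-time column
        f") WITH (\n{opts}\n)"
    )


if __name__ == "__main__":
    print(datagen_ddl("taxi_trips_gen"))
```

Keeping the options in a dictionary makes it easy to add further field bounds without hand-editing the SQL string.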

To get started with Apache Flink via Kinesis Data Analytics (KDA), you need a Kinesis Data Stream that contains sample data. The kinesis_data_producer folder provides two Python scripts. Each script reads the CSV file yellow_tripdata_2020-01.csv in the data folder and streams each line of the file as a JSON record/message to the Kinesis Data Stream you specify.
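The read-CSV-then-stream pattern can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in kinesis_data_producer. Each CSV row is keyed by the header line and serialized to JSON, so every value stays a string. The client passed to send_records is assumed to be a boto3 Kinesis client, whose put_record takes StreamName, Data and PartitionKey. The function names are hypothetical. A random UUID partition key is one common choice for spreading records across shards.

```python
import csv
import json
import uuid
from typing import Iterable, Iterator


def csv_to_json_records(path: str) -> Iterator[bytes]:
    """Yield each CSV row (keyed by the header line) as UTF-8 encoded JSON."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # DictReader yields str values, so the JSON fields are strings too.
            yield json.dumps(row).encode("utf-8")


def send_records(client, stream_name: str, records: Iterable[bytes]) -> int:
    """Send each record with PutRecord and return how many were sent.

    `client` is assumed to be boto3.client("kinesis"). A random partition key
    per record spreads records across the stream's shards.
    """
    sent = 0
    for data in records:
        client.put_record(
            StreamName=stream_name,
            Data=data,
            PartitionKey=str(uuid.uuid4()),
        )
        sent += 1
    return sent
```

Example usage with AWS credentials and an existing stream (the stream name is a placeholder, and the relative path assumes you run from the repository root): `send_records(boto3.client("kinesis"), "<your-stream>", csv_to_json_records("data/yellow_tripdata_2020-01.csv"))`.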

Two variations of this Python data producer are provided.

The two scripts are very similar. They differ in a few details, depending on whether you want to run the producer from your local computer/laptop or from Cloud9.

For a step-by-step walkthrough, see the YouTube video Send Data to Kinesis from a Python Script.

Another way to send sample data to a Kinesis Data Stream is the Nyc_Taxi_Produce_KDA_Zeppelin_Notebook.zpln notebook in KDA Studio. This avoids setting up the Python data producers described above. Upload the notebook and follow its instructions to send sample data from S3 to a Kinesis Data Stream.

To get the most out of the sample Flink code and labs, make sure you can easily start and stop a Python data producer.
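One way to make a producer easy to start and stop is sketched below. It is a hypothetical run loop, not taken from the provided scripts. The loop sends records one at a time with a small delay and stops for any of four reasons: the input runs out, an optional record limit is reached, an optional threading.Event is set, or you press Ctrl+C. The records argument is any iterable of bytes, and send is any callable that writes one record, for example a thin wrapper around a Kinesis client's put_record.

```python
import threading
import time
from typing import Callable, Iterable, Optional


def run_producer(
    records: Iterable[bytes],
    send: Callable[[bytes], None],
    delay_seconds: float = 0.1,
    max_records: Optional[int] = None,
    stop: Optional[threading.Event] = None,
) -> int:
    """Send records until the input is exhausted, max_records is reached,
    `stop` is set, or Ctrl+C is pressed. Returns the number of records sent."""
    sent = 0
    try:
        for data in records:
            if stop is not None and stop.is_set():
                break
            if max_records is not None and sent >= max_records:
                break
            send(data)
            sent += 1
            time.sleep(delay_seconds)  # throttle so the stream is easy to follow
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the loop cleanly instead of printing a traceback
    return sent
```

The delay and record limit let you replay a small, predictable burst of data while you step through a notebook. The stop event lets another thread end the producer without killing the process.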

Interactive KDA Flink Zeppelin Notebook(s)

The interactive_KDA_flink_zeppelin_notebook folder provides Zeppelin notebooks designed to work with Kinesis Data Analytics Studio. Deploy a Kinesis Data Analytics Studio instance and upload the Zeppelin (.zpln) notebook(s).

Note: the interactive_KDA_flink_zeppelin_notebook folder contains subfolders. Which one to use depends on the version of Flink your notebook is configured to use. I recommend using Flink v1.13.
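In the Flink v1.13 notebooks, the Kinesis Data Stream is typically exposed to SQL as a table. The sketch below is a hypothetical helper, not code from the provided notebooks, that builds such a definition with the Flink 1.13 Kinesis SQL connector options 'stream', 'aws.region', 'scan.stream.initpos' and 'format'. It assumes the producer sends yellow-taxi CSV rows as JSON with string values. The three columns shown, the table name, the stream and the region are all placeholders.

```python
# Hypothetical helper: build a Flink 1.13 SQL DDL statement for a table that
# reads JSON records from a Kinesis Data Stream. Column names assume the
# yellow-taxi CSV header; all values arrive as JSON strings, hence STRING.

_ALLOWED_POSITIONS = {"LATEST", "TRIM_HORIZON"}


def kinesis_source_ddl(table: str, stream: str, region: str,
                       initial_position: str = "LATEST") -> str:
    """Return a CREATE TABLE statement backed by the 'kinesis' connector."""
    if initial_position not in _ALLOWED_POSITIONS:
        # AT_TIMESTAMP also exists but needs an extra timestamp option,
        # which this sketch does not handle.
        raise ValueError(f"unsupported initial position: {initial_position}")
    return (
        f"CREATE TABLE {table} (\n"
        "  VendorID STRING,\n"
        "  tpep_pickup_datetime STRING,\n"
        "  trip_distance STRING,\n"
        "  proc_time AS PROCTIME()\n"
        ") WITH (\n"
        "  'connector' = 'kinesis',\n"
        f"  'stream' = '{stream}',\n"
        f"  'aws.region' = '{region}',\n"
        f"  'scan.stream.initpos' = '{initial_position}',\n"
        "  'format' = 'json'\n"
        ")"
    )
```

TRIM_HORIZON replays everything still retained in the stream, and LATEST only picks up records sent after the query starts. That choice matters when you restart a notebook cell while a producer is running.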

To upload the notebook:

[Screenshot: upload_notebook]

Once the notebook is uploaded and opened in Zeppelin, run it one cell at a time.

[Screenshot: interactive_notebook]

For a step-by-step walkthrough of the running notebook, see the YouTube video Running the Interactive Flink Zeppelin Notebook.

Deployable KDA Flink Zeppelin Notebook(s)

Kinesis Data Analytics Studio provides an excellent development environment. When you are ready to deploy your application, Kinesis Data Analytics Studio can build your notebook code and deploy it as a long-running Kinesis Data Analytics application.

To deploy your notebook:

When you created your notebook environment, you must have configured the Deploy as application configuration - optional setting with a valid S3 bucket.

[Screenshot: deploy_config]

To reach this configuration menu while creating your Studio notebook, select Create with custom settings instead of the default Quick create with sample code. Follow the setup prompts. On Step 3 - Configure, select an S3 bucket for the Deploy as application configuration - optional setting.

With this configured, select Build deployable and export to Amazon S3 in your Zeppelin notebook.

[Screenshot: build_action]

Once the build is complete, select Deploy deployable as Kinesis Analytics application.

[Screenshot: deploy_action]

When the deployment is complete, the application appears under the analytics applications section of Kinesis Data Analytics.
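If you prefer to confirm the deployment from a script rather than the console, the sketch below looks up an application by name. It is a minimal, hypothetical helper. It assumes the client is boto3.client("kinesisanalyticsv2"), whose list_applications returns ApplicationSummaries and, while more pages remain, a NextToken. The application name you pass in is whatever name the deployed notebook application was given.

```python
from typing import Any, Dict, Optional


def find_application(client, name: str) -> Optional[Dict[str, Any]]:
    """Return the ApplicationSummaries entry whose ApplicationName is `name`,
    or None if no such application exists.

    `client` is assumed to be boto3.client("kinesisanalyticsv2"); results are
    paged, so keep following NextToken until it is absent.
    """
    kwargs: Dict[str, Any] = {}
    while True:
        page = client.list_applications(**kwargs)
        for summary in page.get("ApplicationSummaries", []):
            if summary["ApplicationName"] == name:
                return summary
        token = page.get("NextToken")
        if not token:
            return None
        kwargs = {"NextToken": token}
```

Example usage (the name is a placeholder): `app = find_application(boto3.client("kinesisanalyticsv2"), "<your-application-name>")`, then inspect `app["ApplicationStatus"]`, for example READY or RUNNING.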

[Screenshot: deployed]

Future Improvements Planned for this Repository