This project collects daily temperature data for Los Angeles from the Open-Meteo API for April through May 2024 and ingests it into an AWS data pipeline. The raw data is stored in AWS S3, transformed and cleaned with AWS Glue, and made available for querying in AWS Athena. A Grafana dashboard visualizes the results.
## Table of Contents

- Prerequisites
- Architecture
- Data Flow
- Setup
- AWS Services Used
- Steps
- Visualization
- Troubleshooting
- Acknowledgements
## Prerequisites

- AWS account: Sign up for AWS
- Grafana Lab account: Sign up for Grafana
## Data Flow

- Data Ingestion: A Lambda function ingests weather data from the Open-Meteo API and sends it to a Kinesis Data Firehose stream.
- Data Storage: Kinesis Data Firehose delivers the data to an S3 bucket.
- Data Crawling: AWS Glue crawls the data in S3 to create a table in the AWS Glue Data Catalog.
- Data Transformation: AWS Glue jobs transform the data, perform data quality checks, and save the cleaned data as Parquet files in S3.
- Data Querying: The transformed data is available for querying in AWS Athena.
- Data Visualization: Grafana is used to build a dashboard for visualizing the data.
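The ingestion step above can be sketched as follows. This is a minimal illustration rather than the project's actual Lambda code: the field names follow Open-Meteo's columnar `daily` response format, and the commented stream name is a placeholder.

```python
import json

# Shape of Open-Meteo's "daily" response: parallel arrays, one entry per day.
sample_response = {
    "daily": {
        "time": ["2024-04-01", "2024-04-02"],
        "temperature_2m_max": [21.4, 19.8],
        "temperature_2m_min": [12.1, 11.5],
    }
}

def to_firehose_records(response: dict) -> list[dict]:
    """Flatten the columnar 'daily' arrays into one JSON record per day,
    in the {"Data": bytes} shape expected by Firehose put_record_batch."""
    daily = response["daily"]
    records = []
    for i, day in enumerate(daily["time"]):
        row = {
            "date": day,
            "temp_max": daily["temperature_2m_max"][i],
            "temp_min": daily["temperature_2m_min"][i],
        }
        # Newline-delimited JSON so each S3 object is readable line by line.
        records.append({"Data": (json.dumps(row) + "\n").encode("utf-8")})
    return records

records = to_firehose_records(sample_response)
# Inside the Lambda, these would then be sent with something like:
#   boto3.client("firehose").put_record_batch(
#       DeliveryStreamName="your-stream-name", Records=records)
```
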
## AWS Services Used

- AWS Lambda: To run the function that ingests data from the Open-Meteo API.
- AWS Kinesis Data Firehose: To deliver the ingested data to S3.
- AWS S3: To store raw and transformed data.
- AWS Glue: To crawl, transform, and clean the data.
- AWS Athena: To query the transformed data.
- Grafana: To visualize the data.
## Setup

- AWS Lambda: Deploy the `LA_weather_lambda_put_record_batch.py` Lambda function in the `lambda/` directory using the AWS Lambda Console or CLI.
- AWS Kinesis Data Firehose: Create a Kinesis Data Firehose delivery stream to deliver data to your S3 bucket.
  - Example configuration:
    - Source: Direct PUT or other sources
    - Destination: S3 bucket
    - S3 bucket ARN: `arn:aws:s3:::your-bucket-name`
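A delivery stream matching this example configuration could be created from the CLI roughly as below; the stream name, IAM role ARN, and bucket name are placeholders you would replace with your own.

```shell
aws firehose create-delivery-stream \
  --delivery-stream-name la-weather-stream \
  --delivery-stream-type DirectPut \
  --s3-destination-configuration \
      RoleARN=arn:aws:iam::123456789012:role/firehose-s3-role,BucketARN=arn:aws:s3:::your-bucket-name
```
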
- AWS Glue:
  - Create a Glue Crawler to crawl the data in your S3 bucket and create a Glue Data Catalog table.
  - Create and run Glue jobs using the scripts in the `glue/` directory to transform data and perform data quality checks.
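As a rough sketch of the kind of cleaning the Glue jobs perform (the actual scripts in `glue/` use the Glue/Spark APIs; the field names and thresholds here are assumptions for illustration):

```python
def clean_rows(rows: list[dict]) -> list[dict]:
    """Basic data-quality pass: drop rows with missing temperatures,
    cast temperatures to float, and reject implausible readings."""
    cleaned = []
    for row in rows:
        if row.get("temp_max") is None or row.get("temp_min") is None:
            continue  # drop incomplete rows
        t_max, t_min = float(row["temp_max"]), float(row["temp_min"])
        if t_min > t_max or not (-50.0 <= t_min <= 60.0):
            continue  # drop inconsistent or out-of-range readings
        cleaned.append({"date": row["date"], "temp_max": t_max, "temp_min": t_min})
    return cleaned

raw = [
    {"date": "2024-04-01", "temp_max": 21.4, "temp_min": 12.1},
    {"date": "2024-04-02", "temp_max": None, "temp_min": 11.5},   # incomplete
    {"date": "2024-04-03", "temp_max": "19.8", "temp_min": "10.2"},  # string-typed
]
print(clean_rows(raw))
```

In the real job, the cleaned rows are then written back to S3 as Parquet rather than printed.
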
- AWS Athena: Configure Athena to query the data stored in your S3 bucket.
- Grafana: Set up Grafana to visualize the data.
## Steps

- Trigger the Lambda function to start data ingestion.
- Verify that the data is being delivered to your S3 bucket via Kinesis Data Firehose.
- Run the Glue crawler to update the Glue Data Catalog.
- Execute Glue jobs to transform and clean the data.
- Query the transformed data in Athena to verify the data quality and structure.
- Use Grafana to visualize the data.
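The verification step can be mirrored by a small sanity check like the one below, run against rows returned from Athena. The column names and the April–May 2024 window come from the project description; the function itself is a hypothetical helper.

```python
from datetime import date

def verify_rows(rows: list[dict]) -> bool:
    """Check that every row falls inside the April-May 2024 collection
    window and has non-null temperatures with min <= max."""
    start, end = date(2024, 4, 1), date(2024, 5, 31)
    for row in rows:
        d = date.fromisoformat(row["date"])
        if not (start <= d <= end):
            return False
        if row["temp_min"] is None or row["temp_max"] is None:
            return False
        if row["temp_min"] > row["temp_max"]:
            return False
    return True

sample = [
    {"date": "2024-04-15", "temp_min": 12.0, "temp_max": 22.5},
    {"date": "2024-05-30", "temp_min": 14.2, "temp_max": 25.1},
]
print(verify_rows(sample))  # → True
```
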

## Troubleshooting

- **Lambda Function Errors:**
  - Check CloudWatch logs for detailed error messages.
  - Verify the IAM role has the necessary permissions.
- **Kinesis Data Firehose Issues:**
  - Ensure the Firehose stream is properly configured with the correct S3 bucket.
  - Check Firehose monitoring metrics for delivery failures.
- **Glue Job Failures:**
  - Review Glue job logs for errors.
  - Ensure the Glue job script paths and S3 bucket permissions are correct.
- **Athena Query Problems:**
  - Verify the Glue Data Catalog table is correctly configured.
  - Check for syntax errors in your SQL queries.
## Acknowledgements

- Special thanks to David Freitag for his Maven course, Build Your First Serverless Data Engineering Project.
- Data source: Open-Meteo weather API