This project, titled "Snowflake Internal Stage Data Pipeline," focuses on developing a robust data ingestion pipeline from a MySQL database to Snowflake utilizing Snowflake's Internal Stage and orchestrated with Airflow. The pipeline is designed to perform incremental data loads, enhancing efficiency and reducing data transfer volumes. A metadata-driven approach is employed to streamline and automate the data ingestion process.
- Incremental Data Loading: Efficiently transfers only new or updated records from MySQL to Snowflake, minimizing data transfer and processing time.
- Metadata-Driven Pipeline: Leverages metadata to dynamically configure and execute data ingestion tasks, reducing the need for hard-coded configurations.
- Airflow Orchestration: Utilizes Apache Airflow to manage workflow orchestration, scheduling, and monitoring, ensuring reliable execution of data loading processes.
- Snowflake Integration: Employs Snowflake's Internal Stage for secure and scalable data staging before ingestion into the target tables.
- MySQL: Source database for extracting data.
- Snowflake: Target cloud data warehouse for analytics.
- Apache Airflow: Workflow orchestration tool to manage the data pipeline.
- Python: The primary programming language for scripting and automation.
![image](https://private-user-images.githubusercontent.com/25386607/312123798-ccfbea03-5f73-41b0-a9ed-5b6eceb2a5c8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTIxMjM3OTgtY2NmYmVhMDMtNWY3My00MWIwLWE5ZWQtNWI2ZWNlYjJhNWM4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNjYTg1ZGNhM2ZlNzBjMzdlZGE2YjNmNmMxZjRkYWI2ODdhYzNmNjE0ZDEzYTQ2NDVkNzMzOGI4NGE3OGFhMjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.yCRAqTIFkVSyedY4YQFXlqsYCTzHwtr3jlvhzq8L7oo)
There are 2 tables which are getting ingested to Snowflake.
Dedicated Dags are developed for each Table
- amazone_books
- amazonebook_reviews
CREATE TABLE amazone_books ( book_id INT NOT NULL AUTO_INCREMENT ,book_title TEXT ,book_amount FLOAT ,book_author TEXT ,book_rating FLOAT ,book_link TEXT ,business_date DATE DEFAULT(CURRENT_DATE) ,PRIMARY KEY (book_id) );CREATE TABLE amazonebook_reviews ( book_id INT NOT NULL ,reviewer_name TEXT ,rating FLOAT ,review_title TEXT ,review_content TEXT ,reviewed_on DATE ,business_date DATE DEFAULT(CURRENT_DATE) );
For the Incremental load. Primary Keys are required in the Tables. Respective Primary key for the Table are
- amazone_books
- book_id
- amazonebook_reviews
- book_id
- reviewer_name
- business_date
Note: This Source Data is from another Project. To know more about how source data is generated please refer AmazonBooks_DataPipeline
The Airflow Dag Ids for respective Tables are
- amazone_books
- Snowflake_InternalStage_amazone_books_Dag
- amazonebook_reviews
- Snowflake_InternalStage_amazonebook_review_Dag
- Workflow Design: A DAG (Directed Acyclic Graph) in Airflow defines the sequence of tasks for data extraction, staging, and loading.
- Incremental Loading: The pipeline identifies new or updated records in MySQL using timestamps or sequence IDs.
- Metadata Management: Metadata definitions (i.e Config) guide the extraction and loading processes, allowing for flexibility and scalability.
- Monitoring and Logging: Airflow provides comprehensive monitoring and logging capabilities, facilitating troubleshooting and performance optimization.
- Data Quality Checks: Implement additional data validation and quality checks within the pipeline.
- Advanced Scheduling: Enhance scheduling capabilities to support more complex dependencies and triggers.
![image](https://private-user-images.githubusercontent.com/25386607/310148365-f14d9492-fdc6-4670-b56b-75303751392e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDgzNjUtZjE0ZDk0OTItZmRjNi00NjcwLWI1NmItNzUzMDM3NTEzOTJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThmOWM4ZjljZjgxZDUwNWQzNDcyNThjNWZmNGRmZjc1YzdkNWI2ODFjYzIzY2Q0YzViNjkyMmJkNDQxZDIwMzQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.avTrYrWY3X7_tjQyjAUpOk8T0Y8nEWaNh8fumzINErg)
![image](https://private-user-images.githubusercontent.com/25386607/310152368-88cf3b74-3387-4e1e-9309-c550ee13d487.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNTIzNjgtODhjZjNiNzQtMzM4Ny00ZTFlLTkzMDktYzU1MGVlMTNkNDg3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTY2YmRmY2E4M2VmYTlkNWZlZDBkM2Y4NDhhNWM5M2ZlZDYzZWQwMTk2Njc5YjVkZmFhYWNmMzAzNWY4Y2E3MjcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.vq6DBUixFQ_O4DbVsypHRBn0Ns9ahbuwcbK631HHEFQ)
![image](https://private-user-images.githubusercontent.com/25386607/310148802-51e9538e-0cd2-4c24-b5a2-45337524460d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDg4MDItNTFlOTUzOGUtMGNkMi00YzI0LWI1YTItNDUzMzc1MjQ0NjBkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWUzZmY3Zjc2OTdlNTE2YjAwMjBhNTAwNmQyN2Q4NTZiODFjZTBiMDRhMGM2NjM4ZDNiN2ZlZmZjZDBiNDY0NzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.b45gelVtlqJeoexkWpoTfhW8hUrRpO4bJ96dsY3GoB0)
![image](https://private-user-images.githubusercontent.com/25386607/310149245-241fe857-f986-487b-98f1-e9a378b5b426.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDkyNDUtMjQxZmU4NTctZjk4Ni00ODdiLTk4ZjEtZTlhMzc4YjViNDI2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0OGI5OTY3NWU0OTUwYWUwZmY1MjBjZmQxOWUyZTNmNjdiYjFmZGQ5MWNlOTQ4YWEwNmI3Yjk4YjkxNDExMDUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.SpqS1g9O4Fkhv-8JLfTsfoUwOkN4_lXwWU1tDkZe80U)
![image](https://private-user-images.githubusercontent.com/25386607/310157843-71688582-4fc7-47ce-b646-c25e0c373463.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNTc4NDMtNzE2ODg1ODItNGZjNy00N2NlLWI2NDYtYzI1ZTBjMzczNDYzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRiNTc5NTY5NWVmYmUyZDI2ZGM1YjgzNjI2Nzk0Yzg3NjQ3MjFhZjcwOTkwNDhiMjUwZGUzYzQxYzYzYmEzODUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Ihm1T7BjsYHG2NEXHlms24uFnEXormk6-MkND8NgOo0)
![image](https://private-user-images.githubusercontent.com/25386607/310149690-9db6ff6a-e755-4c33-97c2-bf9854b7b8ba.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAxNDk2OTAtOWRiNmZmNmEtZTc1NS00YzMzLTk3YzItYmY5ODU0YjdiOGJhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI2NTdiMDg0YmUzYzA5N2ZmZjY0NTM2OWFjOTA4Mjc4YmUwZjEzNGZlMzc4MThiOGViMWE0ZmI5Mzc5ZDYwYjkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0._-MaUeZTbG2KPs1PJs-DTPNHrdH5ZGwkoKgopGeSik4)
![image](https://private-user-images.githubusercontent.com/25386607/310379576-b60ce74f-6f63-45d7-8711-f7356dbdbb2f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzNzk1NzYtYjYwY2U3NGYtNmY2My00NWQ3LTg3MTEtZjczNTZkYmRiYjJmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQxZWZkYjU2ZWM2ODVkNTBhZTdiMDRhMzgzNmNlYmVjMTdhMDVkMGVhODk1NThjM2ZiMWJkYzc0YzI5N2Y1NGMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.tCS9irYFuhAEWch1Ve7v2H12zNwRqshPd-LTEQ7fNxg)
![image](https://private-user-images.githubusercontent.com/25386607/310380210-a6972e16-6177-4c8e-8e17-edcdd2a92ad1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODAyMTAtYTY5NzJlMTYtNjE3Ny00YzhlLThlMTctZWRjZGQyYTkyYWQxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTkzYjdhZDEwYzZhYWFmNWY4ZjRjOGVlM2JkNTExNzhiNmI2Nzg2ZjYyMTU3YmIyM2QyM2FjNTI0YTI1Zjg5MGEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.XPR689y0SWt2QZP3NqDLjTTx3JTRRK58arD_y5OM83E)
![image](https://private-user-images.githubusercontent.com/25386607/310380439-e11cb6ee-9d73-45c6-91d0-1cebc2f91c5d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODA0MzktZTExY2I2ZWUtOWQ3My00NWM2LTkxZDAtMWNlYmMyZjkxYzVkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY4YzdmZWI5MjQ0NjA1OTIyNzBkMjRlMGQ0ZWY3NTVmMTIwNTc0OWUxODBhMTU1OTdmYzkyMzI2ZTZjODZhNDkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.qOv89IYF6TZGv2gqUJihqX6nOsbiuWg5ieGV_xjWGfk)
![image](https://private-user-images.githubusercontent.com/25386607/310380781-7c4060f9-1291-4e50-baee-dee350d2b979.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODA3ODEtN2M0MDYwZjktMTI5MS00ZTUwLWJhZWUtZGVlMzUwZDJiOTc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU0YzI0MTJmNThkNmU0NDI5MDNiMGY2NjkzMWY5MmI4YmEyMmI0MGQzZGMwNWYzYTg5YTZmOTVmNjI3NGFiMzQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.hIEnUeAlT-4BxaPzPrFvf3avB_Gg0ZFms9tHpl3XNGc)
![image](https://private-user-images.githubusercontent.com/25386607/310381167-d4c6c876-b711-4712-b2b2-22f3b5c38ab3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDEzNDMsIm5iZiI6MTczOTA0MTA0MywicGF0aCI6Ii8yNTM4NjYwNy8zMTAzODExNjctZDRjNmM4NzYtYjcxMS00NzEyLWIyYjItMjJmM2I1YzM4YWIzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDE4NTcyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk0NzQ3YjBmMDM3NDJkZWZkZWVjNWEwMWNmMjg5ZTlmYmI1NjFhOGIxMTM0MWI2ZjAyZDllNzUwZTg3MTVkNDMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.pvDTsNMG76itVV3PFvuoDf63OcfQ2M6PuexDYptQnpw)
- Incremental Load: Medium
- Internal stage:Snowflake Docs
- Power of Metadata driven ETL Frameworks:Linkedin