GitHub - elmezianech/Snowflake_dbt_Airflow_ELT: This project is an ELT Pipeline using Dbt (dbt-core) for transformation, Snowflake for data warehousing and Airflow for orchestration.

Project Desciption

This project is an ELT Pipeline using Dbt (dbt-core) for transformation, Snowflake for data warehousing and Airflow for orchestration.

Dataset

This project uses the Snowflake tpch dataset (tpch_sf1 schema).

Project Implementation

Step 1 : Setup dbt + Snowflake

Creates a Snowflake environment with a warehouse, database, and user role for dbt to access and manage data.

Step 2 : Configure dbt_project.yml and packages

Defines a configuration file (dbt_profile.yaml) that specifies how dbt interacts with Snowflake, including the warehouse and schema to use for staging and production data.

Step 3 : Create source and staging tables

Creates source and staging models:

Source models define how to access raw data from Snowflake's sample data (tpch_sf1 schema).
Staging models define SQL queries to transform raw data into a format suitable for further processing.

Step 4 : Macro functions

Creates a reusable function (macro) to calculate discounted amounts based on extended price and discount percentage.

Step 5 : Transformed models (fact tables, data marts)

Creates dbt models for data transformation:

Intermediate tables join data from staging models and perform calculations.
Data marts aggregate and summarize data for specific use cases.
Fact tables integrate data from various sources for analysis.

Step 6 : Generic and singular tests

Creates generic and singular tests for data quality:

Generic tests (defined in YAML) enforce data integrity rules like uniqueness, not null values, and relationships between tables.
Singular tests (written in SQL) identify specific data quality issues like invalid dates or orders with discounts.

Step 7 : Deploy models using Airflow

Configures Airflow to run the dbt pipeline, I used Astronomer Cosmos for deploying the dbt project into airflow:

Sets up Airflow with necessary Python libraries for dbt and Snowflake connection.
Defines a Snowflake connection within the Airflow UI.
Creates a Python script (dbt_dag.py) that configures an Airflow DAG (Directed Acyclic Graph) to run the dbt project at scheduled intervals. This DAG uses the dbt project directory, profile configuration (including Snowflake connection details), and execution settings (including the path to the dbt executable).

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.astro		.astro
dags		dags
tests/dags		tests/dags
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
packages.txt		packages.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project Desciption

Dataset

Project Implementation

Step 1 : Setup dbt + Snowflake

Step 2 : Configure dbt_project.yml and packages

Step 3 : Create source and staging tables

Step 4 : Macro functions

Step 5 : Transformed models (fact tables, data marts)

Step 6 : Generic and singular tests

Step 7 : Deploy models using Airflow

About

Releases

Packages

Languages

elmezianech/Snowflake_dbt_Airflow_ELT

Folders and files

Latest commit

History

Repository files navigation

Project Desciption

Dataset

Project Implementation

Step 1 : Setup dbt + Snowflake

Step 2 : Configure dbt_project.yml and packages

Step 3 : Create source and staging tables

Step 4 : Macro functions

Step 5 : Transformed models (fact tables, data marts)

Step 6 : Generic and singular tests

Step 7 : Deploy models using Airflow

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages