Dbt_With_SnowFlake

Table of Contents

  • Overview
  • Data Build Tool
  • dbt Installation
  • Connecting dbt with Snowflake
  • Folder Structure
  • Program Flow
  • Program Execution
  • Data Model
  • Level Up
  • Documentation and Material
  • Tools and Technologies

Overview

  • The purpose of this project was to learn how the transformation step can be made easier using data build tool (dbt).
  • Check which functionalities dbt supports that can save development time.
  • Snowflake was used as the warehouse; it comes with a free 30-day trial when you sign up.

Data Build Tool

  • dbt enables analytics engineers to transform data in their warehouses by simply writing select statements.
  • dbt handles turning these select statements into tables and views; models can also be materialized incrementally and dbt will handle that (see the sketch after this list).
  • dbt does the T in ELT (Extract, Load, Transform) processes.
  • dbt provides a web UI that can be used to visualize the lineage and dependencies of models.
  • The web UI also provides details about each model, what it depends on, and the test cases defined on certain models.
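  • As a small illustration, a dbt model is just a select statement saved in a .sql file under models/, with the materialization set in a config block. The model and column names below are hypothetical:

  -- models/core/dim_city.sql (hypothetical example)
  {{ config(materialized='table') }}

  select
      city_id,
      city_name,
      country_id
  from {{ ref('stg_city') }}

  • dbt run builds this as a table in the target schema; changing materialized to 'view' would create a view instead.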

dbt Installation

  • Install git and python.

  • dbt installation on Linux can run into problems, so instead we install it from the dbt-core project.

  • All of this will be done inside a virtual environment.

  • This will create a virtual env

  python3 -m venv dbt-env
  • Activate the env
    • Remember to be in the correct folder.
  source dbt-env/bin/activate
  • In this repo dbt-core has already been downloaded, but you can clone it yourself as well.
  git clone https://github.com/dbt-labs/dbt.git
  • Go into the dbt folder and install dbt.
  • In requirements.txt you can specify which adapters to install, e.g. Snowflake, BigQuery.
  cd dbt
  pip install -r requirements.txt
  • Command to verify that dbt is installed.
  dbt --version
  • Install the Snowflake plugin if it was not specified in the requirements.txt file.
  pip install dbt-snowflake
  • Open your editor of choice (VS Code is used in this demo). Running the command below will create a project populated with different folders (seeds, models, etc.).
  dbt init <project-name>

Connecting dbt with Snowflake
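
  • dbt reads warehouse connections from profiles.yml (by default under ~/.dbt/). The profile below uses placeholder values; its top-level name (dbt_model) must match the profile key in dbt_project.yml.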

dbt_model:
  outputs:
    dev:
      account: ap12345.ap-south-1.aws
      database: <database>
      password: <password>
      role: <role-from-snowflake>
      schema: <schema>
      threads: 2
      type: snowflake
      user: <username>
      warehouse: <warehouse-name-from-snowflake>
  target: dev
  • Once the Snowflake profile has been set, run the command below to check the connection.
  dbt debug

Folder Structure

  • dbt         - dbt cloned repo used for installation
  • dbt-env     - Python virtual environment files
  • dbt-model
    • dbt-model       - created after dbt init
      • analyses
      • macros            - create macros here and reference them later (see the sketch after this list)
      • models            - tables, views, incremental load, merge
      • seeds             - flat files, in case you want to load them into staging tables using dbt
      • snapshots         - SCD tables
      • tests             - tests on different models
      • dbt_project.yml   - one place to configure everything
      • packages.yml      - dbt has many packages that can be downloaded
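
  • As a small illustration of the macros folder, a macro is a Jinja function defined in a .sql file and referenced from models. The macro below (cents_to_dollars) is a hypothetical example:

  -- macros/cents_to_dollars.sql (hypothetical example)
  {% macro cents_to_dollars(column_name) %}
      ({{ column_name }} / 100)::number(16, 2)
  {% endmacro %}

  -- usage inside a model:
  -- select {{ cents_to_dollars('amount_cents') }} as amount from {{ ref('stg_transactions') }}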

Program Flow


Program Execution

  • Before executing any of the commands, remember to be in the correct folder.
  cd <project-name>
  • To load files from the seeds folder into the stage tables in Snowflake.
  dbt seed
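  • A seed is just a CSV file in the seeds folder; dbt seed creates a table named after the file (e.g. seeds/country.csv becomes a country table). The target schema can be set in dbt_project.yml; the project and schema names below are assumptions:

  # dbt_project.yml (hypothetical seed configuration)
  seeds:
    dbt_model:
      +schema: staging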
  • The data is now in the stage tables; next, load it into the Core/Dim tables.
    • City, Country, and Transactions will be loaded as they need no history handling (an incremental sketch for Transactions follows the dbt run command).
  dbt run
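  • Since Transactions is append-only, it can be materialized incrementally so each run only processes new rows. A minimal sketch, assuming hypothetical column names (transaction_id, loaded_at) and a staging model stg_transactions:

  -- models/core/fct_transactions.sql (hypothetical example)
  {{ config(materialized='incremental', unique_key='transaction_id') }}

  select *
  from {{ ref('stg_transactions') }}
  {% if is_incremental() %}
    -- only pick up rows newer than what is already in the target table
    where loaded_at > (select max(loaded_at) from {{ this }})
  {% endif %}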
  • To run only the models in a specific folder inside the models folder.
  dbt run -m <folder-name>
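  • Newer dbt versions use --select for the same purpose; selectors can also target a single model or its downstream dependents, for example:
  dbt run --select <folder-name>
  dbt run --select <model-name>+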
  • The snapshots folder holds all the models on which SCD Type 2 is applied.
  dbt snapshot
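  • A snapshot is a .sql file in the snapshots folder; dbt adds the dbt_valid_from / dbt_valid_to columns that implement SCD Type 2. A minimal sketch with hypothetical names (customer_snapshot, updated_at):

  -- snapshots/customer_snapshot.sql (hypothetical example)
  {% snapshot customer_snapshot %}
  {{
      config(
          target_schema='snapshots',
          unique_key='customer_id',
          strategy='timestamp',
          updated_at='updated_at'
      )
  }}
  select * from {{ ref('stg_customer') }}
  {% endsnapshot %}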
  • We can also run the test cases that are defined on different models, snapshots, and seeds.
  dbt test
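  • Generic tests such as unique and not_null are declared in a YAML file next to the models; custom SQL tests go in the tests folder. A minimal sketch with hypothetical model and column names:

  # models/core/schema.yml (hypothetical example)
  version: 2
  models:
    - name: dim_city
      columns:
        - name: city_id
          tests:
            - unique
            - not_null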
  • dbt provides a web UI that can be generated and served with the commands below.
    • Internally dbt stores all the metadata as JSON files, which the web UI reads.
  dbt docs generate
  dbt docs serve
  • You can inspect the different models in the UI and view the lineage graph as well.


Data Model

  • The source system provides a full dump (all of the data every time).
  • The Transactions table is append-only data.
  • Most of the tables in Core/Dim are SCD Type 1 or Type 2.
  • Country and City do not change much, so they can be loaded manually whenever needed.

Level Up

  • Right now the project only loads data to Core/Dim.
  • Create a pipeline and orchestrate it using Airflow or Airbyte.
  • Build fact tables and visualizations on top of them.
  • Use dbt test for testing purposes and the different packages that can aid in development (see the sketch below).
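
  • For example, the dbt_utils package adds commonly used macros and generic tests. A minimal packages.yml sketch (the version shown is an assumption, pick one compatible with your dbt version):

  # packages.yml (hypothetical example)
  packages:
    - package: dbt-labs/dbt_utils
      version: 1.1.1
  • Run dbt deps to download the packages listed there.
  dbt deps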

Documentation and Material


Tools and Technologies

  • dbt
  • Snowflake
  • Git