mghendi/suluhu

Data pipeline for Suluhu with dbt, AWS, Python, and Postgres.

I created this repo as practice using dbt in a production-style environment. The structure mirrors an ELT batch pipeline that loads business data from a fictional e-commerce company (Suluhu) into a data warehouse and performs the transformations on execution.

Background

  • The project contains:
    • orders.csv: Fact table of orders placed on the website
    • reviews.csv: Fact table of reviews submitted for delivered products
    • shipments_deliveries.csv: Fact table of shipments and their delivery dates

Organization

Tools

  • Python - Extract and load
  • SQL - Transformation
  • dbt - Data modeling and testing
  • PostgreSQL - Data warehouse
  • AWS S3 - Data lake
  • Docker - Infrastructure
  • Akuko - Visualization

Process

  • The data was extracted from the AWS S3 bucket using the load_data.py script in the fal_scripts directory (see the sketch after this list).
  • Three main directories were created for the models: Staging, Intermediate, and Marts. The Marts models are materialized as tables when a production job is run in dbt.
  • The production tables are created in the reporting schema.
  • Each model and script includes comments detailing my thought process.
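
The exact contents of load_data.py are not reproduced here; the snippet below is a minimal sketch of the extract-and-load step, assuming boto3, pandas, and SQLAlchemy. The bucket name, target schema, and connection string are placeholders, not the project's real values.

    # load_data.py (sketch): pull the source CSVs from S3 and load them into Postgres for dbt.
    import boto3
    import pandas as pd
    from sqlalchemy import create_engine

    BUCKET = "suluhu-source-data"  # placeholder bucket name
    FILES = ["orders.csv", "reviews.csv", "shipments_deliveries.csv"]
    engine = create_engine("postgresql://user:password@localhost:5432/suluhu")  # placeholder connection string

    s3 = boto3.client("s3")

    for key in FILES:
        # Download each raw file from the S3 data lake ...
        s3.download_file(BUCKET, key, key)
        # ... and load it into a raw schema that the staging models select from.
        table_name = key.replace(".csv", "")
        pd.read_csv(key).to_sql(table_name, engine, schema="raw", if_exists="replace", index=False)

Once the raw tables exist, dbt builds the Staging, Intermediate, and Marts models on top of them.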

Dashboard

  • The dashboard was created on Akuko.

(Akuko dashboard screenshot)

To initialize a local instance of dbt-core for the project:

  • Build the Docker image:

    docker build -t dbt-venv .

This will create a Docker image named dbt-venv.

  • Run the container:

    docker run -it --rm -v /path/to/local/code:/app dbt-venv

  • Once inside the container, Python, pip, the AWS CLI, dbt-core, and boto3 can be installed and used as needed.
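
As a quick sanity check before running the pipeline, a script along the following lines can confirm that boto3 can reach the source bucket from inside the container. This is a minimal sketch; the bucket name is a placeholder for the project's actual bucket.

    # s3_check.py (sketch): confirm AWS credentials and bucket access from inside the container.
    import boto3

    s3 = boto3.client("s3")

    # List the objects in the source bucket; the bucket name below is a placeholder.
    for obj in s3.list_objects_v2(Bucket="suluhu-source-data").get("Contents", []):
        print(obj["Key"], obj["Size"])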