- run `dbt run` and then `dbt test` on the 2nd layer of our models
- and so on...
- this way, our **downstream models** are never even built if the **upstream models** fail
- note how doing this manually using `dbt run` and `dbt test` would require chaining multiple commands per layer, and would be very cumbersome
- the **skip** tab will show us the models that were skipped, e.g. due to failing upstream tests
- it supports all the flags that `test`, `run`, `snapshot` and `seed` support. when we pass a flag, it is passed down to all the commands that support it, e.g. `--full-refresh` will be applied to both models and seeds (see the sketch below)
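a minimal sketch of forwarding flags through `dbt build` (the `orders` selector below is just an illustrative model name):

```sh
# --full-refresh is forwarded to both the run step (incremental models are
# rebuilt from scratch) and the seed step (seeds are re-created)
dbt build --full-refresh

# node selection works here too - build the hypothetical orders model
# along with everything downstream of it
dbt build --select orders+
```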

## Documentation

- just like we documented models, we can also document our sources
- to generate the documentation, we use `dbt docs generate`
- by looking at our dag / lineage graph, we can estimate the number of **concurrent threads** we can make use of at a time - models with no dependencies between one another can be built in parallel
- in the documentation, in the lineage graph, we can type the **select** syntax -
- `stg_customers` - highlights the `stg_customers` model
- `+stg_customers` - highlights `stg_customers` along with all of its upstream dependencies
- `stg_customers+` - highlights `stg_customers` along with all of its downstream dependencies
- after entering our select statement, we hit the **update graph** button / press the enter key to show only the nodes related to our query
- **union** - we can view the combined graph for multiple models by separating the selectors using a space - `+stg_customers+ +stg_orders+`
- **intersection** - we can view just the shared nodes, e.g. the common upstream models, by separating the selectors using a comma - `+stg_customers+,+stg_orders+` (see the sketch after this list)
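the same **select** syntax works from the command line as well - e.g. we can preview what a selector resolves to using `dbt ls` before running anything:

```sh
# stg_customers along with all of its upstream dependencies
dbt ls --select "+stg_customers"

# union - nodes related to either model
dbt ls --select "+stg_customers+ +stg_orders+"

# intersection - only the nodes shared between the two selections,
# e.g. their common upstream models
dbt ls --select "+stg_customers+,+stg_orders+"
```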

## Deployment

- the right way would be to create and use a service user's credentials instead of our own credentials here
- also, the models are now created in a dedicated production schema which only this service user has access to, not a user-specific **target schema**
- after configuring all this, we can create our environment
- **direct promotion** / **one trunk** -
- create feature branches from the main branch
- merge feature branches into the main branch after development
- as part of the pr, tests are run as part of the ci (TODO - add link) to ensure quality - see the sketch after this list
- the next run of the [job](#jobs) on the main branch will include these changes
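the ci check itself typically just builds and tests the models changed in the pr against a temporary schema. a minimal sketch using dbt's state comparison, assuming the artifacts from the last production run have been downloaded to `./prod-artifacts`:

```sh
# build and test only the modified models and everything downstream of them,
# while deferring references to unchanged models to their production versions
dbt build --select state:modified+ --defer --state ./prod-artifacts
```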
- **indirect promotion** / **many trunks** -
- create feature branches from the intermediary staging branch
- merge feature branches into this intermediary staging branch
- the process of pr reviews and ci checks stays the same
- some additional testing is run on this staging branch - we test individual features during development, but here, we can test how the group of new features works together
- finally, the staging branch is merged into the main branch
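a minimal sketch of the many trunks flow using plain git, assuming the intermediary branch is called `staging` (the branch names here are illustrative):

```sh
# feature branches are cut from staging instead of main
git checkout staging
git checkout -b feature/customer-ltv

# develop, push, and open a pr against staging - ci runs on the pr
git push -u origin feature/customer-ltv

# once a batch of features has been validated together on staging,
# staging itself is merged into main through another pr
```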
- below, i have attached the general settings section for all three environments

### Development Environment

![](/assets/img/dbt/development-env.png)

### Staging Environment

![](/assets/img/dbt/staging-env.png)

### Production Environment

![](/assets/img/dbt/production-env.png)

## Jobs

- go to deploy -> jobs and create a new job
- the **job** can be of type **daily job**
- we need to select the environment for this job
- here, we can also enter a schedule for this job
- we can also configure the set of commands that we would want to run on that cadence - it runs `dbt build`, and allows us to check options for running freshness checks on sources and for generating docs (see the sketch after this list)
- after configuring all this, we can create our job
- apart from relying on the schedule, we can also trigger this job manually whenever we want
- dbt cloud easily lets us access our documentation, shows us a breakdown of how much time the models took to build, etc
- an execution of a job is called a **run**
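under the hood, a job configured with those options effectively runs something along these lines (a sketch - the exact set depends on the checkboxes we enable):

```sh
# optional - check whether our sources have been loaded recently enough
dbt source freshness

# run + test (+ snapshot + seed), layer by layer
dbt build

# optional - regenerate the documentation site
dbt docs generate
```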
