Update homework.md
TylerJSimpson authored Feb 19, 2023
1 parent 8b45825 commit a079737
Showing 1 changed file with 26 additions and 42 deletions.
68 changes: 26 additions & 42 deletions week_4/homework/homework.md
@@ -1,37 +1,33 @@
## Week 4 Homework

In this homework, we'll use the models developed during the week 4 videos and extend the existing dbt project with the FHV (for-hire vehicle) trip data for 2019 that is already loaded in our DWH.
Please see my code for the following:

We will use the data loaded for:

* Building a source table: `stg_fhv_tripdata`
* Building a fact table: `fact_fhv_trips`
* Create a dashboard
* models/staging/[stg_fhv_tripdata.sql](https://github.com/TylerJSimpson/data_engineering_zoomcamp/blob/main/week_4/homework/models/staging/stg_fhv_tripdata.sql)
* models/staging/[schema.yml](https://github.com/TylerJSimpson/data_engineering_zoomcamp/blob/main/week_4/homework/models/staging/schema.yml)
* models/core/[fact_fhv_trips.sql](https://github.com/TylerJSimpson/data_engineering_zoomcamp/blob/main/week_4/homework/models/core/fact_fhv_trips.sql)
* models/core/[schema.yml](https://github.com/TylerJSimpson/data_engineering_zoomcamp/blob/main/week_4/homework/models/core/schema.yml)
* dashboard


### Question 1:

**What is the count of records in the model fact_trips after running all models with the test run variable disabled and filtering for 2019 and 2020 data only (pickup datetime)?**

You'll need to have completed the "Build the first dbt models" video and have been able to run the models via the CLI.
You should find the views and models for querying in your DWH.

- 41648442
- 51648442
- 61648442
- 71648442
Record count of fact_trips in BigQuery details page:
61,636,378

Closest option:
**61648442**
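
The "closest option" choice can be checked with a little arithmetic. The sketch below takes the four answer options and the row count quoted above and picks the option with the smallest absolute difference. The variable names are illustrative only.

```python
# Answer options listed for Question 1 in the homework.
options = [41648442, 51648442, 61648442, 71648442]

# Row count reported on the BigQuery details page for fact_trips.
observed = 61_636_378

# Choose the option with the smallest absolute difference from the observed count.
closest = min(options, key=lambda option: abs(option - observed))
difference = abs(closest - observed)

print(closest, difference)
```

The gap between the observed count and the chosen option is small relative to the ten-million spacing between options, so the choice is unambiguous.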

### Question 2:

**What is the distribution between service types, filtering by 2019 and 2020 data as done in the videos?**

You will need to complete the "Visualising the data" videos, using either Data Studio or Metabase.

My original [dashboard](https://lookerstudio.google.com/s/kfnV1LcxmcI) answers this question.

- 89.9/10.1
- 94/6
- 76.3/23.7
- 99.1/0.9
**89.9/10.1**



@@ -42,11 +38,15 @@ You will need to complete "Visualising the data" videos, either using data studi
Create a staging model for the fhv data for 2019 and do not add a deduplication step. Run it via the CLI without limits (is_test_run: false).
Filter records with pickup time in year 2019.

- 33244696
- 43244696
- 53244696
- 63244696
```sql
SELECT COUNT(*)
FROM `dtc-de-0315.dbt_cloud_pr_218904_5.stg_fhv_tripdata`
```
Result:
43244693

Closest option:
**43244696**
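
The query above counts every row in the staging view, so it answers the question only if the model itself already restricts pickups to 2019. As a minimal sketch of an explicit year filter, the example below uses an in-memory SQLite table with made-up rows instead of BigQuery, and it assumes the pickup timestamp column is named `pickup_datetime`. In BigQuery the equivalent predicate would likely be `EXTRACT(YEAR FROM pickup_datetime) = 2019`.

```python
import sqlite3

# Toy stand-in for the staging view; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_fhv_tripdata (pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO stg_fhv_tripdata VALUES (?)",
    [
        ("2018-12-31 23:59:00",),  # outside 2019: excluded
        ("2019-01-01 00:05:00",),  # counted
        ("2019-06-15 12:00:00",),  # counted
        ("2020-01-01 00:00:00",),  # outside 2019: excluded
    ],
)

# Count only the rows whose pickup year is 2019 (SQLite syntax).
count_2019 = conn.execute(
    "SELECT COUNT(*) FROM stg_fhv_tripdata "
    "WHERE strftime('%Y', pickup_datetime) = '2019'"
).fetchone()[0]

print(count_2019)
```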

### Question 4:

@@ -56,31 +56,15 @@ Create a core model for the stg_fhv_tripdata joining with dim_zones.
Similar to what we've done in fact_trips, keep only records with known entries for pickup and dropoff locations.
Run it via the CLI without limits (is_test_run: false) and filter records with pickup time in year 2019.

- 12998722
- 22998722
- 32998722
- 42998722
Record count of fact_fhv_trips in BigQuery details page:
**22,998,722**
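
One way to read "known pickup and dropoff locations" is an inner join against `dim_zones` on both location IDs, excluding zones labelled as unknown. The sketch below illustrates that idea on an in-memory SQLite database with invented rows. The column names (`tripid`, `pickup_locationid`, `dropoff_locationid`, `locationid`, `borough`) and the `'Unknown'` label are assumptions for the example, not taken from the actual model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_zones (locationid INTEGER, borough TEXT);
CREATE TABLE stg_fhv_tripdata (
    tripid INTEGER,
    pickup_locationid INTEGER,
    dropoff_locationid INTEGER
);
INSERT INTO dim_zones VALUES (1, 'Queens'), (2, 'Bronx'), (264, 'Unknown');
INSERT INTO stg_fhv_tripdata VALUES
    (10, 1, 2),      -- both zones known: kept
    (11, 1, 264),    -- dropoff zone is 'Unknown': dropped
    (12, NULL, 2),   -- no pickup zone: dropped by the inner join
    (13, 2, 1);      -- both zones known: kept
""")

# Join the zone dimension twice, once per location, and keep known zones only.
kept_trip_ids = [
    row[0]
    for row in conn.execute("""
        SELECT t.tripid
        FROM stg_fhv_tripdata AS t
        INNER JOIN dim_zones AS pz ON t.pickup_locationid = pz.locationid
        INNER JOIN dim_zones AS dz ON t.dropoff_locationid = dz.locationid
        WHERE pz.borough != 'Unknown' AND dz.borough != 'Unknown'
        ORDER BY t.tripid
    """)
]

print(kept_trip_ids)
```

Rows with a missing or unmatched location ID fall out because an inner join keeps only matching pairs, which is why the fact table is smaller than the staging table.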

### Question 5:

**Which month has the largest number of rides after building a tile for the fact_fhv_trips table?**
Create a dashboard with some tiles that you find interesting to explore the data. One tile should show the number of trips per month, as done in the videos for fact_trips, based on the fact_fhv_trips table.

- March
- April
- January
- December



## Submitting the solutions

* Form for submitting: https://forms.gle/6A94GPutZJTuT5Y16
* You can submit your homework multiple times. In this case, only the last submission will be used.

Deadline: 25 February (Saturday), 22:00 CET

I created a [visual](https://lookerstudio.google.com/s/jYdIDO070NY) and double-checked the result in BigQuery. January is by far the busiest month.
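
A trips-per-month check of this kind amounts to grouping by pickup month and sorting by count. The sketch below shows the shape of such a query on an in-memory SQLite table with invented rows, so the numbers are not the real results. It assumes a `pickup_datetime` column on `fact_fhv_trips`.

```python
import sqlite3

# Toy stand-in for fact_fhv_trips; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_fhv_trips (pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO fact_fhv_trips VALUES (?)",
    [
        ("2019-01-03 08:00:00",),
        ("2019-01-10 09:30:00",),
        ("2019-01-21 17:45:00",),
        ("2019-02-02 11:00:00",),
        ("2019-03-05 14:20:00",),
        ("2019-03-18 19:10:00",),
    ],
)

# Count trips per calendar month and list the busiest month first.
monthly_counts = conn.execute("""
    SELECT strftime('%m', pickup_datetime) AS month, COUNT(*) AS trips
    FROM fact_fhv_trips
    GROUP BY month
    ORDER BY trips DESC
""").fetchall()

busiest_month = monthly_counts[0][0]
print(monthly_counts, busiest_month)
```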

## Solution
**January**

We will publish the solution here
