BigQuery Analytics

This repository is in experimental stage. It is NOT ready for production.

Changes are being made.

This package allows you to easily monitor and manage your BigQuery query costs, identifying costly queries in order to enhance optimizing and resource saving.

release/<branch-name> or major/<branch-name> or Release/<branch-name> or Major/<branch-name>
feature/<branch-name> or minor/<branch-name> with capitalised letters work as well
patch/<branch-name> or fix/<branch-name> or hotfix/<branch-name> or bugfix/<branch-name> with capitalised letters work as well

If branch is already made, just rename it before passing the pull request.

🏃 Quickstart

New to dbt packages? Read more about them here.

Requirements

dbt version

dbt version >= 1.0.0

dbt_utils package. Read more about them here.

dbt-labs/dbt_utils version: 1.1.1

Installation

Include this package in your packages.yml file.

packages:
  - git: "git@github.com:techindicium/dbt-bigquery-analytics.git" # insert git URL

Run dbt deps to install the package.

Configuring package

The package's models can be configured in your dbt_project.yml by specifying the package under models node.

The BQ queries's start date you want to monitor also must be declared in vars node.

We add some other variables to improve your monitoring and management of BigQuery. Some of them are related with dbt. You can add your dbt targets and sources.

It is important to add the exchange rate between USD dollar and BRL real and the price per terabyte processed.

Something important to point out is that the models created by this package are not restrictered to dbt, in this way when the informations about BigQuery have no relation to dbt, you'll see the coluns related to dbt bringing something like others or inaplicable.

Models

models:
    bigquery_analytics:
        staging:
            materialized: ephemeral
        marts:
            materialized: table

Vars

vars:
  dbt_bigquery_analytics:
      bigquery_analytics_start_date: cast('2023-01-01' as date) # inside the double quotes, add the start date of the project
  region: "region-us"
  price_per_tib: 6.25 # here you need add the price per tebibyte processed in USD. At the moment, this is the cost.
  price_per_tib_brl: 40.167405531 # here you need add the price per tebibyte processed in BRL. At the moment, this is the cost.
  dbt_sources: [
      'adw'
      , 'adf'
      , 'databricks_workflow'
  ] # you can configure the sources of your procject here
  ci_dbt_target: 'ci' # add the CI/CD target of your dbt project
  prod_dbt_target: 'prod' # add the development target of your dbt project
  dev_dbt_target: 'dev' # add the production target of your dbt project
  dbt_source_monitoring_dataset: 'raw_monitoring' # if you have one, add a source dataset of your monitoring data
  dbt_prod_monitoring_dataset: 'public_monitoring' # if you have one, add the destination dataset of your monitoring transformed data

Source

The project's sources can be configured in your source.yml, normally on your staging folder. Attention is needed while naming the source' schema and tables names to ensure you are matching the name BigQuery sets on the and schema and table with your source. If the names match, your package will work as expected.

version: 2

sources:
  - name: bigquery_info_schema
    schema: INFORMATION_SCHEMA
    database: "{{ target.database }}.{{ var('region') }}" ## vars to fit your use case
    tables:
      - name: information_schema_jobs
        identifier: JOBS

Running the models

After setting up the package in dbt_project.yml and source.yml as the previous steps, you can now run the package with the following command line: dbt run -s bigquery_analytics. After running it, the 5 models of the package will materialize in your target schema as they have been configured.

Testing modifications in the project

Inside the integration_tests folder, you can use the setup.sh to auxiliate the creation of an env and installation of requirements.txt for you.

The only commands that you need to do is:

chmod +x setup.sh and source setup.sh

Inside the integration_tests folder, you have the dbt_project already configured, you have profiles.yml already set to BigQuery. To make a connection to BigQuery, so you can start testing, you should have credentials in indicium-sandbox project.

To set your credentials in your environment, you should rename the example.env: example.env --> .env.

You should put the name of your dataset

dev_firstName_lastName

and then do the command

source .env

Now you can use your credentials in your environment. Dbt debug it!

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
.github		.github
integration_tests		integration_tests
models		models
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
dbt_project.yml		dbt_project.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BigQuery Analytics

This repository is in experimental stage. It is NOT ready for production.

Changes are being made.

Table of Contents

Before creating a branch

🏃 Quickstart

Requirements

Installation

Configuring package

Models

Vars

Source

Running the models

Testing modifications in the project

About

Releases 10

Packages

Contributors 7

Languages

License

techindicium/dbt-bigquery-analytics

Folders and files

Latest commit

History

Repository files navigation

BigQuery Analytics

This repository is in experimental stage. It is NOT ready for production.

Changes are being made.

Table of Contents

Before creating a branch

🏃 Quickstart

Requirements

Installation

Configuring package

Models

Vars

Source

Running the models

Testing modifications in the project

About

Resources

License

Stars

Watchers

Forks

Releases 10

Packages 0

Contributors 7

Languages

Packages