Added quick start guide to use cosmos in airflow workflow for dbt transformations. #211
= Execute dbt Teradata transformation jobs in Apache Airflow using the Astronomer Cosmos library
:experimental:
:page-author: Satish Chinthanippu
:page-email: satish.chinthanippu@teradata.com
:page-revdate: July 15th, 2024
:description: Execute dbt teradata transformation jobs in Apache Airflow using Astronomer Cosmos library
:keywords: data warehouses, compute storage separation, teradata, vantage, cloud data platform, object storage, business intelligence, enterprise analytics, airflow, queries, dbt, cosmos, astronomer
:dir: execute-dbt-teradata-transformations-in-airflow-with-cosmos
:auxdir: execute-dbt-teradata-transformations-in-airflow-with-cosmos

== Overview

This tutorial demonstrates how to install Apache Airflow on a local machine, configure a workflow that runs `dbt-teradata` transformations through the Astronomer Cosmos library, and run it against a Teradata Vantage database. Apache Airflow is a task scheduling tool typically used to build data pipelines that process and load data. The https://astronomer.github.io/astronomer-cosmos/[Astronomer Cosmos] library simplifies orchestrating dbt data transformations in Apache Airflow: it lets you run dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code. In this example, we show how to use Astronomer Cosmos to run dbt transformations in Airflow against a Teradata Vantage database.

NOTE: Use https://learn.microsoft.com/en-us/windows/wsl/install[the Windows Subsystem for Linux (WSL)] on Windows to try this quickstart example.
== Prerequisites
* Access to a Teradata Vantage instance, version 17.10 or higher.
+
include::ROOT:partial$vantage_clearscape_analytics.adoc[]
* Python 3.8, 3.9, 3.10 or 3.11 installed, along with `python3-venv` and `python3-pip`.
+
[tabs, id="python_install"]
====
Linux::
+
[source,bash]
----
sudo apt install -y python3-venv python3-pip
----
WSL::
+
[source,bash]
----
sudo apt install -y python3-venv python3-pip
----
macOS::
+
[source,bash]
----
brew install python
----
Refer to the https://docs.python-guide.org/starting/install3/osx/[Installation Guide] if you face any issues.
====
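Before installing anything, you can confirm your interpreter falls in the supported range with a short check (a convenience sketch, not part of the original setup steps):

```python
import sys

# Cosmos and recent Airflow releases target Python 3.8-3.11,
# the versions listed in the prerequisites above.
supported = (3, 8) <= sys.version_info[:2] <= (3, 11)
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {supported}")
```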
== Install Apache Airflow and Astronomer Cosmos
1. Create a new python environment to manage airflow and its dependencies, and activate it:
+
NOTE: This will install Apache Airflow as well.
+
[source, bash]
----
python3 -m venv airflow_env
source airflow_env/bin/activate
pip install "astronomer-cosmos"
----
2. Install the Apache Airflow Teradata provider:
+
[source, bash]
----
pip install "apache-airflow-providers-teradata"
----
3. Set the AIRFLOW_HOME environment variable:
+
[source, bash]
----
export AIRFLOW_HOME=~/airflow
----
== Install dbt
1. Create a new python environment to manage dbt and its dependencies, and activate it:
+
[source, bash]
----
python3 -m venv dbt_env
source dbt_env/bin/activate
----
2. Install the `dbt-teradata` and `dbt-core` modules:
+
[source, bash]
----
pip install dbt-teradata dbt-core
----
== Setup dbt project

1. Clone the jaffle_shop repository:
+
[source, bash]
----
git clone https://github.com/Teradata/jaffle_shop-dev.git jaffle_shop
----
2. Create a new folder, dbt, inside the $AIRFLOW_HOME/dags folder, then copy the jaffle_shop dbt project into the $AIRFLOW_HOME/dags/dbt directory:
+
[source, bash]
----
mkdir -p $AIRFLOW_HOME/dags/dbt/
cp -r jaffle_shop $AIRFLOW_HOME/dags/dbt/
----
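To confirm the copy landed where later steps expect it, you can run a quick check (a convenience sketch, assuming the default `AIRFLOW_HOME` of `~/airflow` used elsewhere in this guide):

```python
import os

# Location the jaffle_shop project should occupy after the copy above;
# dbt_project.yml is the manifest file every dbt project keeps at its root.
airflow_home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
project_yml = os.path.join(airflow_home, "dags", "dbt", "jaffle_shop", "dbt_project.yml")
print("dbt project in place" if os.path.exists(project_yml) else f"missing: {project_yml}")
```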
== Configure Apache Airflow
1. Switch to the virtual environment where Apache Airflow was installed in <<Install Apache Airflow and Astronomer Cosmos>>:
+
[source, bash]
----
source airflow_env/bin/activate
----
2. Set the environment variables listed below to enable the test connection button and to prevent loading the example DAGs and default connections in the Airflow UI:
+
[source, bash]
----
export AIRFLOW__CORE__TEST_CONNECTION=Enabled
export AIRFLOW__CORE__LOAD_EXAMPLES=false
export AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=false
----
3. Define the path of the jaffle_shop project as the environment variable `dbt_project_home_dir`:
+
[source, bash]
----
export dbt_project_home_dir=$AIRFLOW_HOME/dags/dbt/jaffle_shop
----
4. Define the path to the `dbt` executable in the virtual environment where dbt-teradata was installed as the environment variable `dbt_venv_dir`:
+
[source, bash]
----
export dbt_venv_dir=/../../dbt_env/bin/dbt
----
+
NOTE: You might need to change `/../../` to the specific path where the `dbt_env` virtual environment is located.
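You can sanity-check these two variables before starting Airflow with a small script (a convenience sketch; it only assumes the variables exported in the steps above):

```python
import os

# Fail fast if either variable is missing or points at a path
# that does not exist on disk.
for var in ("dbt_project_home_dir", "dbt_venv_dir"):
    path = os.environ.get(var)
    if path is None:
        print(f"{var} is not set")
    elif not os.path.exists(path):
        print(f"{var} points to a missing path: {path}")
    else:
        print(f"{var} looks good: {path}")
```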
== Start Apache Airflow web server
1. Run the airflow web server:
+
[source, bash]
----
airflow standalone
----
2. Access the Airflow UI. Visit http://localhost:8080 in the browser and log in with the admin account details shown in the terminal.
+
image::{dir}/execute-dbt-teradata-cosmos-airflow.png[Airflow Password,align="left" width=75%]
== Define Apache Airflow connection to Vantage

1. Click on Admin > Connections.
2. Click on + to define a new connection to the Teradata Vantage instance.
3. Define the new connection with id `teradata_default` and the Teradata Vantage instance details:
* Connection Id: teradata_default
* Connection Type: Teradata
* Database Server URL (required): Teradata Vantage instance hostname to connect to.
* Database: jaffle_shop
* Login (required): database user
* Password (required): database user password
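As an alternative to clicking through the UI, Airflow can also read connections from `AIRFLOW_CONN_<CONN_ID>` environment variables in URI form. The sketch below builds such a URI with the standard library; the hostname and credentials are placeholders, and the UI steps above remain the reference for the fields the Teradata provider expects:

```python
from urllib.parse import quote

# Placeholder values for illustration; substitute your own
# Teradata Vantage hostname, user, and password.
host, user, password, database = "myvantage.example.com", "dbc", "p@ss/word", "jaffle_shop"

# Airflow maps the URI scheme to the connection type; special characters
# in the password must be percent-encoded to survive URI parsing.
uri = f"teradata://{quote(user)}:{quote(password, safe='')}@{host}/{database}"
print(f"export AIRFLOW_CONN_TERADATA_DEFAULT='{uri}'")
```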
== Define DAG in Apache Airflow
DAGs in Airflow are defined as python files. The DAG below uses Cosmos to run the dbt transformations defined in the `jaffle_shop` dbt project on a Teradata Vantage system. Copy the python code below and save it as `airflow-cosmos-dbt-teradata-integration.py` under the $AIRFLOW_HOME/dags directory.

[source, python]
----
import os
from datetime import datetime

from airflow import DAG
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import TeradataUserPasswordProfileMapping

PATH_TO_DBT_VENV = os.environ["dbt_venv_dir"]
PATH_TO_DBT_PROJECT = os.environ["dbt_project_home_dir"]

execution_config = ExecutionConfig(
    dbt_executable_path=PATH_TO_DBT_VENV,
)
profile_config = ProfileConfig(
    profile_name="generated_profile",
    target_name="dev",
    profile_mapping=TeradataUserPasswordProfileMapping(
        conn_id="teradata_default",
    ),
)
with DAG(
    dag_id="execute_dbt_transformations_with_cosmos",
    max_active_runs=1,
    max_active_tasks=10,
    catchup=False,
    start_date=datetime(2024, 1, 1),
) as dag:
    transform_data = DbtTaskGroup(
        group_id="transform_data",
        project_config=ProjectConfig(PATH_TO_DBT_PROJECT),
        profile_config=profile_config,
        execution_config=execution_config,
        default_args={"retries": 2},
    )
----
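Before waiting for the scheduler to pick the file up, you can confirm it is at least syntactically valid Python (a lightweight sketch; it does not import Airflow, so it will not catch missing packages or DAG-level errors):

```python
import ast
import os

# Path where the DAG file was saved in the step above.
dag_file = os.path.join(
    os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow")),
    "dags", "airflow-cosmos-dbt-teradata-integration.py",
)

if os.path.exists(dag_file):
    # Raises SyntaxError if the file is not valid Python.
    with open(dag_file) as f:
        ast.parse(f.read(), filename=dag_file)
    print(f"{dag_file} parses cleanly")
else:
    print(f"{dag_file} not found; save the DAG file first")
```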
== Load DAG

Once the DAG file is copied to $AIRFLOW_HOME/dags, Apache Airflow displays the DAG in the UI under the DAGs section. It can take 2 to 3 minutes for the DAG to appear in the Apache Airflow UI.

== Run DAG

Run the DAG as shown in the image below.

image::{dir}/airflow-dag.png[Run dag,align="left" width=75%]

== Summary

In this quick start guide, we explored how to use the Astronomer Cosmos library in Apache Airflow to execute dbt transformations against a Teradata Vantage instance.

== Further reading
* link:https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Apache Airflow DAGs reference]
* link:https://astronomer.github.io/astronomer-cosmos/[Benefits of Cosmos]
* link:https://astronomer.github.io/astronomer-cosmos/profiles/TeradataUserPassword.html[Teradata Cosmos Profile]
* link:https://learn.microsoft.com/en-us/windows/wsl/install[Install WSL on windows]