diff --git a/modules/other-integrations/images/execute-dbt-teradata-transformations-in-airflow-with-cosmos/airflow-dag.png b/modules/other-integrations/images/execute-dbt-teradata-transformations-in-airflow-with-cosmos/airflow-dag.png
new file mode 100644
index 000000000..67d76c2b1
Binary files /dev/null and b/modules/other-integrations/images/execute-dbt-teradata-transformations-in-airflow-with-cosmos/airflow-dag.png differ
diff --git a/modules/other-integrations/images/execute-dbt-teradata-transformations-in-airflow-with-cosmos/execute-dbt-teradata-cosmos-airflow.png b/modules/other-integrations/images/execute-dbt-teradata-transformations-in-airflow-with-cosmos/execute-dbt-teradata-cosmos-airflow.png
new file mode 100644
index 000000000..e45630de8
Binary files /dev/null and b/modules/other-integrations/images/execute-dbt-teradata-transformations-in-airflow-with-cosmos/execute-dbt-teradata-cosmos-airflow.png differ
diff --git a/modules/other-integrations/pages/execute-dbt-teradata-transformations-in-airflow-with-cosmos.adoc b/modules/other-integrations/pages/execute-dbt-teradata-transformations-in-airflow-with-cosmos.adoc
new file mode 100644
index 000000000..afbfef2fa
--- /dev/null
+++ b/modules/other-integrations/pages/execute-dbt-teradata-transformations-in-airflow-with-cosmos.adoc
@@ -0,0 +1,215 @@
= Execute dbt Teradata transformation jobs in Apache Airflow using the Astronomer Cosmos library
:experimental:
:page-author: Satish Chinthanippu
:page-email: satish.chinthanippu@teradata.com
:page-revdate: July 15th, 2024
:description: Execute dbt Teradata transformation jobs in Apache Airflow using the Astronomer Cosmos library
:keywords: data warehouses, compute storage separation, teradata, vantage, cloud data platform, object storage, business intelligence, enterprise analytics, airflow, queries, dbt, cosmos, astronomer
:dir: execute-dbt-teradata-transformations-in-airflow-with-cosmos
:auxdir: execute-dbt-teradata-transformations-in-airflow-with-cosmos

== Overview

This tutorial demonstrates how to install Apache Airflow on a local machine, configure a workflow that runs `dbt-teradata` transformations through the Astronomer Cosmos library, and execute it against a Teradata Vantage database. Apache Airflow is a task scheduling tool typically used to build data pipelines that process and load data. The https://astronomer.github.io/astronomer-cosmos/[Astronomer Cosmos] library simplifies orchestrating dbt data transformations in Apache Airflow: with Cosmos, you can run dbt Core projects as Apache Airflow DAGs and Task Groups with only a few lines of code. In this example, we use Cosmos to run dbt transformations in Airflow against a Teradata Vantage database.

NOTE: On Windows, use the https://learn.microsoft.com/en-us/windows/wsl/install[Windows Subsystem for Linux (WSL)] to try this quickstart example.

== Prerequisites
* Access to a Teradata Vantage instance, version 17.10 or higher.
+
include::ROOT:partial$vantage_clearscape_analytics.adoc[]
* Python 3.8, 3.9, 3.10 or 3.11, with python3-venv and python3-pip installed.
+
[tabs, id="python_install"]
====
Linux::
+
[source,bash]
----
sudo apt install -y python3-venv python3-pip
----
WSL::
+
[source,bash]
----
sudo apt install -y python3-venv python3-pip
----
macOS::
+
[source,bash]
----
brew install python
----
+
Refer to the https://docs.python-guide.org/starting/install3/osx/[installation guide] if you face any issues.
====
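Optionally, you can confirm that the local Python toolchain satisfies these prerequisites before continuing. The quick check below is only a convenience, not part of the official setup:

[source, bash]
----
# Should report Python 3.8, 3.9, 3.10, or 3.11
python3 --version
# Confirm the venv module and pip are available
python3 -m venv --help > /dev/null && echo "venv OK"
python3 -m pip --version
----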
== Install Apache Airflow and Astronomer Cosmos
1. Create a new Python environment to manage Airflow and its dependencies, activate it, and install Astronomer Cosmos:
+
NOTE: Installing `astronomer-cosmos` installs Apache Airflow as well.
+
[source, bash]
----
python3 -m venv airflow_env
source airflow_env/bin/activate
pip install "astronomer-cosmos"
----
2. Install the Apache Airflow Teradata provider:
+
[source, bash]
----
pip install "apache-airflow-providers-teradata"
----
3. Set the AIRFLOW_HOME environment variable:
+
[source, bash]
----
export AIRFLOW_HOME=~/airflow
----

== Install dbt
1. Create a new Python environment to manage dbt and its dependencies, and activate it:
+
[source, bash]
----
python3 -m venv dbt_env
source dbt_env/bin/activate
----
2. Install the `dbt-teradata` and `dbt-core` modules:
+
[source, bash]
----
pip install dbt-teradata dbt-core
----

== Set up the dbt project

1. Clone the jaffle_shop repository:
+
[source, bash]
----
git clone https://github.com/Teradata/jaffle_shop-dev.git jaffle_shop
----
2. Create a new folder, `dbt`, inside the `$AIRFLOW_HOME/dags` folder, then copy the `jaffle_shop` dbt project into the `$AIRFLOW_HOME/dags/dbt` directory:
+
[source, bash]
----
mkdir -p $AIRFLOW_HOME/dags/dbt/
cp -r jaffle_shop $AIRFLOW_HOME/dags/dbt/
----

== Configure Apache Airflow
1. Switch to the virtual environment where Apache Airflow was installed in the _Install Apache Airflow and Astronomer Cosmos_ section:
+
[source, bash]
----
source airflow_env/bin/activate
----
2. Configure the environment variables listed below to enable the test connection button and to prevent Airflow from loading the example DAGs and default connections in the UI:
+
[source, bash]
----
export AIRFLOW__CORE__TEST_CONNECTION=Enabled
export AIRFLOW__CORE__LOAD_EXAMPLES=false
export AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=false
----
3. Define the path of the jaffle_shop project as the environment variable `dbt_project_home_dir`:
+
[source, bash]
----
export dbt_project_home_dir=$AIRFLOW_HOME/dags/dbt/jaffle_shop
----
4. Define the path to the dbt executable in the virtual environment where dbt-teradata was installed as the environment variable `dbt_venv_dir`:
+
[source, bash]
----
export dbt_venv_dir=/../../dbt_env/bin/dbt
----
+
NOTE: You might need to change `/../../` to the specific path where the `dbt_env` virtual environment is located.

== Start the Apache Airflow web server
1. Run the Airflow standalone server:
+
[source, bash]
----
airflow standalone
----
2. Access the Airflow UI. Visit http://localhost:8080 in the browser and log in with the admin account details shown in the terminal.
+
image::{dir}/execute-dbt-teradata-cosmos-airflow.png[Airflow Password, align="left", width=75%]

== Define the Apache Airflow connection to Vantage

1. Click on Admin > Connections.
2. Click on + to define a new connection to your Teradata Vantage instance.
3. Define the new connection with the id `teradata_default` and your Teradata Vantage instance details:
* Connection Id: teradata_default
* Connection Type: Teradata
* Database Server URL (required): Teradata Vantage instance hostname to connect to
* Database: jaffle_shop
* Login (required): database username
* Password (required): database user password
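Alternatively, the same connection can be created from the terminal with the Airflow CLI instead of the UI. The sketch below assumes the `airflow_env` virtual environment is active; the angle-bracket values are placeholders for your own instance details, and `--conn-schema` fills the Database field:

[source, bash]
----
airflow connections add teradata_default \
    --conn-type teradata \
    --conn-host <vantage-hostname> \
    --conn-schema jaffle_shop \
    --conn-login <database-user> \
    --conn-password <database-user-password>
----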
== Define a DAG in Apache Airflow

DAGs in Airflow are defined as Python files. The DAG below uses Cosmos to run the dbt transformations defined in the `jaffle_shop` dbt project against a Teradata Vantage system. Copy the Python code below and save it as `airflow-cosmos-dbt-teradata-integration.py` under the `$AIRFLOW_HOME/dags` directory.

[source, python]
----
import os
from datetime import datetime

from airflow import DAG
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import TeradataUserPasswordProfileMapping

# Paths exported earlier as environment variables.
PATH_TO_DBT_VENV = os.environ["dbt_venv_dir"]              # dbt executable in the dbt virtual environment
PATH_TO_DBT_PROJECT = os.environ["dbt_project_home_dir"]   # jaffle_shop project directory

# Tell Cosmos which dbt executable to run.
execution_config = ExecutionConfig(
    dbt_executable_path=PATH_TO_DBT_VENV,
)

# Generate the dbt profile from the `teradata_default` Airflow connection.
profile_config = ProfileConfig(
    profile_name="generated_profile",
    target_name="dev",
    profile_mapping=TeradataUserPasswordProfileMapping(
        conn_id="teradata_default",
    ),
)

with DAG(
    dag_id="execute_dbt_transformations_with_cosmos",
    max_active_runs=1,
    max_active_tasks=10,
    catchup=False,
    start_date=datetime(2024, 1, 1),
) as dag:
    # Cosmos renders each dbt model as a task inside this task group.
    transform_data = DbtTaskGroup(
        group_id="transform_data",
        project_config=ProjectConfig(PATH_TO_DBT_PROJECT),
        profile_config=profile_config,
        execution_config=execution_config,
        default_args={"retries": 2},
    )
----

== Load the DAG

Once the DAG file is copied to `$AIRFLOW_HOME/dags`, Apache Airflow displays the DAG in the UI under the DAGs section. It can take 2 to 3 minutes for the DAG to appear in the Apache Airflow UI.

== Run the DAG

Run the DAG as shown in the image below.

image::{dir}/airflow-dag.png[Run dag, align="left", width=75%]

== Summary

In this quick start guide, we explored how to use the Astronomer Cosmos library in Apache Airflow to execute dbt transformations against a Teradata Vantage instance.

== Further reading
* link:https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Apache Airflow DAGs reference]
* link:https://astronomer.github.io/astronomer-cosmos/[Benefits of Cosmos]
* link:https://astronomer.github.io/astronomer-cosmos/profiles/TeradataUserPassword.html[Teradata Cosmos Profile]
* link:https://learn.microsoft.com/en-us/windows/wsl/install[Install WSL on Windows]