Added quick start guide to use cosmos in airflow workflow for dbt transformations. #211

Merged: 11 commits, Jul 23, 2024

sc250072
Collaborator

Added quick start guide to use cosmos in airflow workflow for dbt transformations.

github-actions bot commented Jul 16, 2024

PR Preview Action v1.4.7
Preview removed because the pull request was closed.
2024-07-23 10:29 UTC

sunilkmallam previously approved these changes Jul 17, 2024
Contributor

Left a few comments and suggestions. Please make changes as you see fit.

* Python 3.8, 3.9, 3.10 or 3.11 installed.
* python3-env, python3-pip, pipx installed.
[source, bash]
sudo apt install -y python3-venv python3-pip pipx
Contributor

I think this command is different for each OS. Should we split it according to that?

Collaborator Author

Addressed the comment.
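
One way to express the per-OS split raised in this thread is to pick the prerequisite command from the output of `uname -s`. This is only an illustrative sketch: `prereq_hint` is a hypothetical helper, the Linux branch assumes a Debian/Ubuntu system with `apt` (the command is copied from the guide under review), and the macOS branch relies on the reviewers' remark elsewhere in this conversation that Mac does not need the extra packages.

```shell
# Print the prerequisite install command for a given OS name.
# prereq_hint is a hypothetical helper, not part of the guide.
prereq_hint() {
  case "$1" in
    Linux)
      # Debian/Ubuntu assumed; command copied from the guide under review.
      echo "sudo apt install -y python3-venv python3-pip pipx"
      ;;
    Darwin)
      # Per the reviewers, nothing extra is needed on macOS.
      echo "# nothing to install on macOS"
      ;;
    *)
      echo "# not covered here: on Windows, use WSL and follow the Linux steps"
      ;;
  esac
}

# Show the hint for the machine this script runs on.
prereq_hint "$(uname -s)"
```

Because WSL reports `Linux` from `uname -s`, a Windows user working inside WSL would get the Linux hint.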

+
[source, bash]
----
export AIRFLOW_HOME=~/airflow
Contributor

Should we differentiate the commands between installing airflow locally vs. in a virtual environment?

Contributor

Totally agree with @sunilkmallam. Let's install it only in a venv; we don't want the user to tamper with their default Python install.
Let's create the venv first.

Collaborator Author

Included installation on venv.
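
The "venv first, then install" order agreed on in this thread could look roughly like the sketch below. It is not the text that was merged: `VENV_DIR`, the `run` wrapper, and the default `PYTHON_VERSION` are assumptions made for illustration, while the version `2.9.2` and `AIRFLOW_HOME=~/airflow` come from the guide under review. The constraints URL follows the pattern Airflow publishes for each release. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Sketch: create the virtual environment first, then install Airflow into it.
AIRFLOW_VERSION="2.9.2"                     # version named in the guide
PYTHON_VERSION="${PYTHON_VERSION:-3.11}"    # assumption: one of the supported versions
VENV_DIR="${VENV_DIR:-$HOME/airflow_env}"   # hypothetical location for the venv
export AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"

# Build the URL of the constraints file for an Airflow/Python version pair.
constraints_url() {
  echo "https://raw.githubusercontent.com/apache/airflow/constraints-$1/constraints-$2.txt"
}

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# 1. Create the venv so the system Python install stays untouched.
run python3 -m venv "$VENV_DIR"

# 2. Install Airflow with the venv's own pip, pinned by the constraints file.
run "$VENV_DIR/bin/pip" install "apache-airflow==$AIRFLOW_VERSION" \
  --constraint "$(constraints_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")"
```

Calling the venv's `pip` by path avoids depending on whether the environment has been activated in the current shell.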

[source, bash]
export dbt_project_home_dir=../../jaffle_shop
+
NOTE: Change `/../../` to path of jaffle_shop project path.
Contributor

Can we change "path of jaffle_shop project path" to "your jaffle_shop project path"?

Contributor

agree with this.

Collaborator Author

NOTE is not required as per latest change. Removed the Note.
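
Since the confusion in this thread was about which path `dbt_project_home_dir` should hold, a small guard can make a wrong path fail early. The sketch below assumes only that a dbt project has a `dbt_project.yml` at its root; `check_dbt_project` is a hypothetical helper and `../../jaffle_shop` is the example path from the guide.

```shell
# Resolve a directory to an absolute path if it looks like a dbt project.
# check_dbt_project is a hypothetical helper, not part of the guide.
check_dbt_project() {
  if [ -f "$1/dbt_project.yml" ]; then
    # cd inside a subshell so the caller's working directory is unchanged.
    (cd "$1" && pwd)
  else
    echo "no dbt_project.yml under '$1'" >&2
    return 1
  fi
}

# Replace ../../jaffle_shop with the path to your own jaffle_shop project.
if dbt_project_home_dir="$(check_dbt_project ../../jaffle_shop)"; then
  export dbt_project_home_dir
fi
```

Exporting an absolute path also keeps the variable valid when later commands run from a different working directory.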


== Summary

In this quick start guide, we explored how to utilize Astronomer Cosmos library in apache airflow to execute dbt teradata transformations against Teradata Vantage instance.
Contributor

"dbt teradata transformations" should be replaced with "dbt transformations".

Contributor

agree with this

Collaborator Author

Modified

+
include::ROOT:partial$vantage_clearscape_analytics.adoc[]
* Python 3.8, 3.9, 3.10 or 3.11 installed.
* python3-env, python3-pip, pipx installed.
Contributor

Suggested change
* python3-env, python3-pip, pipx installed.
* python3-env, python3-pip, pipx installed.

This is only applicable to Linux; Mac comes with venv and pip. We never use pipx in the guide, so I'm not sure why we are installing it.

Collaborator Author

Added instructions for Mac, Windows and Linux as tabs

* Python 3.8, 3.9, 3.10 or 3.11 installed.
* python3-env, python3-pip, pipx installed.
[source, bash]
sudo apt install -y python3-venv python3-pip pipx
Contributor

Same as above: this is only applicable to Linux. Mac doesn't need it, and Windows is not supported except through WSL, which is Linux anyway.

Collaborator Author

Added a note about WSL on Windows. Added tabs wherever the three platforms need different instructions, and removed them where they are not required.
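
For readers who want to confirm they are inside WSL before following the Linux steps, a common heuristic is to look for "microsoft" in the kernel version string. The sketch below is an assumption-laden illustration, not part of the guide: `is_wsl` is a hypothetical helper, and the check is a convention rather than a guaranteed interface.

```shell
# Heuristic WSL check: WSL kernels usually report "microsoft" in their version string.
# is_wsl is a hypothetical helper; pass a version string as $1 to test it,
# otherwise it reads /proc/version.
is_wsl() {
  version="${1:-$(cat /proc/version 2>/dev/null)}"
  case "$version" in
    *[Mm]icrosoft*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_wsl; then
  echo "WSL detected: follow the Linux instructions"
fi
```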

+
[source, bash]
----
export AIRFLOW_HOME=~/airflow
Contributor

Totally agree with @sunilkmallam. Let's install it only in a venv; we don't want the user to tamper with their default Python install.
Let's create the venv first.

----
2. Install `apache-airflow` stable version 2.9.2 from PyPI repository.:
+
[tabs]
Contributor

All of the blocks are the same because Mac, like Linux, is a Unix-type OS, and Windows runs on WSL, which is Linux as well. As in the other guide, I'm not sure we need to repeat this across the three tabs. Maybe we just add a comment that Windows means WSL, so Windows on WSL is the same as Linux, and that Mac is also the same. The only difference would be installing pip and venv, which don't come by default with Python on Linux.

Collaborator Author

Addressed the comment

----

== Install dbt
1. Create a new python environment to manage dbt and its dependencies. Activate the environment:
Contributor

This should be at the beginning.

Collaborator Author

@JH255095 do you want to place the Install dbt section at the beginning? It is currently placed after the Install Airflow step.

== Install dbt
1. Create a new python environment to manage dbt and its dependencies. Activate the environment:
+
[tabs]
Contributor

Again, same as above: we might not need the tabs since everything is the same.

Collaborator Author

Removed.

airflow standalone
----
2. Access the airflow UI. Visit https://localhost:8080 in the browser and log in with the admin account details shown in the terminal.

Contributor

Add image of the terminal

Collaborator Author

Added.
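
Besides a screenshot of the terminal, the login details can also be recovered after the terminal output has scrolled away. The sketch below assumes that `airflow standalone` in Airflow 2.x writes the generated admin password to `standalone_admin_password.txt` under `AIRFLOW_HOME`; `standalone_password` is a hypothetical helper, and the file name should be checked against the installed version.

```shell
# Print the admin password generated by `airflow standalone`, if the file exists.
# Assumption: Airflow 2.x stores it in $AIRFLOW_HOME/standalone_admin_password.txt.
# standalone_password is a hypothetical helper; $1 optionally overrides AIRFLOW_HOME.
standalone_password() {
  pw_file="${1:-${AIRFLOW_HOME:-$HOME/airflow}}/standalone_admin_password.txt"
  if [ -r "$pw_file" ]; then
    cat "$pw_file"
  else
    echo "password file not found: $pw_file" >&2
    return 1
  fi
}

standalone_password || echo "start 'airflow standalone' first, then retry"
```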

----
2. Access the airflow UI. Visit https://localhost:8080 in the browser and log in with the admin account details shown in the terminal.

== Define Apache Airflow connection to Vantage Cloud Lake
Contributor

Let's make this one more generic and assume that you are working with CSAE. This is applicable to Lake as it appears in the compute clusters guide, but here we don't have the whole section on the SQL needed to create the user. If we just point out that you need a Teradata environment and that you can get one on CSAE, we save all of that part. We then just need to modify this section to use the CSAE host, user and password.

Collaborator Author

Modified as more generic.
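
A generic connection of the kind requested here needs only a host, a user and a password. One possible way to supply them, shown as a sketch rather than as what the merged guide does, is Airflow's `AIRFLOW_CONN_<CONN_ID>` environment variable holding a connection URI. The connection id `teradata_default`, the helper `teradata_conn_uri`, and all three values below are placeholders chosen for illustration.

```shell
# Build an Airflow connection URI for a Teradata environment.
# teradata_conn_uri is a hypothetical helper. Special characters in the user
# or password would need URL-encoding, which this sketch does not handle.
teradata_conn_uri() {
  # $1 = host, $2 = user, $3 = password
  echo "teradata://$2:$3@$1"
}

# Placeholder values: substitute the host, user and password of your own environment.
export AIRFLOW_CONN_TERADATA_DEFAULT="$(teradata_conn_uri "example-host.example.com" "demo_user" "demo_password")"
```

Keeping the credentials in an environment variable also keeps them out of the DAG file, though a secrets backend may be a better fit outside a quick start.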

@JH255095 JH255095 merged commit 9e8b344 into main Jul 23, 2024
1 check passed
@JH255095 JH255095 deleted the IDE-24509-Cosmos branch July 23, 2024 10:28
5 participants