This session guides you through creating an Apache Airflow DAG in GCP Cloud Composer. The DAG is scheduled to run a job using Dataproc Submit, creating an ephemeral cluster and deleting it once the job completes. The example batch job copies data from GCS to GCS.
The DAG used in this guide is located here.
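For orientation, here is a minimal sketch of the create-submit-delete pattern the DAG follows, using the Dataproc operators from the Airflow Google provider package. The DAG id, task IDs, cluster name, machine types, and job spec below are illustrative assumptions, not the repo's actual DAG.

# Sketch only: create an ephemeral Dataproc cluster, submit a Spark job, delete the cluster.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

PROJECT_ID = "your_project_id"      # assumption: mirrors the PROJECT_ID Airflow variable
REGION = "your_region"              # assumption: mirrors the REGION Airflow variable
CLUSTER_NAME = "ephemeral-cluster"  # hypothetical cluster name

SPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "spark_job": {
        "main_class": "org.example.GcsToGcs",              # hypothetical main class
        "jar_file_uris": ["gs://your-bucket/your-job.jar"], # hypothetical jar location
    },
}

with DAG(
    dag_id="a_batch_submit_cluster_sketch",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        },
    )
    submit_job = DataprocSubmitJobOperator(
        task_id="submit_job",
        project_id=PROJECT_ID,
        region=REGION,
        job=SPARK_JOB,
    )
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",  # tear the cluster down even if the job fails
    )
    create_cluster >> submit_job >> delete_cluster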
export PROJECT_ID="your_project_id"
export REGION="your_region"
export COMPOSER_ENV="your_composer_env"
export COMPOSER_LOCATION=${REGION}
Update composer/src/dags/config/a_batch_submit_cluster.ini with your desired configuration values; the Python DAG file reads this config file.
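As a rough illustration, the DAG can load this file at parse time with Python's configparser. The section and key names below are hypothetical placeholders, not the actual keys defined in a_batch_submit_cluster.ini.

# Sketch: read the .ini that sits next to the DAG under dags/config/.
import configparser
import os

config = configparser.ConfigParser()
config.read(
    os.path.join(os.path.dirname(__file__), "config", "a_batch_submit_cluster.ini")
)

cluster_name = config["cluster"]["name"]          # hypothetical section/key
machine_type = config["cluster"]["machine_type"]  # hypothetical section/key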
Follow the instructions to create a Composer Environment.
export AIRFLOW_VARIABLE="gcloud composer environments run ${COMPOSER_ENV} \
--location ${COMPOSER_LOCATION} variables -- set"
$AIRFLOW_VARIABLE PROJECT_ID "${PROJECT_ID}" && \
$AIRFLOW_VARIABLE REGION "${REGION}"
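Inside the DAG file, these Airflow variables can then be read with the Variable model; this is a sketch of the usual pattern, not necessarily how the repo's DAG fetches them.

# Sketch: read the Airflow variables set above from within the DAG.
from airflow.models import Variable

PROJECT_ID = Variable.get("PROJECT_ID")
REGION = Variable.get("REGION")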
export LOCAL_DAG_PYTHON="composer/src/dags/a_batch_submit_cluster.py"
export LOCAL_DAG_CONFIG="composer/src/dags/config/a_batch_submit_cluster.ini"
export DAGs_FOLDER=$(gcloud composer environments describe $COMPOSER_ENV \
--location $REGION \
--format="get(config.dagGcsPrefix)")
gsutil cp $LOCAL_DAG_PYTHON $DAGs_FOLDER/
gsutil cp $LOCAL_DAG_CONFIG $DAGs_FOLDER/config/
The DAG will then run on its schedule, creating a Dataproc cluster, running the Spark job, and deleting the cluster when the job finishes.
All code snippets within this document are provided under the following terms.
Copyright 2022 Google. This software is provided as-is, without warranty or representation for any use or purpose. Your use of it is subject to your agreement with Google.