# Designing and Running Workflows For Terra: Tips & Tricks
*Note: Some of this document rephrases information in the WDL library book, README.md*

Those of you familiar with writing WDL workflows will feel right at home on Terra, as Terra uses the same program, Cromwell, that you likely use to run your WDLs locally. That being said, there are a few differences once things move to the cloud. We've compiled general advice for those who are new to writing workflows for Terra's compute environment, which should also help with troubleshooting. This assumes some familiarity with WDL itself, so those new to WDL may benefit more from the spec or other resources in this BYOT document.

- [Helpful Resources](#helpful-resources)
- [Tips and Tricks: Data Access](#tips-and-tricks-data-access)
  * [General DRS tips](#general-drs-tips)
  * [Use gs:// inputs](#use-gs-inputs)
  * [Make sure your credentials are current](#make-sure-your-credentials-are-current)
- [Tips and Tricks: Runtime Attributes](#tips-and-tricks-runtime-attributes)
  * [Cromwell can handle preemptible VM interruptions for you](#cromwell-can-handle-preemptible-vm-interruptions-for-you)
  * [Disks attribute must use integers](#disks-attribute-must-use-integers)
  * [Avoid using sub() to coerce floats into ints](#avoid-using-sub-to-coerce-floats-into-ints)
  * [Calculate size with strings](#calculate-size-with-strings)
- [Tips and Tricks: Efficiency](#tips-and-tricks-efficiency)
  * [Saving money with preemptibles: Risks and benefits](#saving-money-with-preemptibles-risks-and-benefits)
- [Tips and Tricks: Miscellaneous](#tips-and-tricks-miscellaneous)
  * [Be careful with comments](#be-careful-with-comments)
- [Tips and Tricks: Use Firecloud API to help you debug](#tips-and-tricks-use-firecloud-api-to-help-you-debug)
  * [Get a full error traceback](#get-a-full-error-traceback)
  * [Download a WDL used on a Terra run to your local machine](#download-a-wdl-used-on-a-terra-run-to-your-local-machine)

## Helpful Resources
* [Terra's WDL documentation resources](https://support.terra.bio/hc/en-us/sections/360007274612-WDL-Documentation)
## Tips and Tricks: Data Access

### General DRS tips
DRS is a GA4GH standard providing a cloud-agnostic method to access data in the cloud. For NIH cloud platform users (BioData Catalyst, AnVIL, etc.), it is currently used to access data hosted by the Gen3 platform. When data is imported to Terra from Gen3, you will see that genomic files are accessed via `drs://` URIs rather than `gs://` URIs.

Cromwell in Terra will automatically resolve DRS URIs for you ([assuming your credentials are up-to-date](#make-sure-your-credentials-are-current)), so most WDLs will be able to use DRS URIs without any additional changes.

However, depending on how your inputs are set up, some changes might be necessary, for example if your WDL relies on symlinks. When working with DRS URIs, you will sometimes want your inputs to be treated as `String` inputs rather than `File` inputs. [This diff on GitHub](https://github.com/DataBiosphere/topmed-workflow-variant-calling/pull/4/files) shows the changes that were needed to make an existing WDL work with DRS URIs on Terra. Although it is a somewhat complicated example, it may be a helpful template for your own changes.

### Use gs:// inputs
Terra does not support `https://storage.googleapis.com` inputs. If one of your input files is in a public Google Cloud bucket, use `gs://` notation instead.
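If you have a public HTTPS link to a file, converting it to `gs://` form is a simple string rewrite. The function below is a minimal sketch, assuming the link uses the `https://storage.googleapis.com/` host (or the browser-style `https://storage.cloud.google.com/` host) followed by the bucket name and object path; the function name `to_gs_uri` and the example bucket are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a Google Cloud Storage HTTPS link as a gs:// URI.
# Assumes the link is <host>/<bucket>/<object path> on one of the hosts below.
# Anything else (including links that are already gs://) is printed unchanged.
# Percent-encoded characters (e.g. %20) are NOT decoded.
to_gs_uri() {
  local url="$1" host
  for host in "https://storage.googleapis.com/" "https://storage.cloud.google.com/"; do
    if [[ "$url" == "$host"* ]]; then
      # Strip the host prefix literally; what remains is <bucket>/<object path>.
      printf 'gs://%s\n' "${url#"$host"}"
      return 0
    fi
  done
  printf '%s\n' "$url"
}

to_gs_uri "https://storage.googleapis.com/my-public-bucket/reads/sample.bam"
# prints: gs://my-public-bucket/reads/sample.bam
```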
When you run a WDL locally, its values for runtime attributes that only apply to the cloud, such as `disks` or `memory`, are ignored. That means that if those values have problems, such as incorrect syntax (see below), they will not raise an error on local runs, but they will cause failures when running on Terra. See the official spec for [pointers on the memory attribute](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#memory).

### Cromwell can handle preemptible VM interruptions for you

If you include the runtime attribute `preemptible` in your WDL, you can specify the maximum number of times Terra will request a preemptible machine for a task before falling back to a non-preemptible machine. For instance, if you set `preemptible: 2`, the task will first run on a preemptible machine. If that machine gets preempted, it will try again on another preemptible machine, and if that second attempt is also preempted, it will run on a non-preemptible machine. For advice on weighing the costs and benefits of preemptibles, see [Saving money with preemptibles: Risks and benefits](#saving-money-with-preemptibles-risks-and-benefits).
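The retry rule for `preemptible` can be written out as a small function. This is only a model of the behavior described in this section, not Cromwell's implementation; the function name `vm_for_attempt` is hypothetical, and the loop assumes every attempt is preempted until the task falls back to a non-preemptible machine (a task that finishes simply stops retrying).

```shell
#!/usr/bin/env bash
# Hypothetical model of which kind of machine a task gets on a given attempt.
#   $1 = value of the task's `preemptible` runtime attribute (0 disables preemptibles)
#   $2 = attempt number, starting at 1
# The first N attempts use preemptible machines; any later attempt does not.
vm_for_attempt() {
  local preemptible="$1" attempt="$2"
  if (( attempt <= preemptible )); then
    echo "preemptible"
  else
    echo "non-preemptible"
  fi
}

# With `preemptible: 2`, a task that keeps getting preempted runs like this:
for attempt in 1 2 3; do
  echo "attempt ${attempt}: $(vm_for_attempt 2 "$attempt")"
done
# attempt 1: preemptible
# attempt 2: preemptible
# attempt 3: non-preemptible
```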

### Disks attribute must use integers

When writing WDL workflows, it is recommended to let the user enable or disable preemptibles for each task. This allows the user to save money on test runs with smaller datasets, which take less time to compute and are therefore less likely to be preempted, while still having the option to avoid preemptibles when a task needs to finish as soon as possible without waiting on possible retries.

## Tips and Tricks: Miscellaneous
### Be careful with comments
The command section of a WDL task is Bash, where `#` starts a comment, but Cromwell and womtool read the command section as WDL before Bash ever runs it. As a result, WDL syntax inside a Bash comment is still interpreted. This usually only causes trouble when the comment contains special characters, such as a `~{...}` placeholder or the `>>>` that closes a command section; comments made of plain alphanumerics work fine.

✅ This will work:
```
command <<<
  echo foo
  #this is a valid comment
>>>
```

❌ This will fail womtool:
```
command <<<
  echo foo
  #using <<<this syntax>>> for your command section is ~{very cool}!
>>>
```

## Tips and Tricks: Use Firecloud API to help you debug
### Get a full error traceback
You can use the following part of the Firecloud API to get a huge amount of information about your run. This includes all error codes associated with your WDL, which is helpful because Terra's UI (as of July 2022, when this was written) does not show the full traceback when an error occurs.

https://api.firecloud.org/#/CromIAM%20Workflows%20(for%20Job%20Manager)/get_api_workflows__version___id__metadata
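The same metadata endpoint can also be queried from the command line, using the `curl` and `gcloud` setup described in the next section. As a minimal sketch, the hypothetical `print_failures` function below uses jq to pull every failure message out of the metadata. It assumes Cromwell's metadata layout, where workflow-level and call-level errors are listed under `failures` keys, each with a `message` and optional nested `causedBy` entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every distinct failure message found in a
# Cromwell metadata JSON document read from stdin (sorted, duplicates removed).
# Assumes failures appear under "failures" keys at the workflow and call level,
# each with a "message" and optional nested "causedBy" entries. Requires jq.
print_failures() {
  jq -r '[.. | .failures? // empty | .[]? | .. | .message? // empty] | unique | .[]'
}

# Example usage (replace the ID; requires gcloud logged in to your Terra account):
# curl -s -X GET "https://api.firecloud.org/api/workflows/v1/PUT-WORKFLOW-ID-HERE/metadata" \
#   -H "accept: application/json" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" | print_failures
```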

### Download a WDL used on a Terra run to your local machine
_Note: While this has been made somewhat obsolete by a change in Terra's UI, it may still be useful to power users who wish to check multiple submissions' WDLs quickly, or who wish to recover a WDL from a Terra run (such as after deleting a local copy)._

If you are developing a workflow and need to run multiple tests on Terra, you'll probably be updating your workflow a lot. When you launch a workflow, you can select its version: a release number or branch if it was imported from Dockstore, or a snapshot if it was imported from the Broad Methods Repository. If you are running multiple versions of the same workflow, you might lose track of which run corresponds to which WDL. While the WDL is now visible in the UI, you can also extract the WDL of a finished workflow using your local machine's command line.

When you click the "view" button to bring up the Job Manager, take note of the ID at the top of the page (not to be confused with the workspace ID or submission ID).

![screenshot of Terra page to indicate where the ID is](images/terra_id.png)

You can use this ID on your local machine's command line to display the WDL on stdout.

```
curl -X GET "https://api.firecloud.org/api/workflows/v1/PUT-WORKFLOW-ID-HERE/metadata" \
  -H "accept: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  | jq -r '.submittedFiles.workflow'
```

Note that you will need to [install gcloud and log in with the same Google account that your workspace uses](https://cloud.google.com/sdk/docs/quickstarts), and you will need jq to parse the result. On a Mac, jq can be installed with `brew install jq`.
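If you want to compare the WDLs of several runs, the same request can be wrapped in a loop. This is a minimal sketch, assuming `gcloud` and `jq` are set up as described above; the function name `fetch_wdls` and the `<workflow ID>.wdl` file naming are hypothetical choices, not part of any Terra tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the WDL submitted for each workflow ID to <ID>.wdl
# in the current directory, using the same Firecloud metadata request as above.
# Requires curl, jq, and gcloud logged in with the account your workspace uses.
fetch_wdls() {
  local token id
  # Fetch the access token once instead of once per workflow ID.
  token="$(gcloud auth print-access-token)" || return 1
  for id in "$@"; do
    curl -sS -X GET "https://api.firecloud.org/api/workflows/v1/${id}/metadata" \
      -H "accept: application/json" \
      -H "Authorization: Bearer ${token}" \
      | jq -r '.submittedFiles.workflow' > "${id}.wdl" || return 1
    echo "saved ${id}.wdl"
  done
}
```

For example, `fetch_wdls FIRST-WORKFLOW-ID SECOND-WORKFLOW-ID` followed by `diff FIRST-WORKFLOW-ID.wdl SECOND-WORKFLOW-ID.wdl` shows what changed between two runs.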

With this quick setup, you'll be able to check the WDL of previously run workflows, which helps with debugging when you are running multiple versions of the same workflow. To make this process more efficient, put a comment in each WDL explaining how it differs from the other versions.