Implement a subset of the Common Workflow Language #12909
Draft: nsoranzo wants to merge 54 commits into galaxyproject:dev from common-workflow-lab:cwl-1.0
davelopez reviewed Nov 12, 2021
nsoranzo force-pushed the cwl-1.0 branch 5 times, most recently from ede05db to adb5849 (November 12, 2021 11:18)
jmchilton reviewed Nov 15, 2021
@@ -638,15 +638,16 @@ def default_exit_code_file(files_dir, id_tag):

```python
def collect_extra_files(object_store, dataset, job_working_directory):
    file_name = dataset.dataset.extra_files_path_name_from(object_store)
    temp_file_path = os.path.join(job_working_directory, "working", file_name)
    output_dir = "working" if os.path.exists(os.path.join(job_working_directory, "working", file_name)) else "outputs"
```
I'd feel more comfortable about this if we were choosing between the two based on the job type in some way.
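One way to act on this suggestion is to decide the directory from the job type rather than by probing the filesystem. This is a minimal sketch: `is_cwl_job` is a hypothetical flag, and the mapping of CWL jobs to `outputs` and other tools to `working` is an illustrative assumption, not something taken from Galaxy's code.

```python
import os


# Hypothetical sketch of the reviewer's suggestion: derive the directory from
# the job type instead of checking which path happens to exist on disk.
# `is_cwl_job` is an assumed flag, and the "outputs"/"working" mapping is an
# illustrative assumption, not confirmed Galaxy behavior.
def extra_files_source_path(job_working_directory: str, file_name: str, is_cwl_job: bool) -> str:
    """Return where a job is expected to have written a dataset's extra files."""
    output_dir = "outputs" if is_cwl_job else "working"
    return os.path.join(job_working_directory, output_dir, file_name)
```

Unlike the `os.path.exists` probe in the diff, this choice cannot be flipped by a stray `working/<file_name>` entry that a job happens to leave behind.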
nsoranzo force-pushed the cwl-1.0 branch 3 times, most recently from 66648ae to 6859708 (November 16, 2021 03:35)
nsoranzo changed the title from "[WIP] Implement a subset of the Common Workflow Language." to "[WIP] Implement a subset of the Common Workflow Language" (Nov 16, 2021)
nsoranzo force-pushed the cwl-1.0 branch 2 times, most recently from 63c3064 to edb1b60 (November 17, 2021 13:18)
mr-c reviewed Nov 17, 2021
nsoranzo force-pushed the cwl-1.0 branch 2 times, most recently from 41f01f1 to 55380de (November 22, 2021 11:58)
nsoranzo commented three times, Nov 22, 2021
and set LoadListingRequirement as supported requirement, since cwltool handles this for us.
Just get the whole directory archive for now. We could get fancier and get individual files one by one, but I'm not sure there's any point in making it so complicated.
We should probably just move the contents, though?
and document hacks. This might not be quite right but all default tests seem to pass.
…en replacing with default
I think we might be creating too many deferred datasets, but this will do for now.
To make that useful we should probably upload deferred datasets and allow referring to deferred dataset in location scheme.
CWL output names are namespaced, so in that case the previous check doesn't work.
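CWL identifiers are URIs whose fragment carries the name, for example `file:///path/tool.cwl#output1` for a tool output or `file:///path/wf.cwl#step1/out` for a workflow step output, so comparing them directly against bare Galaxy output names fails. A minimal sketch of normalizing them first; `short_output_name` is a hypothetical helper, and the check the commit refers to is not reproduced here.

```python
from urllib.parse import urldefrag


def short_output_name(output_id: str) -> str:
    """Strip the document and step namespace from a CWL output identifier (hypothetical helper)."""
    # Split off the fragment; ids without a "#" are treated as already bare.
    _, fragment = urldefrag(output_id)
    name = fragment or output_id
    # Workflow-level fragments may carry step prefixes such as "step1/out".
    return name.split("/")[-1]
```

With this normalization, a check such as `"output1" in {short_output_name(o) for o in output_ids}` behaves the same whether the ids are namespaced or not.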
Adding support for a large subset of CWL v1.0.2, v1.1 and v1.2. This is a group effort, led by @jmchilton with contributions from @mr-c, @mvdbeek, @hmenager and myself.
This effort wouldn't have been possible without the support of ELIXIR, which sponsored this project at BioHackathon Europe in 2018, 2019, 2020 and 2021.
CWL Support (Tools):

- `secondaryFiles` that are actual Files are implemented; `secondaryFiles` containing directories are not yet implemented.
- `InlineJavascriptRequirement` is supported for defining output files (see the `test_cat3` test case).
- `EnvVarRequirement`s are supported (see the `test_env_tool1` and `test_env_tool2` test cases).
- … `parseInt-tool` test case).

CWL Support (Workflows):

- … `step-valueFrom` and `step-valueFrom2`). This work doesn't yet model non-tool parameters to steps, so complex `valueFrom` expressions like the one in `step-valueFrom3` do not work yet.

Remaining Work
The work remaining is vast and will be tracked at https://github.com/common-workflow-lab/galaxy/issues for the time being.
Implementation Notes:
Tools:
- … `expression.json` files. Traditionally Galaxy hasn't supported non-`File` outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).
- … `__secondary_files__` directory in the dataset's `extra_files_path` directory and indexed in a file called `__secondary_files_index.json` in `extra_files_path`. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file, to support getting data into this format initially. CWL requires staged secondary files to include their parent File's `basename`, but tools describe inputs as just the extension. I'm not sure which way Galaxy should store `secondaryFiles` in its objectstore (just with the extension, or with the basename and extension); both options are implemented and can be swapped by setting the boolean `STORE_SECONDARY_FILES_WITH_BASENAME` in `galaxy.tools.cwl.util`.
- … `inputs_representation` parameter that can be set to `"cwl"` now. The `cwl` representation for running tools corresponds to the CWL job JSON format, with `{"class": "File", "path": "/path/to/file"}` inputs replaced with `{"src": "hda", "id": "<dataset_id>"}`. Code for building these requests for CWL job JSON is available in the test class.
- … `File` or non-`File` and determined at runtime, so `galaxy.json` is used to dynamically adjust the output extension as needed for non-`File` parameters.

Workflows:
Implementation Description:
The reference implementation Python library (mainly developed by Peter Amstutz) is used to load tool files ending with `.json` or `.cwl`, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class builds a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation uses it to generate a Job object. A command line is generated from this Job object.
As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.
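The job-description step pairs naturally with the `cwl` inputs representation from the tool notes, where `{"src": "hda", "id": ...}` stands in for `{"class": "File", "path": ...}`. Turning Galaxy inputs back into a CWL job document can be sketched as a recursive rewrite. This is a minimal sketch, not Galaxy's implementation: `to_cwl_job` and the `resolve_path` callback (mapping a dataset id to a file path) are hypothetical names.

```python
from typing import Any, Callable


def to_cwl_job(inputs: Any, resolve_path: Callable[[str], str]) -> Any:
    """Replace Galaxy dataset references with CWL File objects, recursively (hypothetical helper)."""
    if isinstance(inputs, dict):
        # A Galaxy dataset reference becomes a CWL File object.
        if inputs.get("src") == "hda" and "id" in inputs:
            return {"class": "File", "path": resolve_path(inputs["id"])}
        # Any other mapping (e.g. a record input) is rewritten value by value.
        return {key: to_cwl_job(value, resolve_path) for key, value in inputs.items()}
    if isinstance(inputs, list):
        # Arrays of files or of records are rewritten element by element.
        return [to_cwl_job(item, resolve_path) for item in inputs]
    # Scalars (strings, numbers, booleans, None) pass through unchanged.
    return inputs
```

For example, with `paths = {"abc": "/data/abc.dat"}`, calling `to_cwl_job({"input1": {"src": "hda", "id": "abc"}, "threads": 4}, paths.__getitem__)` yields `{"input1": {"class": "File", "path": "/data/abc.dat"}, "threads": 4}`, which is the shape of a CWL job order document.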
Galaxy writes a reloadable description of the CWL job to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations Galaxy expects. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.
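The tool notes describe where secondary files end up after collection: a `__secondary_files__` directory under the dataset's `extra_files_path`, indexed by `__secondary_files_index.json`, with `STORE_SECONDARY_FILES_WITH_BASENAME` choosing between full names and bare extensions. A minimal sketch of the naming side of that step follows. `apply_secondary_pattern` follows the caret rule from the CWL specification; the other function names and the index schema (a plain JSON list of stored names) are hypothetical, not Galaxy's code.

```python
import json
import os

# Directory and index names as described in the PR notes.
SECONDARY_DIR = "__secondary_files__"
SECONDARY_INDEX = "__secondary_files_index.json"


def apply_secondary_pattern(primary_basename: str, pattern: str) -> str:
    """Apply a CWL secondaryFiles pattern such as ".bai" or "^.bai"."""
    name = primary_basename
    while pattern.startswith("^"):
        # Each leading caret removes one extension (the last "." onward), if any.
        if "." in name:
            name = name[: name.rindex(".")]
        pattern = pattern[1:]
    return name + pattern


def stored_name(primary_basename: str, secondary_basename: str, with_basename: bool) -> str:
    """Choose the name a secondary file is kept under (hypothetical helper)."""
    suffix_only = (
        secondary_basename.startswith(primary_basename)
        and len(secondary_basename) > len(primary_basename)
    )
    if with_basename or not suffix_only:
        # Keep the full name; "^" patterns also land here, because their result
        # is not simply the primary name plus a suffix.
        return secondary_basename
    # Otherwise keep only the suffix, e.g. ".bai" for "reads.bam.bai".
    return secondary_basename[len(primary_basename):]


def write_secondary_index(extra_files_path: str, names: list) -> str:
    """Create the secondary files directory and write the index (hypothetical schema)."""
    os.makedirs(os.path.join(extra_files_path, SECONDARY_DIR), exist_ok=True)
    index_path = os.path.join(extra_files_path, SECONDARY_INDEX)
    with open(index_path, "w") as handle:
        json.dump(names, handle)
    return index_path
```

The `stored_name` branch shows the trade-off behind the open question in the notes: extension-only storage works for suffix patterns like `.bai`, but a `^.bai` result such as `reads.bai` still has to be stored with its basename.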
Currently all `File` outputs are sniffed to determine a Galaxy datatype. CWL allows refinement of this, and that remains work to be done.

Implementation Links:
Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:
How to test the changes?
(Select all options that apply)
License