Implement a subset of the Common Workflow Language #12909

Draft · wants to merge 54 commits into base: dev

Commits (54)
61508ad
[WIP] Implement records - heterogenous dataset collections.
jmchilton May 18, 2020
f34853b
Deal with workflow definitions without position fields.
jmchilton Nov 18, 2019
335209b
Implement subset of the Common Workflow Language tool and workflow fo…
jmchilton Nov 5, 2020
591a8ff
CWL Testing and Runner Improvements.
jmchilton Nov 6, 2018
93f2000
Swap default for beta formats (do not include in Galaxy).
jmchilton Nov 20, 2019
1552683
WIP: Work toward Galaxy-flavored CWL tools.
jmchilton Apr 20, 2018
be2f781
[WIP] Implement client UI for field parameter type for CWL.
jmchilton Nov 17, 2019
78c2a2d
WORKAROUND TO GET TAR TO DIRECTORY WORKING AGAIN.
jmchilton Nov 7, 2020
bfeb45a
Fix non_data_connection workflows
mvdbeek Nov 9, 2021
33f9a2d
Fix handling of uploaded_file_name
mvdbeek Nov 10, 2021
2d4dd6a
Fix directory location tests
mvdbeek Nov 10, 2021
551ead7
start documenting state of CWL support
mr-c Nov 11, 2021
53acdd9
Add sentinel value workaround for GALAXY_SLOTS hack
mvdbeek Nov 11, 2021
3eb80fe
Assert length of input connections, instead of inputs when disconnect…
mvdbeek Nov 12, 2021
7a3c003
Fix type hints
mr-c Nov 13, 2021
97d126d
Disable cheetah in configfiles, env vars for cwl tools
mvdbeek Dec 6, 2021
dfeae27
Drop test_deserialize_cwl_tool, already testing that more accurately …
mvdbeek Dec 6, 2021
0a45740
Fix wrong resolution of Any type when re-using CWL tools
mvdbeek Dec 7, 2021
9720327
Coerce discovered optional files to data
mvdbeek Dec 7, 2021
d7fd838
Fix complex types via record collection type
mvdbeek Dec 8, 2021
e9b1c81
Fix handle_known_output for nested output records
mvdbeek Dec 8, 2021
8a3c3a6
Skip staging inputs for outputs
mvdbeek Dec 8, 2021
d6ee2ea
Fix packed document support if main/#main is tool instead of workflow
mvdbeek Dec 8, 2021
9cb2338
Fix tool-provided metadata for CONVERTER_tar_to_directory
nsoranzo Dec 9, 2021
9fbcfd3
Implement rough mapping between EDAM formats and datatypes
mvdbeek Dec 9, 2021
71084f4
Support uploading directory literals
mvdbeek Dec 10, 2021
6f0ee06
Keep directory parameters in job parameters
mvdbeek Dec 11, 2021
a611751
Merge subworkflow input logic?
mvdbeek Sep 4, 2023
d1aa011
Drop divergent to_cwl/from_cwl, factor out extra_step_state building
mvdbeek Sep 5, 2023
d529cc6
TreeDict fix
mvdbeek Sep 5, 2023
c9cc2c2
Use regular staging for CWL tests instead of allow_path_paste, which …
mvdbeek Sep 5, 2023
857ba11
Fix directory uploads
mvdbeek Sep 6, 2023
9181616
Record unnamed_outputs as job outputs, wait for job outputs in stagin…
mvdbeek Sep 6, 2023
178a8db
Download complex outputs
mvdbeek Sep 25, 2023
364745e
Download secondary files as well
mvdbeek Sep 25, 2023
3f5ba94
Implement downloading directory archive
mvdbeek Oct 30, 2023
df72b06
Quickfix for moving away tool working directory
mvdbeek Oct 30, 2023
489ad97
Various fixes for stricter cwltool and cwltest
mvdbeek Oct 31, 2023
5dfb812
Fix up ontology to datatype mapping for __FETCH_DATA__
mvdbeek Oct 31, 2023
a22d865
Shortcut param_dict building for CWL tools
mvdbeek Oct 31, 2023
d707ada
WIP: untar directory to extra_files_path
mvdbeek Nov 1, 2023
7bcc835
Add test for workflow default file overrides tool default file
mvdbeek Nov 3, 2023
501a292
WIP:CWL default file value_from work
mvdbeek Nov 4, 2023
3d8a26b
Into split trans to app
mvdbeek Nov 5, 2023
418f14e
Separate and fix value_from overriding default
mvdbeek Nov 5, 2023
da8beaa
Ensure that expression tool null values are treated as null values wh…
mvdbeek Nov 5, 2023
3622fb0
Hack: default files in FieldTypeToolParameter
mvdbeek Nov 5, 2023
a963b6c
Fix literal value if field type
mvdbeek Nov 5, 2023
cea756b
Replace file location with URL ...
mvdbeek Nov 5, 2023
44a6465
Pack workflow
mvdbeek Nov 5, 2023
4f30399
Update list of new failing 1.2 tests
mvdbeek Nov 6, 2023
0c87ef5
Drop now passing red tests
mvdbeek Nov 6, 2023
2038879
Exclude red and required 1.0 tests from github matrix
mvdbeek Nov 6, 2023
09c2fef
Fix output addition to history if input name is same as output name
mvdbeek Nov 7, 2023
6 changes: 4 additions & 2 deletions .github/workflows/cwl_conformance.yaml
Original file line number Diff line number Diff line change
@@ -18,15 +18,17 @@ concurrency:
jobs:
test:
name: Test
if: ${{ false }}
runs-on: ubuntu-latest
continue-on-error: ${{ startsWith(matrix.marker, 'red') }}
strategy:
fail-fast: false
matrix:
python-version: ['3.8']
marker: ['green', 'red and required', 'red and not required']
conformance-version: ['cwl_conformance_v1_0'] #, 'cwl_conformance_v1_1', 'cwl_conformance_v1_2']
conformance-version: ['cwl_conformance_v1_0', 'cwl_conformance_v1_1', 'cwl_conformance_v1_2']
exclude:
- marker: red and required
conformance-version: cwl_conformance_v1_0
services:
postgres:
image: postgres:13
1 change: 1 addition & 0 deletions client/src/api/datasets.ts
@@ -67,6 +67,7 @@ export async function copyDataset(
// TODO: Investigate. These should be optional, but the API requires explicit null values?
type,
copy_elements: null,
fields: null,
hide_source_items: null,
instance_type: null,
},
19 changes: 19 additions & 0 deletions client/src/api/schema/schema.ts
@@ -6948,6 +6948,12 @@ export interface components {
* @description List of elements that should be in the new collection.
*/
element_identifiers?: components["schemas"]["CollectionElementIdentifier"][] | null;
/**
* Fields
* @description List of fields to create for this collection. Set to 'auto' to guess fields from identifiers.
* @default []
*/
fields: string | components["schemas"]["FieldDict"][] | null;
/**
* Folder Id
* @description The ID of the library folder that will contain the collection. Required if `instance_type=library`.
@@ -7140,6 +7146,12 @@ export interface components {
* @description List of elements that should be in the new collection.
*/
element_identifiers?: components["schemas"]["CollectionElementIdentifier"][] | null;
/**
* Fields
* @description List of fields to create for this collection. Set to 'auto' to guess fields from identifiers.
* @default []
*/
fields: string | components["schemas"]["FieldDict"][] | null;
/**
* Folder Id
* @description The ID of the library folder that will contain the collection. Required if `instance_type=library`.
@@ -9080,6 +9092,13 @@ export interface components {
/** Hash Value */
hash_value: string;
};
/** FieldDict */
FieldDict: {
/** Name */
name: string;
/** Type */
type: string;
};
/** FileDataElement */
FileDataElement: {
/** Md5 */
1 change: 1 addition & 0 deletions client/src/components/History/model/queries.ts
@@ -86,6 +86,7 @@ export async function createDatasetCollection(history: HistorySummary, inputs =
copy_elements: true,
name: "list",
element_identifiers: [],
fields: "auto",
hide_source_items: true,
};
const payload = Object.assign({}, defaults, inputs);
24 changes: 24 additions & 0 deletions doc/source/dev/cwl.md
@@ -0,0 +1,24 @@
CWL import in Galaxy
====================

What is supported
-----------------

What is not supported
---------------------

CWL expressions and parameter references that do math on `$(resources.cores)`
or similar runtime values will likely not work.

How to enable it?
-----------------

1. List paths to CWL tools in `tool_conf.xml`.
2. Set the following in `galaxy.yml`:

```yaml
enable_beta_tool_formats: true
enable_beta_workflow_modules: true
check_upload_content: false
strict_cwl_validation: false
```
3 changes: 3 additions & 0 deletions lib/galaxy/config/__init__.py
@@ -933,6 +933,9 @@ def _process_config(self, kwargs: Dict[str, Any]) -> None:
else None
)

# TODO: migrate to schema.
# Should CWL artifacts be loaded with strict validation enabled.
self.strict_cwl_validation = string_as_bool(kwargs.get("strict_cwl_validation", "True"))
# These are not even beta - just experiments - don't use them unless
# you want your tools to be broken in the future.
self.enable_beta_tool_formats = string_as_bool(kwargs.get("enable_beta_tool_formats", "False"))
2 changes: 1 addition & 1 deletion lib/galaxy/config/sample/datatypes_conf.xml.sample
@@ -301,7 +301,7 @@
<datatype extension="tar" auto_compressed_types="gz,bz2" type="galaxy.datatypes.binary:CompressedArchive" subclass="true" display_in_upload="true">
<converter file="archive_to_directory.xml" target_datatype="directory"/>
</datatype>
<datatype extension="directory" type="galaxy.datatypes.data:Directory"/>
<datatype extension="directory" type="galaxy.datatypes.data:Directory" display_in_upload="true"/>
<datatype extension="zarr" type="galaxy.datatypes.data:ZarrDirectory" />
<datatype extension="ome_zarr" type="galaxy.datatypes.images:OMEZarr" />
<datatype extension="yaml" type="galaxy.datatypes.text:Yaml" display_in_upload="true" />
23 changes: 14 additions & 9 deletions lib/galaxy/datatypes/converters/tar_to_directory.xml
@@ -1,25 +1,30 @@
<tool id="CONVERTER_tar_to_directory" name="Convert tar to directory" version="1.0.1" profile="17.05">
<tool id="CONVERTER_tar_to_directory" name="Convert tar to directory" version="1.0.1" profile="21.09">
<!-- Don't use tar directly so we can verify safety of results - tar -xzf '$input1'; -->
<requirements>
<requirement type="package" version="23.2.1">galaxy-util</requirement>
</requirements>
<command>
mkdir '$output1.files_path';
cd '$output1.files_path';
python -c "from galaxy.util.compression_utils import CompressedFile; CompressedFile('$input1').extract('.');"
</command>
<command detect_errors="exit_code"><![CDATA[
cp '$provided_metadata' 'galaxy.json' &&
mkdir '$output1.files_path' &&
cd '$output1.files_path' &&
python -c "from galaxy.util.compression_utils import CompressedFile; CompressedFile('$input1').extract('.');"
]]></command>
<configfiles>
<configfile name="provided_metadata">{"output1": {"created_from_basename": "${input1.created_from_basename}"}}
</configfile>
</configfiles>
<inputs>
<param format="tar" name="input1" type="data"/>
</inputs>
<outputs>
<data format="directory" name="output1"/>
<data format="directory" name="output1" metadata_source="input1" />
</outputs>
<tests>
<test>
<param name="input1" ftype="tar" value="testdir1.tar"/>
<output name="output1" ftype="directory" value="testdir1.tar.directory"/>
</test>
</tests>
<help>
</help>
<help><![CDATA[
]]></help>
</tool>
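The converter deliberately avoids calling `tar -x` directly and instead extracts through `galaxy.util.compression_utils.CompressedFile` so the results can be verified for safety. A minimal standard-library sketch of that kind of guarded extraction; the path-traversal check below is an assumption about what such a safety check involves, not Galaxy's actual implementation:

```python
import os
import tarfile


def safe_extract(archive_path: str, dest: str) -> None:
    """Extract a tar archive, refusing members that would escape dest."""
    dest = os.path.realpath(dest)
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            # Reject entries like ../evil that resolve outside the destination.
            if not (target == dest or target.startswith(dest + os.sep)):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
```

On recent Python versions, `tarfile.extractall(..., filter="data")` offers a similar built-in guarantee.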
30 changes: 29 additions & 1 deletion lib/galaxy/datatypes/registry.py
@@ -72,6 +72,7 @@
self.config = config
self.edam = edam
self.datatypes_by_extension: Dict[str, Data] = {}
self.datatypes_by_format = {}
self.datatypes_by_suffix_inferences = {}
self.mimetypes_by_extension = {}
self.datatype_converters = {}
@@ -269,13 +270,25 @@
upload_warning_template = Template(upload_warning_el.text or "")
datatype_instance = datatype_class()
self.datatypes_by_extension[extension] = datatype_instance
if not datatype_class.is_subclass:
edam_format = datatype_class.edam_format
prefixed_format = f"edam:{edam_format}"
if prefixed_format not in self.datatypes_by_format:
register_datatype_by_format = True
for super_klass in datatype_class.__mro__[1:-1]:
super_edam_format = getattr(super_klass, "edam_format", None)
if super_edam_format == edam_format:
register_datatype_by_format = False
break
if register_datatype_by_format:
self.datatypes_by_format[prefixed_format] = datatype_instance
if mimetype is None:
# Use default mimetype per datatype specification.
mimetype = self.datatypes_by_extension[extension].get_mime()
self.mimetypes_by_extension[extension] = mimetype
if datatype_class.track_type:
self.available_tracks.append(extension)
if display_in_upload and extension not in self.upload_file_formats:
if display_in_upload:
self.upload_file_formats.append(extension)
# Max file size cut off for setting optional metadata.
self.datatypes_by_extension[extension].max_optional_metadata_filesize = elem.get(
@@ -413,6 +426,7 @@
override=override,
compressed_sniffers=compressed_sniffers,
)
self.upload_file_formats = list(set(self.upload_file_formats))
self.upload_file_formats.sort()
# Load build sites
if use_build_sites:
@@ -613,6 +627,20 @@
"""Returns a datatype object based on an extension"""
return self.datatypes_by_extension.get(ext, None)

def get_datatype_by_format_ontology(self, ontology: str):
"""Returns a datatype by format ontology"""
if "edamontology.org/" in ontology:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, High. The string "edamontology.org/" may be at an arbitrary position in the sanitized URL.
ontology = f"edam:{ontology.split('edamontology.org/')[1]}"
return self.datatypes_by_format.get(ontology)

def get_datatype_ext_by_format_ontology(self, ontology: str, only_uploadable: bool = False) -> Optional[str]:
"""Returns a datatype extension by format ontology"""
datatype = self.get_datatype_by_format_ontology(ontology)
if datatype:
if not only_uploadable or datatype.file_ext in self.upload_file_formats:
return datatype.file_ext
return None
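`get_datatype_by_format_ontology` normalizes a full EDAM URI down to the `edam:<id>` keys that `datatypes_by_format` is indexed by. A self-contained sketch of that lookup; the mapping below is hypothetical sample data (real registry entries hold datatype instances, not extension strings), and, as the CodeQL finding notes, the substring test is a loose match rather than strict URL validation:

```python
from typing import Optional

# Hypothetical sample of Registry.datatypes_by_format.
DATATYPES_BY_FORMAT = {"edam:format_3752": "csv", "edam:format_3462": "cram"}


def normalize_ontology(ontology: str) -> str:
    """Reduce a full EDAM URI (or pass through an 'edam:'-prefixed id)."""
    if "edamontology.org/" in ontology:
        ontology = f"edam:{ontology.split('edamontology.org/')[1]}"
    return ontology


def get_datatype_by_format_ontology(ontology: str) -> Optional[str]:
    return DATATYPES_BY_FORMAT.get(normalize_ontology(ontology))
```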

def change_datatype(self, data, ext):
if data.extension != ext:
data.extension = ext
7 changes: 4 additions & 3 deletions lib/galaxy/jobs/__init__.py
@@ -1146,7 +1146,7 @@ def can_split(self):

@property
def is_cwl_job(self):
return self.tool.tool_type == "cwl"
return self.tool.tool_type in ["galactic_cwl", "cwl"]

def get_job_runner_url(self):
log.warning(f"({self.job_id}) Job runner URLs are deprecated, use destinations instead.")
@@ -1778,8 +1778,9 @@ def _finish_dataset(
dataset.mark_unhidden()
elif not purged:
# If the tool was expected to set the extension, attempt to retrieve it
if dataset.ext == "auto":
dataset.extension = context.get("ext", "data")
context_ext = context.get("ext", "data")
if dataset.ext == "auto" or (dataset.ext == "data" and context_ext != "data"):
dataset.extension = context_ext
dataset.init_meta(copy_from=dataset)
# if a dataset was copied, it won't appear in our dictionary:
# either use the metadata from originating output dataset, or call set_meta on the copies
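The widened condition in `_finish_dataset` lets a tool-provided extension override not just the `auto` sentinel but also a generic `data` extension. The decision can be sketched as a pure function (the names here are illustrative, not Galaxy API):

```python
def resolve_extension(current_ext: str, context: dict) -> str:
    """Choose the final dataset extension once a job finishes."""
    context_ext = context.get("ext", "data")
    # 'auto' always defers to the tool; 'data' defers only when the tool
    # reported something more specific.
    if current_ext == "auto" or (current_ext == "data" and context_ext != "data"):
        return context_ext
    return current_ext
```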
35 changes: 24 additions & 11 deletions lib/galaxy/jobs/command_factory.py
@@ -100,19 +100,26 @@ def build_command(
external_command_shell = container.shell
else:
external_command_shell = shell
externalized_commands = __externalize_commands(
job_wrapper, external_command_shell, commands_builder, remote_command_params, container=container
)
if container and modify_command_for_container:
# Stop now and build command before handling metadata and copying
# working directory files back. These should always happen outside
# of docker container - no security implications when generating
# metadata and means no need for Galaxy to be available to container
# and not copying workdir outputs back means one can be more restrictive
# of where container can write to in some circumstances.
run_in_container_command = container.containerize_command(externalized_commands)
if job_wrapper.tool and not job_wrapper.tool.may_use_container_entry_point:
externalized_commands = __externalize_commands(
job_wrapper, external_command_shell, commands_builder, remote_command_params, container=container
)
# Stop now and build command before handling metadata and copying
# working directory files back. These should always happen outside
# of docker container - no security implications when generating
# metadata and means no need for Galaxy to be available to container
# and not copying workdir outputs back means one can be more restrictive
# of where container can write to in some circumstances.
run_in_container_command = container.containerize_command(externalized_commands)
else:
tool_commands = commands_builder.build()
run_in_container_command = container.containerize_command(tool_commands)
commands_builder = CommandsBuilder(run_in_container_command)
else:
externalized_commands = __externalize_commands(
job_wrapper, external_command_shell, commands_builder, remote_command_params, container=container
)
commands_builder = CommandsBuilder(externalized_commands)

# Galaxy writes I/O files to outputs, Pulsar uses metadata. metadata seems like
@@ -130,7 +137,13 @@

# Copy working and outputs before job submission so that these can be restored on resubmission
# xref https://github.com/galaxyproject/galaxy/issues/3289
commands_builder.prepend_command(PREPARE_DIRS)
if not job_wrapper.is_cwl_job:
commands_builder.prepend_command(PREPARE_DIRS)
else:
# Can't do the rm -rf working for CWL jobs since we may have staged outputs
# into that directory. This does mean CWL is incompatible with job manager triggered
# retries - what can we do with that information?
commands_builder.prepend_command("mkdir -p outputs; cd working")

__handle_remote_command_line_building(commands_builder, job_wrapper, for_pulsar=for_pulsar)

Expand Down
12 changes: 11 additions & 1 deletion lib/galaxy/jobs/runners/local.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

import datetime
import logging
import math
import os
import subprocess
import tempfile
@@ -67,7 +68,16 @@ def _command_line(self, job_wrapper: "MinimalJobWrapper") -> Tuple[str, str]:
if slots:
slots_statement = f'GALAXY_SLOTS="{int(slots)}"; export GALAXY_SLOTS; GALAXY_SLOTS_CONFIGURED="1"; export GALAXY_SLOTS_CONFIGURED;'
else:
slots_statement = 'GALAXY_SLOTS="1"; export GALAXY_SLOTS;'
cores_min = 1
if job_wrapper.tool:
try:
# In CWL 1.2 it can be a float that can be rounded to the next whole number
cores_min = math.ceil(float(job_wrapper.tool.cores_min))
except ValueError:
# TODO: in CWL this can be an expression referencing runtime
# parameters, e.g. `$(inputs.special_file.size)`
pass
slots_statement = f'GALAXY_SLOTS="{cores_min}"; export GALAXY_SLOTS;'

job_id = job_wrapper.get_id_tag()
job_file = JobState.default_job_file(job_wrapper.working_directory, job_id)
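The local runner now seeds `GALAXY_SLOTS` from the tool's `cores_min`, rounding up because CWL 1.2 allows fractional core requests, and falling back to 1 when the value is an unevaluated expression. The computation in isolation (the extra `TypeError` guard for a `None` value is an addition not present in the diff):

```python
import math


def slots_from_cores_min(cores_min) -> int:
    """Round cores_min up to a whole slot count, defaulting to 1."""
    try:
        # CWL 1.2 permits a float here; round up to the next whole number.
        return math.ceil(float(cores_min))
    except (TypeError, ValueError):
        # e.g. an expression such as $(inputs.special_file.size) that cannot
        # be evaluated at submission time.
        return 1
```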
17 changes: 14 additions & 3 deletions lib/galaxy/managers/collections.py
@@ -175,6 +175,7 @@ def create(
flush=True,
completed_job=None,
output_name=None,
fields=None,
):
"""
PRECONDITION: security checks on ability to add to parent
@@ -199,6 +200,7 @@
hide_source_items=hide_source_items,
copy_elements=copy_elements,
history=history,
fields=fields,
)

implicit_inputs = []
@@ -242,8 +244,11 @@ def _create_instance_for_collection(
name=name,
)
assert isinstance(dataset_collection_instance, model.HistoryDatasetCollectionAssociation)

if implicit_inputs:
for input_name, input_collection in implicit_inputs:
if getattr(input_collection, "ephemeral", False):
input_collection = input_collection.persistent_object
dataset_collection_instance.add_implicit_input_collection(input_name, input_collection)

if implicit_output_name:
@@ -285,17 +290,20 @@ def create_dataset_collection(
hide_source_items=None,
copy_elements=False,
history=None,
fields=None,
):
# Make sure at least one of these is None.
assert element_identifiers is None or elements is None

if element_identifiers is None and elements is None:
raise RequestParameterInvalidException(ERROR_INVALID_ELEMENTS_SPECIFICATION)
if not collection_type:
raise RequestParameterInvalidException(ERROR_NO_COLLECTION_TYPE)

collection_type_description = self.collection_type_descriptions.for_collection_type(collection_type)
collection_type_description = self.collection_type_descriptions.for_collection_type(
collection_type, fields=fields
)
has_subcollections = collection_type_description.has_subcollections()

# If we have elements, this is an internal request, don't need to load
# objects from identifiers.
if elements is None:
Expand All @@ -319,8 +327,9 @@ def create_dataset_collection(

if elements is not self.ELEMENTS_UNINITIALIZED:
type_plugin = collection_type_description.rank_type_plugin()
dataset_collection = builder.build_collection(type_plugin, elements)
dataset_collection = builder.build_collection(type_plugin, elements, fields=fields)
else:
# TODO: Pass fields here - need test case first.
dataset_collection = model.DatasetCollection(populated=False)
dataset_collection.collection_type = collection_type
return dataset_collection
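The API schema describes `fields='auto'` as "guess fields from identifiers". A hypothetical sketch of what expanding that sentinel could look like; the one-field-per-identifier shape and the `File` default type are assumptions for illustration, not Galaxy's actual behavior:

```python
from typing import Dict, List, Optional, Union

# Mirrors the FieldDict schema added in this PR: {"name": ..., "type": ...}.
FieldDict = Dict[str, str]


def resolve_fields(
    fields: Optional[Union[str, List[FieldDict]]],
    identifiers: List[str],
) -> List[FieldDict]:
    """Expand the 'auto' sentinel into one assumed File field per element."""
    if fields == "auto":
        return [{"name": name, "type": "File"} for name in identifiers]
    return fields or []
```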
@@ -400,6 +409,8 @@ def _append_tags(self, dataset_collection_instance, implicit_inputs=None, tags=N
tags = tags or {}
implicit_inputs = implicit_inputs or []
for _, v in implicit_inputs:
if getattr(v, "ephemeral", False):
v = v.persistent_object
for tag in v.auto_propagated_tags:
tags[tag.value] = tag
for _, tag in tags.items():