added hub validation #10

matthewcornell · 2024-09-24T17:10:36Z

This PR implements the checks documented in the "Assumptions/limitations" README.md section.

The main code changes:

_validate_hub_ptc_compatibility(): new function that validates the hub's files
_validate_predtimechart_config(): added validation that goes beyond the JSON Schema's validate() call

The main test changes:

removed failing test_hub_config_complex_scenario_hub()
test__validate_predtimechart_config(): added new tests. The double underscore in the name is because the function under test starts with an underscore
test__validate_hub_ptc_compatibility(): new. same note re: double underscores

Additionally:

renamed schema.py -> ptc_schema.py
removed default 'r' arg to open()
added tests/hubs/invalid-ptc-config-hub/hub-config/tasks.json so it would pass new checks
ditto for tests/hubs/example-complex-scenario-hub/hub-config/predtimechart-config.yml

…ks.json, and hub model-metadata-schema.json. renamed schema.py -> ptc_schema.py. removed failing test_hub_config_complex_scenario_hub(). removed default 'r' arg to open(). added tests/hubs/invalid-ptc-config-hub/hub-config/tasks.json so it would pass new checks. ditto for tests/hubs/example-complex-scenario-hub/hub-config/predtimechart-config.yml

bsweger

Great to see this coming along.

Long term, I'd like us to put in some guardrails against this code breaking because of an upstream schema change, but since predtimechart is designed only for a very specific type of task, we're probably OK for the near future.

The change I do think we should make is collecting the error messages in the validations, so people don't have to fix one problem at a time to get this working (i.e., they get a list of everything that didn't validate and can fix it all at once).
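A minimal sketch of that collect-then-raise pattern, assuming each check is rewritten to return a list of error strings instead of raising. The `check_*` functions, `run_checks`, and `HubValidationError` are hypothetical names for illustration; the PR itself raises jsonschema's `ValidationError` on the first failure.

```python
from typing import Callable


class HubValidationError(ValueError):
    """Hypothetical stand-in exception that carries every collected message."""

    def __init__(self, errors: list[str]):
        self.errors = errors
        super().__init__("hub validation failed:\n" + "\n".join(f"- {e}" for e in errors))


# A (hypothetical) check takes the parsed config and returns zero or more problems
Check = Callable[[dict], list[str]]


def run_checks(config: dict, checks: list[Check]) -> None:
    """Run every check, then raise once with all collected messages (if any)."""
    errors: list[str] = []
    for check in checks:
        errors.extend(check(config))
    if errors:
        raise HubValidationError(errors)


def check_has_rounds(config: dict) -> list[str]:
    return [] if config.get('rounds') else ["no `rounds` found"]


def check_has_schema_version(config: dict) -> list[str]:
    return [] if 'schema_version' in config else ["no `schema_version` found"]
```

With an empty config, `run_checks({}, [check_has_rounds, check_has_schema_version])` raises a single error listing both problems, rather than stopping at the first one.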

bsweger · 2024-09-24T17:48:37Z

src/hub_predtimechart/hub_config.py

@@ -8,7 +8,7 @@
 from jsonschema import validate
 from jsonschema.exceptions import ValidationError

-from hub_predtimechart.schema import ptc_config_schema
+from hub_predtimechart.ptc_schema import ptc_config_schema


Thanks for this name change--explicitly distinguishing between various schemas alleviates some prior confusion!

bsweger · 2024-09-24T18:09:10Z

src/hub_predtimechart/hub_config.py

@@ -31,41 +31,47 @@ def __init__(self, hub_dir: Path, ptc_config_file: Path):

        self.hub_dir = hub_dir

+        # get tasks.json file content
+        with open(self.hub_dir / 'hub-config' / 'tasks.json') as fp:


Not a today thing, but some of the code below made me realize we should think about how to use or port some of the hub schema reading and parsing functions in packages like hubAdmin and hubUtils.

If the Hubverse ever makes a breaking change to the tasks.json schema, any code that makes assumptions about its structure could break.

For example:

model_tasks_ele = tasks['rounds'][self.rounds_idx]['model_tasks'][self.model_tasks_idx]
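One way to contain that assumption is to route the lookup through a single accessor, so a future schema change would only need handling in one place and a malformed file fails with a readable message instead of a bare `KeyError`/`IndexError`. A minimal sketch: `get_model_task` is a hypothetical helper, not an existing hubverse function, and it assumes the `rounds` → `model_tasks` nesting used in the line above.

```python
def get_model_task(tasks: dict, rounds_idx: int, model_tasks_idx: int) -> dict:
    """Hypothetical helper: return one `model_tasks` element from parsed tasks.json,
    naming the missing piece when the expected structure isn't there."""
    rounds = tasks.get('rounds')
    if rounds is None:
        raise ValueError("tasks.json has no top-level `rounds` key")
    if not 0 <= rounds_idx < len(rounds):
        raise ValueError(f"round index {rounds_idx} out of range (found {len(rounds)} rounds)")

    model_tasks = rounds[rounds_idx].get('model_tasks')
    if model_tasks is None:
        raise ValueError(f"round {rounds_idx} has no `model_tasks` key")
    if not 0 <= model_tasks_idx < len(model_tasks):
        raise ValueError(f"model_tasks index {model_tasks_idx} out of range in round {rounds_idx} "
                         f"(found {len(model_tasks)})")
    return model_tasks[model_tasks_idx]
```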

bsweger · 2024-09-24T18:16:11Z

src/hub_predtimechart/hub_config.py

+    # validate: only `model_tasks` groups with `quantile` output_types will be considered
+    output_type = the_model_task['output_type']
+    if 'quantile' not in output_type:
+        raise ValidationError(f"no quantile output_type found. found types: {list(output_type.keys())}")


If there are multiple validation errors (e.g., missing quantile levels and not a designated model), it would be user-friendly to return all of them at once (instead of raising an error on the first failure).
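Applied to the quantile check in the diff above, that might look like the following sketch, which appends to a list instead of raising. `validate_model_task` is a hypothetical name, and the second (`target_metadata`) check is only a placeholder standing in for the other checks the PR performs, such as the quantile-levels one.

```python
def validate_model_task(the_model_task: dict) -> list[str]:
    """Hypothetical error-collecting version of the quantile check.
    Returns a (possibly empty) list of messages instead of raising on the first failure."""
    errors: list[str] = []

    # validate: only `model_tasks` groups with `quantile` output_types will be considered
    output_type = the_model_task.get('output_type', {})
    if 'quantile' not in output_type:
        errors.append(f"no quantile output_type found. found types: {list(output_type.keys())}")

    # placeholder second check: keeps going even if the first one failed
    if 'target_metadata' not in the_model_task:
        errors.append("no `target_metadata` found")

    return errors
```

The caller can then concatenate the lists from every model task and raise once, so a hub with several problems reports all of them in one run.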

bsweger · 2024-09-24T18:20:29Z

src/hub_predtimechart/hub_config.py

+        raise ValidationError(f"more than one unique `target_metadata` key found: {target_keys_keys}")
+
+
+def _validate_predtimechart_config(ptc_config: dict, tasks: dict):


Same note here about returning a holistic list of errors so people don't have to solve them one by one.
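The `target_metadata` check at the top of this hunk could report instead of raise in the same way. A sketch only: it assumes each `target_metadata` entry has a `target_keys` dict that may be null, and `check_target_keys` is a hypothetical name; the function in the PR raises `ValidationError` directly.

```python
def check_target_keys(model_task: dict) -> list[str]:
    """Hypothetical sketch: gather the keys used in every entry's `target_keys` dict and
    return a message (rather than raise) if more than one distinct key appears."""
    target_keys_keys: set[str] = set()
    for target_metadata in model_task.get('target_metadata', []):
        target_keys = target_metadata.get('target_keys') or {}  # treat null as "no keys"
        target_keys_keys.update(target_keys.keys())

    if len(target_keys_keys) > 1:
        # sorted() so the message is deterministic
        return [f"more than one unique `target_metadata` key found: {sorted(target_keys_keys)}"]
    return []
```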

bsweger · 2024-09-24T18:22:23Z

tests/hub_predtimechart/test_hub_config.py

    with open('tests/hubs/example-complex-forecast-hub/hub-config/predtimechart-config.yml') as fp:
-        valid_ptc_config = yaml.safe_load(fp)
+        ecfh_ptc_config = yaml.safe_load(fp)


Just curious... what is ecfh?

example-complex-forecast-hub

matthewcornell · 2024-09-24T20:33:12Z

Thanks for your comments, Becky. I created this issue in response to them: #11

bsweger requested changes Sep 24, 2024


matthewcornell mentioned this pull request Sep 24, 2024

collect all errors before exiting validation #11

Open

bsweger approved these changes Sep 25, 2024


matthewcornell merged commit c7afbda into main Sep 25, 2024
1 check passed

matthewcornell deleted the validation branch September 25, 2024 14:17
