Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added hub validation #10

Merged
merged 1 commit into from
Sep 25, 2024
Merged

added hub validation #10

merged 1 commit into from
Sep 25, 2024

Conversation

matthewcornell
Copy link
Collaborator

This PR implements the checks documented in the "Assumptions/limitations" README.md section.

The main code changes:

  • _validate_hub_ptc_compatibility(): new function that validates the hub's files
  • _validate_predtimechart_config(): added validation that goes beyond the JSON Schema's validate() call

The main test changes:

  • removed failing test_hub_config_complex_scenario_hub()
  • test__validate_predtimechart_config(): added new tests. double underscores in the name b/c the function under test starts with an underscore
  • test__validate_hub_ptc_compatibility(): new. same note re: double underscores

Additionally:

  • renamed schema.py -> ptc_schema.py
  • removed default 'r' arg to open()
  • added tests/hubs/invalid-ptc-config-hub/hub-config/tasks.json so it would pass new checks
  • ditto for tests/hubs/example-complex-scenario-hub/hub-config/predtimechart-config.yml

…ks.json, and hub model-metadata-schema.json. renamed schema.py -> ptc_schema.py. removed failing test_hub_config_complex_scenario_hub(). removed default 'r' arg to open(). added tests/hubs/invalid-ptc-config-hub/hub-config/tasks.json so it would pass new checks. ditto for tests/hubs/example-complex-scenario-hub/hub-config/predtimechart-config.yml
Copy link
Contributor

@bsweger bsweger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this coming along.

I have long-term concerns about putting in some guardrails against how this code might be affected by an upstream schema change, but since predtime chart is only designed for a very specific type of task, we're probably ok for the near future.

The the change I do think we should make is bunching together error messages in the validations, so people don't have to make one change at a time to get this working (i.e., they get a list of everything that didn't validate so they can fix it all up)

@@ -8,7 +8,7 @@
from jsonschema import validate
from jsonschema.exceptions import ValidationError

from hub_predtimechart.schema import ptc_config_schema
from hub_predtimechart.ptc_schema import ptc_config_schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this name change--explicitly distinguishing between various schemas alleviates some prior confusion!

@@ -31,41 +31,47 @@ def __init__(self, hub_dir: Path, ptc_config_file: Path):

self.hub_dir = hub_dir

# get tasks.json file content
with open(self.hub_dir / 'hub-config' / 'tasks.json') as fp:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a today thing, but some of the code below made me realize we should think about how to use or port some of the hub schema reading and parsing functions in packages like hubAdmin and hubUtil.

If the Hubverse ever makes a breaking change to the tasks.json schema, any code based on assumptions of its structure could break.

For example:

model_tasks_ele = tasks['rounds'][self.rounds_idx]['model_tasks'][self.model_tasks_idx]

# validate: only `model_tasks` groups with `quantile` output_types will be considered
output_type = the_model_task['output_type']
if 'quantile' not in output_type:
raise ValidationError(f"no quantile output_type found. found types: {list(output_type.keys())}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple validation errors (e.g., missing quantile levels and not a designated model), it would be user-friendly to return all of them at once (instead of raising an error on the first failure).

raise ValidationError(f"more than one unique `target_metadata` key found: {target_keys_keys}")


def _validate_predtimechart_config(ptc_config: dict, tasks: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note here about returning a holistic list of errors so people don't have to solve them one by one.

with open('tests/hubs/example-complex-forecast-hub/hub-config/predtimechart-config.yml') as fp:
valid_ptc_config = yaml.safe_load(fp)
ecfh_ptc_config = yaml.safe_load(fp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious....what is ecfh?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example-complex-forecast-hub

@matthewcornell
Copy link
Collaborator Author

matthewcornell commented Sep 24, 2024

Thanks for your comments, Becky. I created this issue in response to them: #11

@matthewcornell matthewcornell merged commit c7afbda into main Sep 25, 2024
1 check passed
@matthewcornell matthewcornell deleted the validation branch September 25, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants