
TRT-571 - Update regression tests for net2cog. #131

Open · wants to merge 5 commits into main

Conversation

@owenlittlejohns (Member) commented Feb 3, 2025

Description

This PR updates the net2cog regression test suite to cover two cases for multiple variables:

  • Multiple variables explicitly requested.
  • "all" variables requested.

Testing multiple variables requires multiple UMM-Var records, so I cloned the SMAP collection already in use into the EEDTEST provider in UAT; the tests now use those files.

The corresponding changes to the services are in podaac/net2cog#38.

Note: the test data are stored with Git LFS, so when running this notebook locally you will need to fetch those files with Git LFS (e.g. git lfs pull).

Jira Issue ID

This work supports TRT-571, addresses GitHub issue podaac/net2cog#32, and covers part of podaac/net2cog#35.

Local Test Steps

  • Pull the branch in this PR.
  • Pull the branch containing the updated net2cog service (TRT-571 - Update net2cog to process multiple variables, podaac/net2cog#38).
  • Build the service image following instructions in that repository.
  • Restart Harmony in a Box, specifying net2cog in the locally deployed services in the .env file.
  • Create a conda environment for this notebook (conda env create -f ./environment.yaml).
  • Activate that conda environment (conda activate papermill-net2cog).
  • Fire up a Jupyter notebook server and open the net2cog regression test notebook.
  • Update the Harmony environment to be local (harmony_host_url = 'http://localhost:3000'). The notebook is set to UAT as a default.
  • Run the notebook. It should be successful. 🤞
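The environment override in the steps above amounts to a single assignment in the notebook's configuration cell. A minimal sketch (the exact cell contents are an assumption; the UAT URL shown is the standard Harmony UAT endpoint):

```python
# Notebook configuration cell (sketch): select the Harmony environment.
# The notebook defaults to UAT; override with the local URL when testing
# against Harmony in a Box.
harmony_host_url = 'http://localhost:3000'  # local Harmony in a Box
# harmony_host_url = 'https://harmony.uat.earthdata.nasa.gov'  # UAT default
```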

PR Acceptance Checklist

  • Acceptance criteria met
  • Tests added/updated (if needed) and passing
  • Documentation updated (if needed)
  • CHANGELOG updated with the changes for this PR

@flamingbear (Member) left a comment

I was able to pull and build the net2cog service image from the branch in the instructions, spun up Harmony in a Box, ran this notebook against it, and all of the tests passed.

Only one comment about expected metadata looking like it's coming from the actual data. You could make the small change suggested, but I don't have strong opinions.

print(f'Assessing: {basename(cog_file)}')
validate_cog(cog_file)

expected_metadata = {
@flamingbear (Member):

this is actual metadata isn't it? if those are downloaded_cog_outputs.

@flamingbear (Member):

I think this should go down below the reference_file and get the dtype and nodata from the reference file. Because when you check in assert_dataset_produced_correct_results you're just checking against the file you got the expected from.

@flamingbear (Member):

Ok, I just looked at the functions get_expected_smap_dtype and get_expected_smap_nodata, so this is really not a big deal, since they read the file name rather than any actual data. But it is a little confusing.

@owenlittlejohns (Member Author):

Yeah, so they are just looking at the file name to determine the variable that is in it and set the right values. But it's a great point - the better thing to do would be to pull the values from the reference file, rather than make inferences based on the file name. I'll take a look at that now.

@owenlittlejohns (Member Author):

Here we go: 7bc51b

This feels much cleaner.

2 participants