Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor]: CDAT Migration: cdms2.open slice flag adds additional time coordinate which affects downstream time series operations and diffs with Xarray code #759

Closed
2 tasks
tomvothecoder opened this issue Dec 5, 2023 · 6 comments
Assignees
Labels
cdat-migration-fy24 CDAT Migration FY24 Task

Comments

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Dec 5, 2023

Is your feature request related to a problem?

In the CDAT version of the Dataset class, cdms2.open is being called with a slice flag (either "co"/"ccb"). This can cause result in major differences between the refactored Xarray code and the CDAT code

Related lines of code on main:

In #754, "co" is being set because time coordinates start at the beginning of the month. This adds an additional time coordinate point to the end of the time series file, which affects the subsequent climatology and spatial averaging calculations. I found omitting the extra time coordinate point before calculating the climatology and spatial averaging results in identical results to the xCDAT version.

Describe the solution you'd like

Should we consider a feature on the slice flag? I found almost no documentation on how to use this slice flag.

TODO

  • 1. Find out what "co" and "ccb" do when opening datasets
  • 2. Implement a function that mimics these two cases

Describe alternatives you've considered

No response

Additional context

  • Documentation: https://cdms.readthedocs.io/en/latest/manual/cdms_2.html#axis-methods-additional-to-coordinateaxis-cont-d
    • [left, right, indicator, cycle] -> e.g,. "co", "ccb"
    • "c" - interval is closed
    • "o" - interval is open
    • 'b' -select axis elements for which the corresponding cell boundary intersects the interval
    • Default is "ccn"
      "
  • The first 2 letters represents the bounds of the retrievd segment, they can be "c" or "o"
  • The third letter represents the search method, it can be "b" ,"n", "e", or "s"
    • The cell will be considered valid if the bounds ("b") or ("n") node are within the interval defined
@tomvothecoder tomvothecoder changed the title [Feature]: cdms2.open slice flag adds additional time coordinate which affects downstream metrics [Feature]: cdms2.open slice flag adds additional time coordinate which affects downstream operation and diffs with Xarray code Dec 5, 2023
@chengzhuzhang
Copy link
Contributor

chengzhuzhang commented Dec 5, 2023

Nice finding...I tried a few times to find docs on those slice options, and only found them through a xcdat discussion by @jypeter. The link to the documentation is here:https://cdms.readthedocs.io/en/latest/manual/cdms_2.html#axis-methods-additional-to-coordinateaxis-cont-d

I thought there was discussion in xcdat to consider this slicing feature, but not sure if there was a consensus.

@tomvothecoder
Copy link
Collaborator Author

Thanks @chengzhuzhang! I should have documented this when I ran into it while implementing the new Xarray climatology function in April. It came back to haunt me.

The good news is that I confirmed the slice flag is what is producing the large differences for the specific variables in #658 (specifically NET_FLUX_SRF and RESTOM spatial averages). Here is my validation notebook with those findings: https://github.com/E3SM-Project/e3sm_diags/blob/cdat-migration-fy24/auxiliary_tools/cdat_regression_testing/671-lat-lon/12-4-23-qa-no-cdms-slice.ipynb.

I re-ran main without the slice flag and found results are basically identical! So there is no bug that I'm aware of in the cdat-migration-fy24 branch so far.

@tomvothecoder
Copy link
Collaborator Author

I thought there was discussion in xcdat to consider this slicing feature, but not sure if there was a consensus.

Yeah we had a discussion but it fell off the radar. If we want to produce nearly 100% identical results with the CDAT version of e3sm_diags, we should consider revisiting this topic.

@jypeter
Copy link

jypeter commented Dec 5, 2023

@chengzhuzhang the document you probably want about the slices, is page 9 of Unlocking_CDATs_Best_Kept_Secrets.pdf, available through xCDAT/xcdat#170. The option is probably also described in cdms5.pdf, but I think the figure in the Secrets document is better

@tomvothecoder tomvothecoder self-assigned this Jan 18, 2024
@tomvothecoder tomvothecoder added the cdat-migration-fy24 CDAT Migration FY24 Task label Jan 18, 2024
@tomvothecoder tomvothecoder changed the title [Feature]: cdms2.open slice flag adds additional time coordinate which affects downstream operation and diffs with Xarray code [Refactor]: CDAT Migration: cdms2.open slice flag adds additional time coordinate which affects downstream operation and diffs with Xarray code Jan 18, 2024
@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Jan 18, 2024

At our meeting on 1/16, we thought of three options:

  1. Implement this feature in e3sm_diags
    • Optionally, port over this feature in xCDAT in the future (if there is user desire, demand)
  2. Implement this feature in xCDAT
    • Seems like a low priority feature request and xCDAT follows its own development cycle
  3. Update code on main to remove slice flag operation

For this GitHub issue, we decided to go with 1. as a near-term solution.

@tomvothecoder tomvothecoder changed the title [Refactor]: CDAT Migration: cdms2.open slice flag adds additional time coordinate which affects downstream operation and diffs with Xarray code [Refactor]: CDAT Migration: cdms2.open slice flag adds additional time coordinate which affects downstream time series operations and diffs with Xarray code May 14, 2024
@tomvothecoder
Copy link
Collaborator Author

Closed by #750 and #815

tomvothecoder added a commit that referenced this issue Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
tomvothecoder added a commit that referenced this issue Dec 5, 2024
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
tomvothecoder added a commit that referenced this issue Jan 15, 2025
Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

Regression testing for lat_lon variables `NET_FLUX_SRF` and `RESTOM` (#754)

Update regression test notebook to show validation of all vars

Add `subset_and_align_datasets()` to regrid.py (#776)

Add template run scripts

CDAT Migration Phase: Refactor `cosp_histogram` set (#748)

- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

Refactor 654 zonal mean xy (#752)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration - Update run script output directory to NERSC public webserver (#793)

[PR]: CDAT Migration: Refactor `aerosol_aeronet` set (#788)

CDAT Migration: Test `lat_lon` set with run script and debug any issues (#794)

CDAT Migration: Refactor `polar` set (#749)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Align order of calls to `_set_param_output_attrs`

CDAT Migration: Refactor `meridional_mean_2d` set (#795)

CDAT Migration: Refactor `aerosol_budget` (#800)

Add `acme.py` changes from PR #712 (#814)

* Add `acme.py` changes from PR #712

* Replace unnecessary lambda call

Refactor area_mean_time_series and add ccb slice flag feature (#750)

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

[Refactor]: Validate fix in PR #750 for #759 (#815)

CDAT Migration Phase 2: Refactor `diurnal_cycle` set (#819)

CDAT Migration: Refactor annual_cycle_zonal_mean set (#798)

* Refactor `annual_cycle_zonal_mean` set

* Address PR review comments

* Add lat lon regression testing

* Add debugging scripts

* Update `_open_climo_dataset()` to decode times as workaround to misaligned time coords
- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers

* Fix unit tests

* Remove old plotter

* Add script to debug decode_times=True and ncclimo file

* Update plotter time values to month integers

* Fix slow `.load()` and multiprocessing issue
- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to workaround cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`

* Update `_encode_time_coords()` docstring

* Add AODVIS debug script

* update AODVIS obs datasets; regression test results

---------

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `qbo` set (#826)

CDAT Migration Phase 2: Refactor tc_analysis set  (#829)

* start tc_analysis_refactor

* update driver

* update plotting

* Clean up plotter
- Remove unused variables
- Make `plot_info` a constant called `PLOT_INFO`, which is now a dict of dicts
- Reorder functions for top-down readability

* Remove unused notebook

---------

Co-authored-by: tomvothecoder <tomvothecoder@gmail.com>

CDAT Migration Phase 2: Refactor `enso_diags` set (#832)

CDAT Migration Phase 2: Refactor `streamflow` set (#837)

[Bug]: CDAT Migration Phase 2: enso_diags plot fixes (#841)

[Refactor]: CDAT Migration Phase 3: testing and documentation update (#846)

CDAT Migration Phase 3 - Port QBO Wavelet feature to Xarray/xCDAT codebase (#860)

CDAT Migration Phase 2: Refactor arm_diags set (#842)

Add performance benchmark material (#864)

Add function to add CF axis attr to Z axis if missing for downstream xCDAT operations (#865)

CDAT Migration Phase 3: Add Convective Precipitation Fraction in lat-lon (#875)

CDAT Migration Phase 3: Fix LHFLX name and add catch for non-existent or empty TE stitch file (#876)

Add support for time series datasets via glob and fix `enso_diags` set (#866)

Add fix for checking `is_time_series()` property based on `data_type` attr (#881)

CDAT migration: Fix African easterly wave density plots in TC analysis and convert H20LNZ units to ppm/volume (#882)

CDAT Migration: Update `mp_partition_driver.py` to use Dataset from `dataset_xr.py` (#883)

CDAT Migration - Port JJB tropical subseasonal diags to Xarray/xCDAT (#887)

CDAT Migration: Prepare branch for merge to `main` (#885)

[Refactor]: CDAT Migration - Update dependencies and remove Dataset._add_cf_attrs_to_z_axes() (#891)

CDAT Migration Phase 2: Refactor core utilities and  `lat_lon` set (#677)

Refer to the PR for more information because the changelog is massive.

Update build workflow to run on `cdat-migration-fy24` branch

CDAT Migration Phase 2: Add CDAT regression test notebook template and fix GH Actions build (#743)

- Add Makefile for quick access to multiple Python-based commands such as linting, testing, cleaning up cache and build files
- Fix some lingering unit tests failure
- Update `xcdat=0.6.0rc1` to `xcdat >=0.6.0` in `ci.yml`, `dev.yml` and `dev-nompi.yml`
- Add `xskillscore` to `ci.yml`
- Fix `pre-commit` issues

CDAT Migration Phase 2: Regression testing for `lat_lon`, `lat_lon_land`, and `lat_lon_river` (#744)

- Add Makefile that simplifies common development commands (building and installing, testing, etc.)
- Write unit tests to cover all new code for utility functions
  - `dataset_xr.py`, `metrics.py`, `climo_xr.py`, `io.py`, `regrid.py`
- Metrics comparison for  `cdat-migration-fy24` `lat_lon` and `main` branch of `lat_lon` -- `NET_FLUX_SRF` and `RESTOM` have the highest spatial average diffs
- Test run with 3D variables (`_run_3d_diags()`)
  - Fix Python 3.9 bug with using pipe command to represent Union -- doesn't work with `from __future__ import annotations` still
  - Fix subsetting syntax bug using ilev
  - Fix regridding bug where a single plev is passed and xCDAT does not allow generating bounds for coordinates of len <= 1 -- add conditional that just ignores adding new bounds for regridded output datasets, fix related tests
  - Fix accidentally calling save plots and metrics twice in `_get_metrics_by_region()`
- Fix failing integration tests pass in CI/CD
  - Refactor `test_diags.py` -- replace unittest with pytest
  - Refactor `test_all_sets.py` -- replace unittest with pytest
  - Test climatology datasets -- tested with 3d variables using `test_all_sets.py`

CDAT Migration Phase 2: Refactor utilities and CoreParameter methods for reusability across diagnostic sets (#746)

- Move driver type annotations to `type_annotations.py`
- Move `lat_lon_driver._save_data_metrics_and_plots()` to `io.py`
- Update `_save_data_metrics_and_plots` args to accept `plot_func` callable
- Update `metrics.spatial_avg` to return an optionally `xr.DataArray` with `as_list=False`
- Move `parameter` arg to the top in `lat_lon_plot.plot`
- Move `_set_param_output_attrs` and `_set_name_yr_attrs` from `lat_lon_driver` to `CoreParameter` class

CDAT Migration Phase 2: Refactor `zonal_mean_2d()` and `zonal_mean_2d_stratosphere()` sets (#774)

CDAT Migration Phase 2: Refactor `qbo` set (#826)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cdat-migration-fy24 CDAT Migration FY24 Task
Projects
None yet
Development

No branches or pull requests

3 participants