Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: cosp_histogram_standardize() conditional for adding bounds is incorrect and default bounds length might not align with coordinates #761

Open
tomvothecoder opened this issue Dec 7, 2023 · 5 comments
Labels
bug Bug fix (will increment patch version)

Comments

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Dec 7, 2023

What happened?

In PR #748, I found a silent bug in a conditional statement for handling bounds in cosp_histogram_standardize().

prs_bounds = getattr(prs, "bounds")
if prs_bounds is None:
cloud_prs_bounds = np.array(
[
[1000.0, 800.0],
[800.0, 680.0],
[680.0, 560.0],
[560.0, 440.0],
[440.0, 310.0],
[310.0, 180.0],
[180.0, 0.0],
]
) # length 7
prs.setBounds(np.array(cloud_prs_bounds, dtype=np.float32))

The conditional follows this flow:

  1. Check if the axis has a "bounds" attr
  2. If "bounds" attr exists, don't add bounds
    - Issue: bounds are never added for the coordinates that don't actually have bounds
  3. If "bounds" attr does not exist, add bounds
    - Issue: the default bounds length (cloud_prs_bounds) might not align with the length of the coordinates, resulting in this error: ValueError: conflicting sizes for dimension 'misr_cth': length 7 on the data but length 16 on coordinate 'misr_cth'

What did you expect to happen? Are there are possible answers you came across?

In my version of the code, I get the "bounds" attr AND check if the bounds actually exist in the dataset.

If bounds don't exist in the dataset, bounds should be dynamically generated to align with the length of the coordinates instead of using a fixed bounds array (I think?).

def _add_missing_cloud_bnds(
    ds: xr.Dataset, axis_var: xr.DataArray, axis: CloudAxis
) -> xr.Dataset:
    bnds_key = axis_var.attrs.get("bounds")
    if bnds_key is None or bnds_key not in ds.data_vars.keys():
        # Should align with the length of the data.
        bnds_data = `_get_missing_cloud_bnds()`
        default_bnds = xr.DataArray(
            name=f"{axis_var.name}_bnds",
            data=bnds_data,
            dims=list(axis_var.dims) + ["bnds"],
            coords=axis_var.coords,
        )
        ds[default_bnds.name] = default_bnds

    return ds

Minimal Complete Verifiable Example (MVCE)

import cdms2

# Make sure bounds are not automatically generated
cdms2.axis.setAutoBounds(0)
ds = cdms2.open(
    "tests/integration/integration_test_data/CLDMISR_ERA-Interim_ANN_198001_201401_climo.nc"
)

# Get the prs axis.
prs = ds["CLD_MISR"].getAxis(1)

# Get the prs "bounds" attr.
prs.bounds
# 'misr_cth_bnds'

# Check if  "misr_cth_bnds" exists in the dataset.
prs.getBounds()
# None

# This logic is in `cosp_histogram_standardize()`, which is incorrect because `prs.getBounds()` is `None`.
if prs.bounds is None:
    print("Add bounds here")

# Print out the variables. Notice how there is no "misr_cth_bnds"
ds.variables
# {'misr_tau_bnds': <cdms2.fvariable.FileVariable at 0x7f3dbdc8aec0>,
#  'lat_bounds': <cdms2.fvariable.FileVariable at 0x7f3953145db0>,
#  'lon_bounds': <cdms2.fvariable.FileVariable at 0x7f3953146650>,
#  'CLMISR': <cdms2.fvariable.FileVariable at 0x7f39531462f0>}

Relevant log output

No response

Anything else we need to know?

No response

Environment

Latest main version of e3sm_diags and all previous versions that use the cosp_histogram_standarize() function.

@tomvothecoder tomvothecoder added the bug Bug fix (will increment patch version) label Dec 7, 2023
@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Dec 7, 2023

Hi @chengzhuzhang, I believe I found a bug in the old version of cosp_histogram_standardize(). Let me know if you agree with my solution, which includes adding missing bounds dynamically.

I might need help understanding the logic for the function that dynamically generates bounds (e.g., use coordinates as midpoints?).

@chengzhuzhang
Copy link
Contributor

Apologize for missing out this bug report! It is an interesting silence bug. And it almost suggests that the bounds are never used in the downstream calculation? I will have to look more.

@tomvothecoder
Copy link
Collaborator Author

Apologize for missing out this bug report! It is an interesting silence bug. And it almost suggests that the bounds are never used in the downstream calculation? I will have to look more.

As far as I can tell, the old CDAT version of cosp_histogram_standardize() and cosp_histogram_driver.py don't reference prs or tau bounds for any calculations. cosp_histogram_standardize() subsets the variable using the low and high values of prs and/or tau, not prs bounds and/or tau bounds.

@chengzhuzhang
Copy link
Contributor

Hi Tom. I agree with your assessment that the cosp_histogram_standardize() are never actually used in the cdat version of the code. The conditions such as prs_bounds is None is never meet, because prs_bounds always has a string type, if the bound attribute exist. And since the bounds are not referenced in calculation, we can perhaps remove this function?

@tomvothecoder
Copy link
Collaborator Author

Thanks for investigating @chengzhuzhang.

We can simplify the codebase by removing this function since it doesn't seem necessary to keep. Bounds are not used for calculations and you confirmed that newer model data and reference data have bounds (your comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug fix (will increment patch version)
Projects
None yet
Development

No branches or pull requests

2 participants