
GCM lons in weights #183
Open
orianac opened this issue May 12, 2022 · 6 comments

orianac (Member) commented May 12, 2022

In the downscaling workflows we open the GCM zarr stores and adjust the lons to a [-180, 180] range using a postprocess call:

ds = col_subset[keys[0]]().to_dask().pipe(postprocess)

The weights generation flow, however, opens the GCMs directly without shifting any lons, so the weights are built on a grid that is shifted relative to the datasets we later apply them to. In other words, an issue arises whenever the dataset you're applying the weights file to has lons that differ from the dataset used to create the weights file.
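To make the mismatch concrete, here is a minimal standalone sketch of the two lon conventions (illustrative only; the grid and values are made up):

import numpy as np
import xarray as xr

# a 1-degree grid in the raw [0, 360) convention
lon_0360 = xr.DataArray(np.arange(0.0, 360.0), dims='lon', name='lon')

# the same shift postprocess applies: wrap values >= 180 down by 360
lon_180 = lon_0360.where(lon_0360 < 180, lon_0360 - 360)

print(lon_0360.values[178:182])  # [178. 179. 180. 181.]
print(lon_180.values[178:182])   # [ 178.  179. -180. -179.]

# weights generated against one ordering won't line up with data stored
# in the other, which is exactly the mismatch described above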

I think the solution is to add the postprocess call to the weights generation routine; my guess is that right after L67 would work:

ds_in = xr.open_dataset(store['zstore'], engine='zarr', chunks={}).isel(time=0)
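A minimal sketch of that fix (untested; it just pipes the opened store through postprocess before the time selection, mirroring what the downscaling workflows already do):

ds_in = (
    xr.open_dataset(store['zstore'], engine='zarr', chunks={})
    .pipe(postprocess)  # shift lons to [-180, 180] before building weights
    .isel(time=0)
)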

@andersy005 can you implement this fix? Also, might this be relevant for the pyramid generation steps, since I believe they also use weights? In the event that these weights (or other weights with a similar mismatch issue) are used in other places, we should check all of them. I checked the ERA5 step, and it appears we use the same utility to open ERA5 in the weights generation as in the workflows (which is relevant since that utility is where we adjust the lat ordering). But if there is any other place we open ERA5 for the creation of pyramids, then as long as we're using weights we need that same lat reordering implemented there as well.
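For reference, the lat reordering in question is presumably the check visible in postprocess() in the traceback later in this thread; a minimal sketch (reindex is an assumption here; the actual utility may use a different call):

# reorder latitudes to ascending [-90, 90] if the source stores them descending
if ds.lat[0] > ds.lat[-1]:
    ds = ds.reindex(lat=ds.lat[::-1])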

andersy005 (Member) commented

Good catch, @orianac! Yes, this is relevant for the pyramid generation.

> can you implement this fix?

I'm on it.

andersy005 (Member) commented

> In the event that these weights (or other weights with a similar mismatch issue) are used in other places we should check all of them.

It appears

p['pyramid_weights'] = get_pyramid_weights(run_parameters=run_parameters, levels=4)

is using the pre-generated weights:

def get_pyramid_weights(*, run_parameters, levels, regrid_method="bilinear"):

I presume we'll need to re-generate the pyramids for the BCSD runs.
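For context, reusing pre-generated weights with xESMF looks roughly like the sketch below (the stores and file names here are hypothetical); the key point is that reused weights are only valid if the input dataset sits on the same grid, including the same lon convention, that the weights were generated from:

import xarray as xr
import xesmf as xe

# hypothetical stores; the real flow derives these from run_parameters
ds_in = xr.open_zarr('az://cmip6/some/gcm/store')
ds_target = xr.open_zarr('az://scratch/some/target/grid')

# passing weights= skips the expensive weight computation, but the weights
# silently assume ds_in's grid matches the one they were built against
regridder = xe.Regridder(ds_in, ds_target, 'bilinear', weights='weights.nc')
ds_out = regridder(ds_in)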

Cc @norlandrhagen

andersy005 (Member) commented May 12, 2022

I just ran into an interesting issue. Our postprocess() function assumes that we are dealing with GCMs on regular lat/lon grids. However, some of the GCMs use unstructured grids, e.g. MPI-M/ICON-ESM-LR:
In [1]: import xarray as xr

In [2]: path = 'az://cmip6/CMIP/MPI-M/ICON-ESM-LR/historical/r1i1p1f1/day/pr/gn/v20210215/'

In [3]: from cmip6_downscaling.data.cmip import postprocess

In [4]: ds = xr.open_zarr(path)

In [5]: ds
Out[5]: 
<xarray.Dataset>
Dimensions:             (i: 20480, time: 60265, bnds: 2, vertices: 3)
Coordinates:
  * i                   (i) int32 0 1 2 3 4 5 ... 20475 20476 20477 20478 20479
    latitude            (i) float64 dask.array<chunksize=(20480,), meta=np.ndarray>
    longitude           (i) float64 dask.array<chunksize=(20480,), meta=np.ndarray>
  * time                (time) datetime64[ns] 1850-01-01T12:00:00 ... 2014-12...
    time_bnds           (time, bnds) datetime64[ns] dask.array<chunksize=(30133, 1), meta=np.ndarray>
Dimensions without coordinates: bnds, vertices
Data variables:
    pr                  (time, i) float32 dask.array<chunksize=(691, 20480), meta=np.ndarray>
    vertices_latitude   (i, vertices) float64 dask.array<chunksize=(20480, 3), meta=np.ndarray>
    vertices_longitude  (i, vertices) float64 dask.array<chunksize=(20480, 3), meta=np.ndarray>
Attributes: (12/53)
    CDI_grid_type:          unstructured
    CDO:                    Climate Data Operators version 2.0.0rc5 (https://...
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    ...                     ...
    table_info:             Creation Date:(09 May 2019) MD5:5f007c16960eee824...
    title:                  ICON-ESM-LR output prepared for CMIP6
    tracking_id:            hdl:21.14100/12a736ea-2ab6-43aa-8bb5-616c9b191b20
    variable_id:            pr
    variant_label:          r1i1p1f1
    version_id:             v20210215

In [6]: postprocess(ds)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:1394, in Dataset._construct_dataarray(self, name)
   1393 try:
-> 1394     variable = self._variables[name]
   1395 except KeyError:

KeyError: 'lon'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 postprocess(ds)

File ~/devel/carbonplan/cmip6-downscaling/cmip6_downscaling/data/cmip.py:49, in postprocess(ds)
     46 ds = ds.squeeze(drop=True)
     48 # standardize longitude convention
---> 49 ds = lon_to_180(ds)
     51 # Reorders latitudes to [-90, 90]
     52 if ds.lat[0] > ds.lat[-1]:

File ~/devel/carbonplan/cmip6-downscaling/cmip6_downscaling/data/utils.py:71, in lon_to_180(ds)
     52 '''Converts longitude values to (-180, 180)
     53 
     54 Parameters
   (...)
     66 cmip6_preprocessing.preprocessing.correct_lon
     67 '''
     69 ds = ds.copy()
---> 71 lon = ds["lon"].where(ds["lon"] < 180, ds["lon"] - 360)
     72 ds = ds.assign_coords(lon=lon)
     74 if not (ds["lon"].diff(dim="lon") > 0).all():

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:1498, in Dataset.__getitem__(self, key)
   1495     return self.isel(**cast(Mapping, key))
   1497 if hashable(key):
-> 1498     return self._construct_dataarray(key)
   1499 else:
   1500     return self._copy_listed(key)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:1396, in Dataset._construct_dataarray(self, name)
   1394     variable = self._variables[name]
   1395 except KeyError:
-> 1396     _, name, variable = _get_virtual_variable(
   1397         self._variables, name, self._level_coords, self.dims
   1398     )
   1400 needed_dims = set(variable.dims)
   1402 coords: dict[Hashable, Variable] = {}

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:169, in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    167     ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
    168 else:
--> 169     ref_var = variables[ref_name]
    171 if var_name is None:
    172     virtual_var = ref_var

KeyError: 'lon'

Are these GCMs with unstructured grids excluded from the list of GCMs we are downscaling?
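If not, one option would be to flag them up front using the CDI_grid_type attribute visible in the dataset attrs above; a rough sketch (the helper name is made up):

import xarray as xr

def is_unstructured(ds: xr.Dataset) -> bool:
    # ICON-style output advertises itself via CDI_grid_type and carries
    # latitude/longitude as 1-D coords on a cell dimension rather than
    # separate lat/lon dims, hence the KeyError: 'lon' above
    return ds.attrs.get('CDI_grid_type') == 'unstructured' or 'lon' not in ds.coords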

Cc @jhamman

jhamman (Contributor) commented May 12, 2022

> Are these GCMs with unstructured grids excluded from the list of GCMs we are downscaling?

Yes, let's skip this model for now.

andersy005 (Member) commented

Okie dokie... I have a Prefect flow running for these models: ["MIROC6", "AWI-CM-1-1-M", "BCC-CSM2-MR"]

andersy005 (Member) commented

The weights for all the models on regular lat/lon grids have been updated.

https://cmip6downscaling.blob.core.windows.net/static/xesmf_weights/cmip6_pyramids/weights.csv
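To inspect the updated catalog, a quick sketch (this assumes only that the file at that URL is a plain CSV; nothing about its column layout):

import pandas as pd

# load the regenerated weights catalog and take a peek
weights = pd.read_csv(
    'https://cmip6downscaling.blob.core.windows.net/static/xesmf_weights/cmip6_pyramids/weights.csv'
)
print(weights.head())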
