AdfData class for streamlined I/O in plotting scripts #269

Merged Jul 11, 2024 · 52 commits (changes shown from 34 commits)
b28b6ac
sketching out a path for AdfData; adf_dataset.py and zonal_Mean_B.py
brianpm Oct 27, 2023
796a8c0
working on making it run
brianpm Oct 27, 2023
18706b3
fix getting reference case path
brianpm Oct 27, 2023
efdfba5
allow reference case to have a label
brianpm Oct 27, 2023
6456fdc
adfdata applied to global maps script
brianpm Dec 1, 2023
aababb2
adfdata applied to global maps script -- rename function
brianpm Dec 1, 2023
988011b
adfdata applied to global maps script -- debugging
brianpm Dec 1, 2023
91bce2a
adfdata applied to global maps script -- debugging
brianpm Dec 1, 2023
bf02a1f
adfdata applied to global maps script -- debugging
brianpm Dec 1, 2023
71578ef
debugging
brianpm Dec 1, 2023
06d3ea1
debugging
brianpm Dec 1, 2023
11f1477
debugging
brianpm Dec 1, 2023
c7286b7
debugging
brianpm Dec 1, 2023
97e1cbd
debugging
brianpm Dec 1, 2023
9068ec9
debugging -- fix load_da; actually return the DataArray
brianpm Dec 1, 2023
ceecec9
debugging
brianpm Dec 1, 2023
b48d01a
debugging -- fix load_reference_da; actually return the DataArray
brianpm Dec 1, 2023
dd015ef
debugging
brianpm Dec 1, 2023
84150a4
debugging
brianpm Dec 1, 2023
42614fb
debugging
brianpm Dec 1, 2023
b7765ab
debugging
brianpm Dec 1, 2023
857a441
debugging
brianpm Dec 1, 2023
577c9e7
debugging
brianpm Dec 1, 2023
87aebf2
debugging
brianpm Dec 1, 2023
c751d02
debugging
brianpm Dec 1, 2023
f9f296d
completed updates for AdfData and implement in zonal mean and global …
brianpm Dec 1, 2023
9ec949b
remove extra debugging print statements
brianpm Dec 1, 2023
3adbfe0
refactor plot_file_op for easier logic
brianpm Dec 1, 2023
51394a6
seasonal averaging moved to inside season loop
brianpm Dec 1, 2023
cf2128c
seasonal averaging moved to inside season loop
brianpm Dec 1, 2023
3be03db
correct zonal mean error message
brianpm Dec 1, 2023
536d2f2
Merge branch 'NCAR:main' into adf_case_dataclass
brianpm Apr 12, 2024
6b9f671
updated adf_dataset.py
brianpm Apr 19, 2024
7d2c946
Merge branch 'NCAR:main' into adf_case_dataclass
brianpm Apr 19, 2024
5d46906
starting to address PR comments. First round done on adf_dataset.py
brianpm Jun 12, 2024
ce50839
merged my conflicted adf_dataset.py
brianpm Jun 12, 2024
142d4e5
addressing PR comments. Instantiate AdfData from AdfDiag
brianpm Jun 12, 2024
935b6f4
Merge branch 'NCAR:main' into adf_case_dataclass
brianpm Jun 12, 2024
cae606f
testing AdfData. Fixes for history file naming. Fix to force loading …
brianpm Jun 12, 2024
0183799
removed commented lines
brianpm Jun 12, 2024
26a059c
addressing Jesse's comments on PR
brianpm Jun 18, 2024
7d6cdc9
Replace and with versions using new class
brianpm Jun 21, 2024
df14b1b
try to merge amwg_table from ADF/main
brianpm Jun 21, 2024
b966637
resolve upstream conflicts
brianpm Jun 21, 2024
39da27e
add back useful changes to adf_info
brianpm Jun 21, 2024
2d5a7dc
Merge branch 'main' into adf_case_dataclass
justin-richling Jun 21, 2024
de213fc
Merge branch 'main' into adf_case_dataclass
justin-richling Jun 26, 2024
231aaaa
bug fixes.
brianpm Jun 27, 2024
6e20dae
correct arguments for load_reference_regrid_da
brianpm Jun 27, 2024
496d5c7
Merge branch 'main' into adf_case_dataclass
justin-richling Jul 11, 2024
e36585f
trying to fix linting errors
brianpm Jul 11, 2024
febddf5
added load method for timeseries files
brianpm Jul 11, 2024
210 changes: 210 additions & 0 deletions lib/adf_dataset.py
@@ -0,0 +1,210 @@
from pathlib import Path
import xarray as xr

import warnings # use to warn user about missing files

def my_formatwarning(msg, *args, **kwargs):
# ignore everything except the message
return str(msg) + '\n'
warnings.formatwarning = my_formatwarning

# "reference data"
# It is often just a "baseline case",
# but could be some totally external data (reanalysis or observation or other model)
# When it is another simulation, it gets treated like another "case"
# When it is external data expect:
# - "climo" files (12 monthly climos in the file)
# - one variable per "climo"
# - source can differ for each variable, requires label
# - resolution can differ for each variable, requires regridded file(s)
# - the variable name and units in the file may differ from CAM; use defaults.yaml to set conversion
# - there could be multiple instances of a variable from different sources (e.g. different observations)

# NOTE: the last item (multiple instances of a variable) is not allowed in AdfObs.var_obs_dict.
# Since ADF is not able to handle this case, for now it is excluded from the AdfData class.

# NOTE: To make the "baseline case" vs "external data" cases as similar as possible,
# below construct the "baseline case" version to be similar to "external data".
# - provide a dictionary of (variable: file-path)
+ For external data, that dictionary is from AdfObs.var_obs_dict,
# which provides a dict of all the available variables.
# + For reference simulation, look for files that match the diag_var_list

# NOTE: There is currently a "base_nickname" allowed from AdfInfo.
# Set AdfData.ref_nickname to that.
# Could be altered from "Obs" to be the data source label.

class AdfData:
"""A class instantiated with an AdfDiag object.
Methods provide means to load data.
This class does not interact with plotting,
just provides access to data locations and loading data.

A future need is to add some kind of frequency/sampling
parameters to allow for non-h0 files.

"""
def __init__(self, adfobj):
self.adf = adfobj # provides quick access to the AdfDiag object
# paths
self.model_rgrid_loc = adfobj.get_basic_info("cam_regrid_loc", required=True)

# variables (and info for unit transform)
self.var_list = adfobj.diag_var_list
self.res = adfobj.variable_defaults

# case names and nicknames
self.case_names = adfobj.get_cam_info("cam_case_name", required=True)
self.test_nicknames = adfobj.case_nicknames["test_nicknames"]
self.base_nickname = adfobj.case_nicknames["base_nickname"]
self.ref_nickname = self.base_nickname

# define reference data
self.reference_is_obs = adfobj.get_basic_info("compare_obs")
self.set_reference() # specify "ref_labels" -> called "data_list" in zonal_mean (name of data source)

def set_reference(self):
"""Set attributes for reference (aka baseline) data location, names, and variables."""
if self.reference_is_obs:
self.ref_var_loc = {v: self.adf.var_obs_dict[v]['obs_file'] for v in self.adf.var_obs_dict}
self.ref_labels = {v: self.adf.var_obs_dict[v]['obs_name'] for v in self.adf.var_obs_dict}
self.ref_var_nam = {v: self.adf.var_obs_dict[v]['obs_var'] for v in self.adf.var_obs_dict}
if not self.adf.var_obs_dict:
print("\t WARNING: reference is observations, but no observations found to plot against.")
else:
self.ref_var_loc = {}
self.ref_var_nam = {}
self.ref_labels = {}
# when using a reference simulation, allow a "special" attribute with the case name:
self.ref_case_label = self.adf.get_baseline_info("cam_case_name", required=True)
for v in self.var_list:
f = self.get_reference_climo_file(v)
if f is None:
print(f"\t WARNING: ADFData found no reference climo file for {v}")
continue
else:
self.ref_var_loc[v] = f
self.ref_var_nam[v] = v
self.ref_labels[v] = self.adf.get_baseline_info("cam_case_name", required=True)

def get_reference_climo_file(self, var):
"""Return a list of files to be used as reference (aka baseline) for variable var."""
if self.reference_is_obs:
fils = self.ref_var_loc.get(var, None)
return [fils] if fils is not None else None
self.ref_loc = self.adf.get_baseline_info("cam_climo_loc")
# NOTE: originally had this looking for *_baseline.nc
fils = sorted(Path(self.ref_loc).glob(f"{self.ref_case_label}_{var}_climo.nc"))
if fils:
return fils
return None

def load_reference_dataset(self, var):
fils = self.get_reference_climo_file(var)
if not fils:
print(f"ERROR: Did not find any reference files for variable: {var}. Will try to skip.")
return None
return self.load_dataset(fils)

def load_reference_da(self, variablename):
ds = self.load_reference_dataset(variablename)
if ds is None:
return None
da = ds[self.ref_var_nam[variablename]]
if variablename in self.res:
vres = self.res[variablename]
if self.reference_is_obs:
scale_factor = vres.get("obs_scale_factor",1)
add_offset = vres.get("obs_add_offset", 0)
else:
scale_factor = vres.get("scale_factor",1)
add_offset = vres.get("add_offset", 0)
da = da * scale_factor + add_offset
da.attrs['units'] = vres.get("new_unit", da.attrs.get('units', 'none'))
return da


def load_climo_da(self, case, variablename):
"""Return DataArray from climo file"""
fils = self.get_climo_file(case, variablename)
return self.load_da(fils, variablename)


def load_climo_file(self, case, variablename):
"""Return Dataset for climo of variablename"""
fils = self.get_climo_file(case, variablename)
if not fils:
print(f"ERROR: Did not find climo file for variable: {variablename}. Will try to skip.")
return None
return self.load_dataset(fils)


def get_climo_file(self, case, variablename):
"""Retrieve the climo file path(s) for variablename for a specific case."""
a = self.adf.get_cam_info("cam_climo_loc", required=True) # list of paths (could be multiple cases)
caseindex = (self.case_names).index(case) # the entry for specified case
# print(f"Checking if case name is in the climo loc entry: {case in a[caseindex]}")
model_cl_loc = Path(a[caseindex])
return sorted(model_cl_loc.glob(f"{case}_{variablename}_climo.nc"))

def get_timeseries_file(self, case, field):
ts_locs = self.adf.get_cam_info("cam_ts_loc", required=True)
caseindex = self.case_names.index(case) # cam_ts_loc is a list; look up entry for this case
ts_loc = Path(ts_locs[caseindex])
ts_filenames = f'{case}.*.{field}.*nc'
ts_files = sorted(ts_loc.glob(ts_filenames))
return ts_files

def get_ref_timeseries_file(self, field):
if self.reference_is_obs:
return None
else:
ts_loc = Path(self.adf.get_baseline_info("cam_ts_loc", required=True))
ts_filenames = f'{self.ref_case_label}.*.{field}.*nc'
ts_files = sorted(ts_loc.glob(ts_filenames))
return ts_files


def get_regrid_file(self, case, field):
model_rg_loc = Path(self.adf.get_basic_info("cam_regrid_loc", required=True))
rlbl = self.ref_labels[field] # rlbl = "reference label" = the name of the reference data that defines target grid
return sorted(model_rg_loc.glob(f"{rlbl}_{case}_{field}_*.nc"))

def load_regrid_dataset(self, case, field):
fils = self.get_regrid_file(case, field)
if not fils:
print(f"ERROR: Did not find regrid file(s) for case: {case}, variable: {field}")
return None
return self.load_dataset(fils)

def load_regrid_da(self, case, field):
fils = self.get_regrid_file(case, field)
if not fils:
print(f"ERROR: Did not find regrid file(s) for case: {case}, variable: {field}")
return None
return self.load_da(fils, field)

def get_file_list(self):
pass

def load_dataset(self, fils):
if (len(fils) == 0):
warnings.warn("Input file list is empty.")
return None
elif (len(fils) > 1):
ds = xr.open_mfdataset(fils, combine='by_coords')
else:
sfil = str(fils[0])
assert Path(sfil).is_file(), f"Needs to be a file: {sfil}"
ds = xr.open_dataset(sfil)
if ds is None:
warnings.warn("invalid data on load_dataset")
return ds

def load_da(self, fils, variablename):
ds = self.load_dataset(fils)
if ds is None:
print(f"ERROR: Load failed for {variablename}")
return None
da = (ds[variablename]).squeeze()
if variablename in self.res:
vres = self.res[variablename]
da = da * vres.get("scale_factor",1) + vres.get("add_offset", 0)
da.attrs['units'] = vres.get("new_unit", da.attrs.get('units', 'none'))
return da
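The scale_factor/add_offset convention that load_da and load_reference_da apply (driven by the variable defaults) can be sketched with plain floats. Here `convert_units` is a hypothetical helper, not part of the class, and the precipitation factor is an illustrative assumption:

```python
# Sketch of the defaults.yaml-style unit transform used by load_da:
#   new_value = value * scale_factor + add_offset
def convert_units(value, vres):
    """Apply the scale/offset transform from a variable-defaults dict."""
    return value * vres.get("scale_factor", 1) + vres.get("add_offset", 0)

# e.g. converting a precipitation rate from m/s to mm/day (factor 8.64e7):
vres = {"scale_factor": 8.64e7, "new_unit": "mm/day"}
print(convert_units(1.0e-8, vres))  # ≈ 0.864 mm/day

# With no entry in the defaults, the value passes through unchanged:
print(convert_units(5.0, {}))
```

The same expression is applied elementwise to the xarray DataArray in the class, followed by overwriting the `units` attribute with `new_unit`.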
4 changes: 2 additions & 2 deletions lib/plotting_functions.py
@@ -2317,9 +2317,9 @@ def square_contour_difference(fld1, fld2, **kwargs):
mnorm = mpl.colors.Normalize(mn, mx)

coord1, coord2 = fld1.coords # ASSUMES xarray WITH coords AND 2-dimensions
print(f"{coord1}, {coord2}")
# print(f"{coord1}, {coord2}")
xx, yy = np.meshgrid(fld1[coord2], fld1[coord1])
print(f"shape of meshgrid: {xx.shape}")
# print(f"shape of meshgrid: {xx.shape}")

img1 = ax1.contourf(xx, yy, fld1.transpose())
if (coord1 == 'month') and (fld1.shape[0] ==12):
15 changes: 13 additions & 2 deletions scripts/averaging/create_climo_files.py
@@ -8,11 +8,20 @@ def my_formatwarning(msg, *args, **kwargs):
return str(msg) + '\n'
warnings.formatwarning = my_formatwarning


import numpy as np
import xarray as xr # module-level import so all functions can get to it.

import multiprocessing as mp

def get_time_slice_by_year(time, startyear, endyear):
if not hasattr(time, 'dt'):
print("Warning: get_time_slice_by_year requires the `time` parameter to be an xarray time coordinate with a dt accessor. Returning generic slice (which will probably fail).")
return slice(startyear, endyear)
start_time_index = np.argwhere((time.dt.year >= startyear).values).flatten().min()
end_time_index = np.argwhere((time.dt.year <= endyear).values).flatten().max()
return slice(start_time_index, end_time_index+1)



##############
#Main function
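The index-slice logic of `get_time_slice_by_year` above can be sketched with a plain array of calendar years standing in for the time coordinate's `.dt.year` (`slice_by_year` is a hypothetical stand-in, not part of the script):

```python
import numpy as np

# Sketch of get_time_slice_by_year: find the first index whose year is
# >= startyear and the last index whose year is <= endyear, then return
# a half-open positional slice covering that range.
def slice_by_year(years, startyear, endyear):
    start = np.argwhere(years >= startyear).flatten().min()
    end = np.argwhere(years <= endyear).flatten().max()
    return slice(int(start), int(end) + 1)

years = np.repeat(np.arange(1995, 2001), 12)  # six years of monthly samples
print(slice_by_year(years, 1996, 1998))       # slice(12, 48) -> 36 months
```

Because the slice is positional, it is used with `.isel(time=...)` rather than label-based `.sel`, which avoids calendar/decoding pitfalls when the year bounds are plain integers.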
@@ -205,7 +214,9 @@ def process_variable(ts_files, syr, eyr, output_file):
cam_ts_data.assign_coords(time=time)
cam_ts_data = xr.decode_cf(cam_ts_data)
#Extract data subset using provided year bounds:
cam_ts_data = cam_ts_data.sel(time=slice(syr, eyr))
#cam_ts_data = cam_ts_data.sel(time=slice(syr, eyr))
tslice = get_time_slice_by_year(cam_ts_data.time, int(syr), int(eyr))
cam_ts_data = cam_ts_data.isel(time=tslice)
#Group time series values by month, and average those months together:
cam_climo_data = cam_ts_data.groupby('time.month').mean(dim='time')
#Rename "months" to "time":
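The `groupby('time.month').mean(dim='time')` climatology step can be sketched in plain numpy (`monthly_climo` is a hypothetical stand-in for the xarray groupby):

```python
import numpy as np

# Sketch of the monthly-climatology step: average all samples sharing the
# same calendar month (1..12) across all years in the time series.
def monthly_climo(values, months):
    return np.array([values[months == m].mean() for m in range(1, 13)])

months = np.tile(np.arange(1, 13), 3)  # three years of monthly data
values = np.arange(36, dtype=float)
climo = monthly_climo(values, months)  # each month averages its 3 yearly samples
```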