-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only read requested vars #22
base: main
Are you sure you want to change the base?
Conversation
Currently testing runtimes on two cfgs, reduced versions of the example cfg in E3SM-Project/zppy#694, for comparison against the runtimes reported in E3SM-Project/zppy#694 (comment) vars = "", plots_lnd = 25 vars
vars = 25 vars, plots_lnd = 25 vars
|
|
I'm still thinking #19 might be necessary though. Tom's analysis work in #21 showed a directory that "contains 353 files", which matches the number of land vars. In E3SM-Project/zppy#694, we had a long runtime because we were trying to read data for 353 variables when we only needed 25 of them. That would mean in the case where we do actually want 353 variables, the runtime would still be long. |
I'm running another test cfg for this case, to confirm: vars = "", plots_lnd = "all"
|
file_path_list: List[str] = [] | ||
for var in var_list: | ||
# `var` will be of the form `FSNS`, `FLNS`, etc. | ||
file_path_list.append(f"{directory}{var.variable_name}*.nc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #21:
Xarray/xCDAT will attempt to concatenate all of these files into a single
Dataset
object, which can be slow depending on the number of files, the file sizes, and the shape/structure of the data.
I'm almost wondering if it'd be better to read in one variable at a time: at the beginning of globalAnnual
open just that variable. As far as I can tell, with two important exceptions, variables aren't requested multiple times. Those exceptions are:
- The variables used for derivation in
globalAnnualHelper
can end up requested multiple times; they don't have to be re-loaded though because they already exist in theDatasetWrapper
- The variables used in both "original" and "atm"/"ocn" components are loaded twice -- as plots_original variables and as plots_atm/plots_ocn variables.
> git grep -n dataset_wrapper
zppy_interfaces/global_time_series/coupled_global.py:10:from zppy_interfaces.global_time_series.coupled_global_dataset_wrapper import (
zppy_interfaces/global_time_series/coupled_global.py:168: dataset_wrapper: DatasetWrapper = DatasetWrapper(exp[exp_key], var_list)
zppy_interfaces/global_time_series/coupled_global.py:180: data_array, units = dataset_wrapper.globalAnnual(var)
zppy_interfaces/global_time_series/coupled_global.py:214: del dataset_wrapper
zppy_interfaces/global_time_series/coupled_global.py:257: dataset_wrapper = DatasetWrapper(exp["ocean"], [ohc_variable])
zppy_interfaces/global_time_series/coupled_global.py:258: exp["annual"]["ohc"], _ = dataset_wrapper.globalAnnual(ohc_variable)
zppy_interfaces/global_time_series/coupled_global.py:264: dataset_wrapper = DatasetWrapper(exp["vol"], [vol_variable])
zppy_interfaces/global_time_series/coupled_global.py:265: exp["annual"]["volume"], _ = dataset_wrapper.globalAnnual(vol_variable)
> git grep -n set_area_tuple
zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py:28: def set_area_tuple(self):
zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py:162: self.set_area_tuple()
Unfortunately this fix isn't working yet. See errors below. It appears
That is, we weren't looking for a So, this is a complication that needs to be addressed before merging this PR... Errors picking up RESTOM
|
as i earlier mentioned, it is a bit of challenging to debug at this point. I'd hope to see the |
Trying to think of possible paths forward here:
@chengzhuzhang I'm thinking (1) at this point given the Unified timeline (and then determine how to improve it in the next development cycle). What do you think?
Yes, they have become quite different over the past half-year. We should discuss this more but it's not something we can fit in before the Unified release. |
yes this make sense, we can use the same land var list we used in the original .cfg for now in the template in ts-glb task, and make a note about potential performance impact of using var = " " on the global time series plotting. |
|
Summary
Objectives:
Issue resolution:
DatasetWrapper.dataset
#21Select one: This pull request is...
Small Change