Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor init() to smaller functions #34

Merged
merged 18 commits into from
Dec 1, 2023

Conversation

Ben-Choat
Copy link
Contributor

@Ben-Choat Ben-Choat commented Aug 16, 2023

Refactored init() within topmodel.c to four smaller functions. Ultimately, this was done to allow calibratable parameters to update secondary variables withib bmi_topmodel.c. For now, only topmodel.c has been edited.

Additions

  • four functions
  • convert_dist_to_histords(double *dist_from_outlet, int num_channels,
    double chv, double rv, double dt, double tch[11]))
  • calc_time_delay_histogram(int max_time_delay_ordinates, int num_channels, double area,
    double tch[11], int *num_time_delay_histo_ords, int *num_delay,
    double **time_delay_histogram, double *cum_dist_area_with_dist)
  • init_water_balance(double **deficit_local, int max_atb_increments,
    int num_topodex_values, double dt, double *sr0,
    double *szm, double *Q0, double *t0, double tl,
    double **stor_unsat_zone, double *szq,
    double **deficit_root_zone, double *sbar,
    double *bal)
  • init_discharge_array(int *num_delay, double *Q, double *Q0, double area,
    int *num_time_delay_histo_ords, double **time_delay_histogram)

Removals

Changes

  • Moved portions of init() to four smaller functions, and init() now calls those four smaller functions.

Testing

  1. Built and ran topmodel before refactoring and after refactoring and output was identical.
  2. Edited and tested in Ubuntu 22.04

Screenshots

Notes

  • deficit_local is initialized, but not used within init(). Should it still appear there?

Todos

  • Edit bmi_topmodel.c to enable secondary variables to be calcualted from calibratable parameters defined within realization file.

Checklist

  • [x ] PR has an informative and human-readable title
  • [ x] Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Passes all existing automated tests
  • [ x] Any change in functionality is tested
  • [ x] New functions are documented (with a description, list of inputs, and expected output)
  • [ x] Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • [x ] Reviewers requested with the Reviewers tool

Target Environment support

  • [x ] Windows
  • [ x] Linux

Accessibility

  • [ x] Keyboard friendly
  • [ x] Screen reader friendly

Other

  • [ x] Is useable without CSS
  • [ x] Is useable without JS
  • [ x] No linting errors or warnings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from this PR, the git history will contain the original file if you are concerned about loosing the previous implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Will focus on removing extraneous changes/files from PR, better documenting within the code, and giving special attention to assumption around memory allocation. If this is not already done somewhere in the original code I may need to touch base with you on that sometime next week to make sure I'm going about it properly. Thanks for the feedback!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from the PR, the original is in the version history

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like maybe you ended up staging some working files you didn't intend to. I see now that a couple of these exist in src/WorkingFiles. You can always add to your local .gitignore to prevent spurious working directories from getting staged.

src/topmodel.c Outdated
* **********************************/


// FUNCTION TO CONVERT DISTANCE/AREA FORM TO TIME DELAY HISTOGRAM ORDINATES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely reccomend documenting function paramers inline. using doxygen style comments can also help auto generate human readable docs in the future. Consider a comment like this

/**
* Function to convert Distance/Area from T0 time delay historgram ordinates.
*
* <additional details, also known as long description, here e.g.>
* This function converts paramters to m/time step DT with exception of XK0
* which must stay in m/h.  
* Q0 is already in m/time step
* T0 is input as Ln(To)
* 
* @param[in] dist_from_outlet pointer to array of length num_channels containing WHAT
* @param[in] num_channels the number of channels, corresponds to the length of dist_from_outlet and tch arrays
* @param[in] chv <is what???>
* @param[in] rv <is what???>
* @param[in] dt delta time, or time step, used to convert chv and rv
* @param[out] tch <what is tch, generated by this function>
*/

I'll also note that I don't see an XK0 or T0 parameter anywhere in this function, so perhaps that comment needs to be dropped???


for(ir=1;ir<=(*num_time_delay_histo_ords);ir++)
{
time=(double)(*num_delay)+(double)ir;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this cast is suspicious, why does time need to be a double? both num_delay and ir are ints, so should time simply be an int as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also from original code.

time=(double)(*num_delay)+(double)ir;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hellkite500, can you make heads or tails of this casting? The old Fortran code does it, too. Maybe a legacy of it being over 40 years old?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I can think of is that time was expected to be large, and a double type was used to avoid overflowing an integer with large enough time values? For signed (positive/negative) ints, an 8 bit int can hold numbers up to 127, a 16 bit int 32,767, 32-bit int 2,147,483,647, and a 64-bit int 9,223,372,036,854,775,807.

It is possible that original code was developed on something with only 16 bit integers and double was used to increase the largest possible time representation.

If we really want to ensure we can hold large representations for time, we can change the type of time to int64_t and use

#include <stdint.h>

to have a portable, compliant 64 bit int to put that into...

We can also use unsigned types if we know the values are not ever going to be negative.

The other point I'll make here is that even if time does need to be a floating point, you don't have to cast the int, you can safely add ints and store them in a double, the compiler will handle that just fine.

double time, a1, sumar, a2;
int j, ir;

if((*time_delay_histogram)==NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function's responsibility is to calc the histogram, it probably should also be responsible for initializing memory. At the same time, its probably a good practice for any of these functions that accept pointers as INPUT to check the pointer is valid (e.g. not NULL) and simply throw an error if it is.

assert( time_delay_historgram != NULL )
assert( *time_delay_histogram != NULL )
assert( num_time_delay_histo_ords != NULL )
assert( num_delay != NULL )
assert( cum_dist_area_with_dist != NULL )

src/topmodel.c Outdated
// updates values associated with model structure
// outputs used in TOPMOD() include num_time_delay_histo_ords, time_delay_histogram,
// num_delay, and Q
extern void calc_time_delay_histogram(int max_time_delay_ordinates, int num_channels, double area,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common C style guide/implementation is to put all input params (things that are NOT modified by the function) first in the arg list, and all out or in/out params (things that will be modified by the function) last. I strongly recommend this!

Also its imperative in C codes to document assumptions on memory already being allocated vs the memory to be allocated and returned by the function. As noted below, this function could NOT check and allocate the time_delay_histogram, which is valid, but it should be documented in the doc string (see above comment on doxygen style doc strings) either way. Keeping the allocation his is OK, as long as the caller knows it was allocated in this function and can potentially deallocate it later.

src/topmodel.c Outdated
// updates values associated with model structure
// outputs used in TOPMOD() include sbar, bal, deficit_root_zone

extern void init_water_balance(double **deficit_local, int max_atb_increments,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another good example of ordering params and documenting behavior. Most "init" functions are safe to assume are doing some allocation/initialization, but being clear about WHICH variables are being allocated is important

)
{
// declare local variables
double sum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can declare/init sum in one line for berevity

double sum = 0.0;

@hellkite500
Copy link
Member

Left some initial review comments, and mashed the button before I could elaborate on my requested changes. Definitely need to address some of the array declaration/passing with static sizes (11) and remove the unnecessary files. The rest of the comments are stylistic and documentation related, but would likely be worth addressing to set the tone for future refactoring and/or contributions/maintainers -- they will thank us later!

Copy link

@SnowHydrology SnowHydrology left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ben-Choat, please focus on making the changes suggested by @hellkite500, particularly the removal of all temporary files (as he noted, GitHub does the version tracking—no need for WorkingFiles or backup) and the addition of comments reflecting the newly implemented order of parameters.

You can also suggest @ajkhattak and @madMatchstick as reviewers while I'm out. Thanks!

// FUNCTION TO CONVERT DISTANCE/AREA FORM TO TIME DELAY HISTOGRAM ORDINATES
// outputs used in subsequent functions include tch
extern void convert_dist_to_histords(double *dist_from_outlet, int num_channels,
double chv, double rv, double dt, double tch[11])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hellkite500, my reading of the original code is that TCH(10) is a hard-coded limit corresponding to the number of channels, a parameter value (num_channels) read in from subcat.dat.

This does not seem like the correct way of implementing things because there is no guardrail, as far as I know, preventing num_channels from being > 10. How would you recommend handling this?

Also, as a C dilettante, why would it be TCH(10) in Fortran and tch[11] in C (the 10 vs. 11 part, not the rest of the syntax)?


for(ir=1;ir<=(*num_time_delay_histo_ords);ir++)
{
time=(double)(*num_delay)+(double)ir;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hellkite500, can you make heads or tails of this casting? The old Fortran code does it, too. Maybe a legacy of it being over 40 years old?

@madMatchstick
Copy link
Contributor

@Ben-Choat - Looks good overall

atm gh actions/workflows check standalone and unit test scripts, all of which passed.

  1. Can you please add under "testing" a single bullet detailing ngen build/run results & perhaps paste any console out if appropriate. Even a comment stating "tested updated code in ngen without error, results seem reasonable" would suffice.
  2. Best to reference local machine/OS somewhere in PR for good measure (unless I missed it) :)

-thx!

@Ben-Choat Ben-Choat marked this pull request as ready for review September 8, 2023 21:19
@Ben-Choat Ben-Choat dismissed stale reviews from SnowHydrology and hellkite500 September 8, 2023 21:21

Has been addressed.

Copy link
Contributor

@madMatchstick madMatchstick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates run in ngen coupled with PET, as outlined in wiki. Output seems reasonable.

@madMatchstick madMatchstick merged commit 4995885 into NOAA-OWP:master Dec 1, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants