Skip to content

Commit

Permalink
Merge pull request #2 from ESMCI/docbranch
Browse files Browse the repository at this point in the history
update README
  • Loading branch information
jedwards4b authored Dec 2, 2023
2 parents d174945 + a9581f5 commit 062877f
Showing 1 changed file with 189 additions and 2 deletions.
191 changes: 189 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,190 @@
CPRNC is a fortran program used to compare netcdf data files.
It was developed and has evolved for use with CESM.
cprnc README
------------

cprnc is a generic tool for analyzing a netcdf file or comparing
two netcdf files.

If you are trying to debug an installed cprnc tool make sure that you
are looking at the correct one by comparing the path to the one in
your case directory.


Quick Start Guide:
------------------

cprnc uses cmake and requires netcdf-fortran. It is a serial program and must
use a serial build of the netcdf-fortran library. To build
cd cprnc
mkdir bld
cd bld
cmake ../

This should be suffiecient if netcdf-fortran and the compiler that
library was built with are in the path.

Finally, put the resulting executable in CCSM_CPRNC as defined in
config_machines.xml.


Usage: cprnc [-v] [-d dimname:start[:count]] file1 [file2]
-m: Compare each time sample. Default is false, i.e. match "time"
coordinate values before comparing
-v: Verbose output
-d dimname:start[:count]
Print variable values for the specified dimname index subrange.


Users Guide:
------------

cprnc is a Fortran-90 application. It relies on netcdf version 3 or
later and uses the f90 netcdf interfaces. It requires a netcdf include
file and a netcdf library.

cprnc generates an ascii output file via standard out. It initially
summarizes some characteristics of the input file[s]. A compare file is
generally 132 characters wide and an analyze file is less than 80
characters wide.

In analyze mode, the output for a field looks like

( lon, lat, time, -----)
259200 ( 587, 134, 1) ( 269, 59, 1)
FX1 96369 8.273160400390625E+02 0.000000000000000E+00
avg abs field values: 9.052845920820910E+01

and a guide to this information is printed at the top of the file

( dim1, dim2, dim3, dim4)
ARRSIZ1 ( indx1, indx2, indx3) file 1
FIELD NVALID MAX MIN


The first 10 characters of the field name are identified in the first
dozen columns of the third line.
The first line summarizes the names of the dimensions of the field
The second line summarizes the indices of the maximum and minimum value
of the field for the first three dimensions. If the fourth dimension
exists, it's always assumed to be time. Time is handled separately.
The third line summarizes the number of valid values in the array
and the maximum and minimum value over those valid values. Invalid
values are values that are identified to be "fill" value.
The last line summarizes some overall statistics including the average
absolute value of the valid values of the field.

In comparison mode, the output (132 chars wide) for a field looks like

96369 ( lon, lat, time)
259200 ( 422, 198, 1) ( 203, 186, 1) ( 47, 169, 1) ( 224, 171, 1)
FIRA 96369 1.466549530029297E+02 -3.922052764892578E+01 1.4E+02 -3.037954139709473E+01 1.0E+00 -3.979958057403564E+00
96369 1.321966247558594E+02 -1.603044700622559E+01 1.084177169799805E+02 3.982142448425293E+00
259200 ( 156, 31, 1) ( 573, 178, 1) (
avg abs field values: 6.778244097051392E+01 rms diff: 1.4E+01 avg rel diff(npos): 4.6E-02
5.960437961084186E+01 avg decimal digits(ndif): 1.2 worst: 0.0

and a guide to this information is printed at the top of the file

NDIFFS ( dim1, dim2, dim3, dim4, ... )
ARRSIZ1 ( indx1, indx2, indx3, ... ) file 1
FIELD NVALID1 MAX1 MIN1 DIFFMAX VALUES RDIFMAX VALUES
NVALID2 MAX2 MIN2
ARRSIZ2 ( indx1, indx2, indx3, ...) file 2

The information content is identical to the information in analyze
mode with the following additions. Two additional lines are added
in the main body. Lines 4 and 5 are identical to line 3 and 2
respectively but are associated with file 2 instead of file 1.
In addition, the right hand side of lines 2, 3, and 4 contain
information about the maximum difference, the location and values
of the maximum difference, the relative difference and the location
and values of the maximum relative difference. The last two line
summarize some overall statistics including average absolute values
of the field on the two files, rms difference, average relative
difference, average number of digits that match, and the worst
case for the number of digits that match.

"avg rel diff" gives the average relative difference (sum of relative
differences normalized by the number of indices where both variables
have valid values). The denominator for each relative difference is
the MAX of the two values.

"avg decimal digits" is determined by: For each diff, determine the
number of digits that match (as -log10(rdiff(i)); add this to a
running sum; then normalize by the number of diffs (ignoring places
where the two variables are the same). For example, if there are 10
values, 8 of which match, one has a relative difference of 1e-3 and
one has a relative difference of 1e-5, then the avg decimal digits
will be 4.

"worst decimal digits" is simply log10(1/rdmax), where rdmax is the
max relative difference (in the above example, this would give 3).

At the end of the output file, a summary is presented that looks like

SUMMARY of cprnc:
A total number of 119 fields were compared
of which 83 had non-zero differences
and 17 had differences in fill patterns
and 2 had differences in dimension sizes
A total number of 10 fields could not be analyzed
A total number of 0 time-varying fields on file 1 were not found on file 2.
A total number of 0 time-constant fields on file 1 were not found on file 2.
A total number of 0 time-varying fields on file 2 were not found on file 1.
A total number of 0 time-constant fields on file 2 were not found on file 1.
diff_test: the two files seem to be DIFFERENT


This summarizes:
- the number of fields that were compared
- the number of fields that differed (not counting fields that differed
only in the fill pattern)
- the number of fields with differences in fill patterns
- the number of fields with differences in dimension sizes
- the number of fields that could not be analyzed
- the number of fields on one file but not the other
- for files with an unlimited (time) dimension, these counts are
broken down into time-varying fields (i.e., fields with an unlimited
dimension) and time-constant fields (i.e., fields without an
unlimited dimension)
- whether the files are IDENTICAL, DIFFERENT, or DIFFER only in their field lists
- Files are considered DIFFERENT if there are differences in the values, fill
patterns or dimension sizes of any variable
- Files are considered to "DIFFER only in their field lists" if matching
variables are all identical, but there are either fields on file1 that are
not on file2, or fields on file2 that are not on file1
- However, if the only difference in field lists is in the presence
or absence of time-constant fields on a file that has an unlimited
(time) dimension, the files are considered to be IDENTICAL, with
an extra message appended that notes this fact. (While not ideal,
this exception is needed so that exact restart tests pass despite
some time-constant fields being on the output files from one case
but not the other.)

Developers Guide:
-----------------

The tool works as follows.

Fields can be analyzed if they are int, float or double and
have between 0 and n dimensions

In general, fields that appear on both files are
compared. If they are sizes, no difference
statistics are computed and only a summary of the fields on
the files are presented. If fields only appear
on one file, those fields are analyzed.

The unlimited dimension is treated uniquely. In general, for files
that have a dimension named "time", the time axes are compared
and matching time values on the two files are compared one
timestep at a time. Time values that don't match are skipped.
To override the matching behaviour, use cprnc -m. In this mode,
timestamps are compared in indexical space. In analyze mode,
the fields are analyzed one timestamp at a time. In general,
if there is a "time" axis, it will be the outer-most loop in
the output analysis. In compare mode, fields with a time axis
and a timestamp that are not common between the two files are
ignored.

It is also possible to compare files that don't have an unlimited
dimension; in this case, the '-m' flag must be given.

0 comments on commit 062877f

Please sign in to comment.