Error in cf.write when writing UM data fields #838

Open
ellgil82 opened this issue Jan 7, 2025 · 12 comments
Labels
question General question

Comments

@ellgil82

ellgil82 commented Jan 7, 2025

Hi folks. I'm encountering issues again with processing UM data. It's a related task to ticket #817 but a different error so I figured I'd start a new topic.

I can load in a single file (either on a surface level, UM tiles, or pressure levels) with no problem and perform usual cf-python tasks like amending and adding attributes, but when it comes to writing the file I get an unhashable type error as follows:

>>> cf.write(f[0], 'test.nc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cf/read_write/write.py", line 808, in write
    netcdf.write(
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 4911, in write
    self._file_io_iteration(
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 5153, in _file_io_iteration
    self._write_global_attributes(fields)
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 4289, in _write_global_attributes
    force_global = {
                   ^
  File "/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 4292, in <dictcomp>
    if len(v) == len(fields) and len(set(v)) == 1
                                     ^^^^^^
TypeError: unhashable type: 'dict'

I've managed to process hourly and daily files with this workflow and very similar code with no problem so far. I've also double-checked that all the files I'm currently trying to process here are fully formed and compliant, so that's not the issue (I think this was the main problem in my previous ticket).

I have also tested this behaviour with a MWE, where I load a UM file and then try to write it to netcdf, and the same occurs, i.e.

f = cf.read('/path/to/file/UM_file')
cf.write(f[0], 'test.nc')

Can you help me understand what's going on here? I am able to save the fields in iris without a problem, so it doesn't seem to be a packing or underlying file problem, but I'd like to use cf-python to maintain consistency with the other files I've post-processed with this method.

@ellgil82 ellgil82 added the question General question label Jan 7, 2025
@sadielbartholomew
Member

Hi Ella, sorry you've run into this. From the traceback it looks like a bug in our upstream library, cfdm, in which case you aren't doing anything wrong and we have something to fix. To confirm that, and to check whether the bug is still present on our latest main branch (we are due to do a new release in the next week or so with plenty of updates, so we might have caught it already), please can you run cf.environment(paths=False) so we know which library versions you are using? Then we can try to suggest a workaround too.

@ellgil82
Author

Hi Sadie, thanks for the response - nice to know it's not me doing something silly for once. Here's the output from cf.environment:

Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cf
>>> cf.environment(paths=False)
Platform: Linux-5.14.0-427.37.1.el9_4.x86_64-x86_64-with-glibc2.34
HDF5 library: 1.14.3
netcdf library: 4.9.2
udunits2 library: /apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/libudunits2.so.0
esmpy/ESMF: 8.6.1
Python: 3.11.9
dask: 2024.8.0
netCDF4: 1.7.1
psutil: 6.0.0
packaging: 24.1
numpy: 1.26.4
scipy: 1.14.0
matplotlib: 3.8.4
cftime: 1.6.4
cfunits: 3.3.7
cfplot: 3.3.0
cfdm: 1.11.1.0
cf: 3.16.2

@sadielbartholomew
Member

Excellent, thanks for the report. (Sorry for a bit of a delay in replying; we are really quite busy right now, including preparing for these new releases, but if this is, as suspected, a bug, the fix should go into the new releases.) Looking into it now and will report back today.

@sadielbartholomew
Member

OK, I have tried to recreate this with various data locally but I can't. As per the traceback, a dictionary object is somehow ending up among the values used to build force_global. cfdm should not allow that, but it looks like some non-compliant attributes in the data might be sneaking in.

Please can you run f[0].dump() just before the attempted write and share the output so that we can see exactly what the field looks like before you try to write it out? Hopefully then I can spot how the logic is allowing the dictionary through.
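
For reference, the failing check in the traceback can be imitated in a few lines. This is a minimal sketch, not cfdm's actual code: the function name force_global_candidates and its arguments are hypothetical, and only the condition `len(v) == len(fields) and len(set(v)) == 1` is taken from the traceback. It shows that building a set from property values fails as soon as one of those values is a dict.

```python
def force_global_candidates(values_by_property, n_fields):
    """Pick out properties that have the same value on every field.

    Hypothetical stand-in for the dict comprehension in the traceback;
    `values_by_property` maps a property name to its value on each field.
    """
    return {
        name: values[0]
        for name, values in values_by_property.items()
        # set() needs hashable elements, so a dict value raises TypeError here
        if len(values) == n_fields and len(set(values)) == 1
    }


# Hashable (string) values pass through the check without trouble.
ok = force_global_candidates({"source": ["UM vn1300", "UM vn1300"]}, n_fields=2)

# A dict-valued property triggers the same kind of error as in the report.
try:
    force_global_candidates({"bad": [{"a": 1}]}, n_fields=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

So the error itself only says that at least one property value reaching this check was a dict; it does not say which property or where it came from.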

@ellgil82
Author

Thanks for investigating this. This is the output of f[0].dump() when I read one of the files in and make no changes to the attributes:

/apps/jasmin/jaspy/miniforge_envs/jaspy3.11/mf3-23.11.0-0/envs/jaspy3.11-mf3-23.11.0-0-v20240815/lib/python3.11/site-packages/numpy/ma/core.py:467: RuntimeWarning: invalid value encountered in cast
  fill_value = np.array(fill_value, copy=False, dtype=ndtype)
-------------------------------------------------
Field: eastward_wind (ncvar%UM_m01s15i201_vn1300)
-------------------------------------------------
Conventions = 'CF-1.11'
_FillValue = -1073741824.0
history = 'Converted from UM/PP by cf-python v3.16.2'
lbproc = '0'
lbtim = '11'
long_name = 'U WIND ON PRESSURE LEVELS    B GRID'
runid = 'aaaaa'
source = 'UM vn1300'
standard_name = 'eastward_wind'
stash_code = '15201'
submodel = '1'
um_stash_source = 'm01s15i201'
units = 'm s-1'

Data(time(4), air_pressure(10), grid_latitude(561), grid_longitude(688)) = [[[[6.25, ..., 23.5]]]] m s-1

Cell Method: time(4): point

Domain Axis: air_pressure(10)
Domain Axis: grid_latitude(561)
Domain Axis: grid_longitude(688)
Domain Axis: time(4)

Dimension coordinate: time
    axis = 'T'
    calendar = 'gregorian'
    standard_name = 'time'
    units = 'days since 2016-1-1'
    Data(time(4)) = [2016-09-14 06:00:00, ..., 2016-09-15 00:00:00] gregorian

Dimension coordinate: air_pressure
    axis = 'Z'
    positive = 'down'
    standard_name = 'air_pressure'
    units = 'hPa'
    Data(air_pressure(10)) = [1000.0, ..., 200.0] hPa

Dimension coordinate: grid_latitude
    axis = 'Y'
    standard_name = 'grid_latitude'
    units = 'degrees'
    Data(grid_latitude(561)) = [-29.5, ..., 26.500000000000796] degrees
    Bounds:units = 'degrees'
    Bounds:Data(grid_latitude(561), 2) = [[-29.55, ..., 26.550000000000797]] degrees

Dimension coordinate: grid_longitude
    axis = 'X'
    standard_name = 'grid_longitude'
    units = 'degrees'
    Data(grid_longitude(688)) = [142.6, ..., 211.2999999999961] degrees
    Bounds:units = 'degrees'
    Bounds:Data(grid_longitude(688), 2) = [[142.54999999999998, ..., 211.3499999999961]] degrees

Auxiliary coordinate: latitude
    standard_name = 'latitude'
    units = 'degrees_north'
    Data(grid_latitude(561), grid_longitude(688)) = [[-47.02992113833496, ..., -46.293271723295106]] degrees_north
    Bounds:units = 'degrees_north'
    Bounds:Data(grid_latitude(561), grid_longitude(688), 4) = [[[-46.96827802683326, ..., -46.29282235157491]]] degrees_north

Auxiliary coordinate: longitude
    standard_name = 'longitude'
    units = 'degrees_east'
    Data(grid_latitude(561), grid_longitude(688)) = [[-109.14434740886605, ..., -297.7105530414625]] degrees_east
    Bounds:units = 'degrees_east'
    Bounds:Data(grid_latitude(561), grid_longitude(688), 4) = [[[-109.1800527113347, ..., -297.6134375065905]]] degrees_east

Coordinate reference: grid_mapping_name:rotated_latitude_longitude
    Coordinate conversion:grid_mapping_name = rotated_latitude_longitude
    Coordinate conversion:grid_north_pole_latitude = 5.0
    Coordinate conversion:grid_north_pole_longitude = 20.0
    Dimension Coordinate: grid_longitude
    Dimension Coordinate: grid_latitude
    Auxiliary Coordinate: longitude
    Auxiliary Coordinate: latitude

@sadielbartholomew
Member

Thanks. Hmmm, everything looks fine in that dump output. I see you are working on JASMIN - would you be able to share with me the (accessible) JASMIN path of an example dataset which runs into this issue? That would be the quickest way for us to work out what's up, given I'm struggling to recreate the issue, which clearly only happens under certain conditions that I can't yet foresee (nothing in the code logic shows an obvious problem). Otherwise, please share an example dataset by some other means - happy to take an email or, if it is small enough, you could share it here in a comment?

@ellgil82
Author

Oh that's annoying! Thanks for digging into it for me. I have put an example file in https://gws-access.jasmin.ac.uk/public/polarres/debug/

@sadielbartholomew
Member

Ah perfect, thanks. OK, so in the same environment (at least, version-wise) I can read that in and write it back out without issue, which implies that something done to the file during processing is ultimately leading to the error. That's not to say you are doing anything wrong per se: you could be doing the right thing and there's a bug in cf-python or cfdm whereby we should be handling something better, but that's not yet clear.

You mention that you have:

perform usual cf-python tasks like amending and adding attributes

so I wonder if you are somehow adding an attribute in a way where it ends up being non-CF-compliant, notably as a dictionary rather than a list? I remember now that we had a discussion over on the CMS Helpdesk about how to edit attributes (https://cms-helpdesk.ncas.ac.uk/t/renaming-attributes-coordinates-in-cf-python/1526) and that way of doing it should be fine, but obviously something is making the libraries unhappy.
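
One way to look for such an attribute is to test each property value for hashability before writing. The following is a rough sketch only: unhashable_properties is a hypothetical helper (not part of cf-python or cfdm), and it works on a plain dict of property names to values so that it can be run without any data file.

```python
def unhashable_properties(properties):
    """Return {name: type name} for every property value that cannot be hashed.

    `properties` is assumed to be a plain dict of property names to values.
    """
    bad = {}
    for name, value in properties.items():
        try:
            hash(value)
        except TypeError:
            # dicts, lists and sets end up here; strings and numbers do not
            bad[name] = type(value).__name__
    return bad


# Made-up example: two ordinary string properties and one dict-valued one.
example = {"source": "UM vn1300", "stash_code": "15201", "notes": {"a": 1}}
print(unhashable_properties(example))  # {'notes': 'dict'}
```

With a real field this could be called as unhashable_properties(f[0].properties()), assuming properties() returns a plain dict of the field's properties.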

Maybe you could trace through your processing code to the last part in which you can still do a successful write? Then we'll know what changes to the field are resulting in the issue? Alternatively if you are happy to send the script you are using I can find that out.

Sorry I can't be more helpful here - unless we can recreate an error it's hard to pinpoint what's going on.

@ellgil82
Author

ellgil82 commented Jan 22, 2025

Sorry I didn't see this comment until now -

I would understand if there was something I'm doing to the file, but I get the same error if I just load the file and then immediately try to write it again, i.e.

f = cf.read('/path/to/file/UM_file')
cf.write(f[0], 'test.nc')

@sadielbartholomew
Member

sadielbartholomew commented Jan 22, 2025

Is this definitely the file you have put at the link above, https://gws-access.jasmin.ac.uk/public/polarres/debug/? I'm a bit perplexed, because if I download that, and match your environment as best I can (same version of cf, cfdm, cftime, netCDF4-python, etc.) and I do the same read and immediate write, it all works. Please also check that the environment you are reporting is the one you are actually using, here.

Since I'm struggling to recreate it, I thought I might as well get onto JASMIN and use JasPy to get the exact same environment (I just used the current one, via module load jaspy on a sci node), and your two-line snippet above still writes out fine. So I suspect the file where you get the error isn't actually the one at that path, else I'm stumped as to how you are seeing the error and I am not. If there's a direct path to a dataset on JASMIN I can use, that might solve it so I can see what you see. Thanks.
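
A quick way to confirm which environment a session is really using is to print the interpreter path and where each library was imported from. This is a generic sketch using only the standard library; describe_module is a hypothetical helper, and the list of module names is simply those mentioned in this thread.

```python
import importlib
import sys


def describe_module(name):
    """Return the version and file location of an importable module.

    Returns None if the module cannot be imported in this environment.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
    }


# The interpreter actually running this session.
print(sys.executable)

# Where each library is being picked up from (None means not installed here).
for name in ("cf", "cfdm", "netCDF4", "cftime"):
    print(name, describe_module(name))
```

If any of the reported paths point outside the expected JasPy environment (for example into a user site-packages directory), that would suggest a different version of a library is being picked up.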

@ellgil82
Author

Ok I'm not sure what's happening either, because I'm now unable to reproduce the error too... I will have a look at the specific attributes I'm adding in and see if there's something in particular causing the complaint.

Thanks for your help so far!

@sadielbartholomew
Member

Ok I'm not sure what's happening either, because I'm now unable to reproduce the error too

That's good! Given this along with what I've observed, and if you're sure that is the same file you saw the error on another time, I suspect that when you got the error it was due to something bad that had happened to the environment which meant some other version of something snuck in and the incompatibility caused the problem.

Can you continue with what you were doing now, without seeing the issue? If so, I'm happy to go with the theory that your environment was somehow messed up and that there isn't a problem with cf or cfdm, if you are. But if you do encounter it again, or aren't convinced we've resolved this, do let us know...
