detect instruments on import of pld1 files #14
Conversation
I think we can break things and rewrite the downstream processing, but what does this end up producing? Do you have an example netcdf you can share? We may want to have a quick zoom at some point to plan out what this should look like, i.e. how do we get from random sensors to science data with units etc, flexibly, reproducibly, and with minimal pain for the "end user".
I think the netcdf format you've made is very good. I'd like to keep that, but make the code flexible to handle other payloads. At the moment I envisage an end user supplying the raw data and some kind of metadata file, like your yaml, that specifies deployment info, sensor ids etc. pyglider should then take these and produce L0 netcdf files ready for ingestion by DACs and distribution through something like ERDDAP. Does that make sense? A zoom call would be great. It would help me get oriented on the project before I start breaking things! Are there any other interested parties? I've just started at VOTO, where I have responsibility for processing a wide array of SeaExplorer payloads. I've asked my manager what data I can share outside the organisation; we have lots of different sensor combos to process.
This commit makes the netcdf merge sensor-agnostic. One more step along the way. Currently testing this against the original dataset, and another SeaExplorer with legato, flntu, arod and ad2cp sensors.
This commit makes the timeseries function sensor-agnostic. It depends on the glider having either a GPCTD or LEGATO sensor for temp/conductivity etc. If other CT sails are available for the SeaExplorer, this will need to be updated.
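A minimal sketch of what that CT dispatch might look like (the helper name is hypothetical; only GPCTD and LEGATO are recognised, as described above):

```python
def find_ct_sensor(sensor_names):
    """Return the name of the CT sail present in this payload.

    Hypothetical helper: only GPCTD and LEGATO are recognised for now;
    other SeaExplorer CT sails would need to be added here.
    """
    for candidate in ('GPCTD', 'LEGATO'):
        if candidate in sensor_names:
            return candidate
    raise ValueError('no supported CT sensor (GPCTD or LEGATO) found')
```

Any additional CT sail would only need a new entry in the candidate tuple.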
With these changes, I was able to process a dataset from a SeaExplorer with a completely different set of sensors. It requires changes to the deployment yaml, and the plotting is still hardcoded, but I think this is enough changes for one PR.
This is awesome. I'll try and test in the next day or two on some of my raw data. I'd definitely be interested in a zoom to chat about directions, as I have increasing desire to have both the SX and slocum capabilities.
Great! Let me know if you hit issues. To process all the way through, you'll need to add your sensors to the deployment yaml. Without those it should at least make the whole-mission per-sensor netcdfs though. I can send you the sample dataset I've been working with on Monday. I'm free pretty much all next week for a zoom call. I'm on UK time, but happy to schedule it for my evening if that's easier for you two.
Here are the raw and processed files from the SX dataset I've been working on. Pretty happy with it, but there's something weird going on with salinity https://callumrollo.com/nextcloud/index.php/s/e5xMFcFc42sLsn9
@callumrollo If possible, please add the examples to the GitHub repo for testing?
Sure thing, shall I duplicate the structure of example-seaexplorer? Something like:
I think that would be fine, or feel free to rename the original one as well, perhaps by serial number or something that indicates the sensor suite:
Files added over in #18, feel free to pull that branch and try it out. It will require the changes made in this PR to run though.
Expected at least one of those checks to fail tbh! I think this is ready for review now. It's a bit of a monster, but you should be able to just pull this branch, run the script /example-seaexplorer-legato-flntu-arod-ad2cp/process_deploymentRealTime.py and check the output netcdfs/figures make sense. @richardsc or @jklymak would you mind reviewing this?
example-seaexplorer-legato-flntu-arod-ad2cp/process_deploymentRealTime.py
Looks great. Maybe we should go back, though, and simplify the logic, because the separate sensor streams were a hack anyway, and not integral to the processing. Sorry, should have noticed that earlier, but I haven't looked at this code for a while...
example-seaexplorer-legato-flntu-arod-ad2cp/process_deploymentRealTime.py
pldGPCTD = pldGPCTD.drop_vars(arod)
if pldGPCTD.indexes["time"].size > min_samples_in_file:
    pldGPCTD.to_netcdf(fnout[:-3] + '_gpctd.nc', 'w',
# Detect instruments in dataset
Ah, OK. This splitting was originally because I had trouble concatenating data sets in xarray. It turns out that there was some sort of locking problem. This was a dumb attempt to not have such monolithic netcdf files, which helped somewhat. However, with the lock fixed, I'm not sure it's needed any more.
I wonder if we can get rid of this hack, and just process without the prefix at all? That greatly simplifies all this code and we don't need to assume anything about the carrier instrument for each sensor.
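For illustration, merging sensor streams directly in xarray rather than via prefixed intermediate files might look like the following (toy data, and it assumes the streams share a time coordinate, which real pld1 payloads may not):

```python
import numpy as np
import xarray as xr

time = np.arange(5)
# toy stand-ins for two sensor streams on a common time base
ctd = xr.Dataset({'GPCTD_TEMPERATURE': ('time', np.linspace(10.0, 12.0, 5))},
                 coords={'time': time})
oxy = xr.Dataset({'AROD_FT_DO': ('time', np.linspace(200.0, 210.0, 5))},
                 coords={'time': time})

# one merged dataset, no per-sensor intermediate files needed
merged = xr.merge([ctd, oxy])
```

The interesting engineering question is what `xr.merge` should do when the time bases differ per sensor, which is where the original splitting hack came from.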
That sounds like a good simplification. I think a new PR should be opened for it though, this one is already pretty big
Sure, we can do it that way as well if you prefer...
    os.remove(outpld[:-5] + f'_{sensor}.nc')
except:
    pass
with xr.open_mfdataset(f'{indir}*pld*.{kind}.*{sensor}.nc', decode_times=False, parallel=False, lock=False,
It was this lock=False which was the magic incantation to allow robust merging of files. I think there is some mode where the merge can fail, but I'd rather it fail than the whole process lock up, which is what it was doing before.
pyglider/seaexplorer.py
with xr.open_mfdataset(f'{indir}*pld*.{kind}.*{sensor}.nc', decode_times=False, parallel=False, lock=False,
                       preprocess=_sort) as pld:
    print(pld)
    if sensor == 'AROD':
Ideally coarsening like this (which just made the file far smaller) would be a sensor yaml parameter?
That would be better, yeah. Maybe we should start a discussion on the format of the yaml? It will be a key control for automated processing and metadata inclusion.
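One possible shape for such a per-sensor entry in the deployment yaml (field names here are just a suggestion, not an existing pyglider schema):

```yaml
# hypothetical sketch of per-sensor import options in the deployment yaml
sensors:
  AROD:
    coarsen: 8            # average every 8 samples on import
    coarsen_boundary: trim
  LEGATO:
    coarsen: 1            # keep at native resolution
```

The import code would then look the coarsening factor up per sensor instead of hardcoding the AROD special case.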
pyglider/seaexplorer.py
if sensor == 'AROD':
    # this is the only one that's altered in the original code
    pld = pld.coarsen(time=8, boundary='trim').mean()
print('Writing')
Probably want to use logging here. Of course that was just my old print statement ;-)
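Swapping the print for the logging module could look like this (the helper and logger name are illustrative, not existing pyglider code):

```python
import logging

_log = logging.getLogger(__name__)

def report_write(sensor, fname):
    # hypothetical helper: emit progress through logging so the caller
    # controls verbosity, instead of an unconditional print('Writing')
    msg = f'writing {sensor} data to {fname}'
    _log.info(msg)
    return msg
```

Callers (or CI) can then silence or redirect the messages with `logging.basicConfig(level=...)` rather than editing the library.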
# Loop through the sensor netcdfs
sensor_ncs = glob.glob(f'{indir}*{kind}p_*.nc')
sensors = {}
for sensor_path in sensor_ncs:
as above, I think we can get rid of these separate netcdfs altogether.
Looks like some fairly major refactoring is in order. I think we should get PR #24 merged first. That way we'll have test data to ensure that changes to the intermediate steps aren't affecting the final netcdfs. I'd like to keep this PR focused on adding flexibility for other seaexplorer payloads, with the extra example added to test this. Refactoring of the way sensor data are handled will be a separate PR.
OK, well let's get this in as-is, and merge #24. Then you can maybe mess with the refactor, which we can give a more thorough code review. This didn't erase my hard drive or install a TOR server ;-)
Feel free to merge when CI is done!
Thanks for the review! Going to add logging to the sensor section you highlighted, then merge in. I'll save adding the bitcoin miner until we have some more users ;)
The new tests detected that the oxygen sensor coarsening section had not been hit, so the data saved to the netcdf changed. Useful proof of concept for it. Hopefully fixed now. Will merge once the CI is green.
First step to address Issue #3
This will be a pretty major rewrite. This commit improves seaexplorer.raw_to_rawnc so that it will detect the sensors. It assumes variables with a common prefix like FLNTU_ all belong to the same sensor. I think this is reasonable. The only major deviation from the hardcoded version is that it treats NAV_ as just another sensor. Is this desired behaviour? I'm not 100% clear on how NAV is treated in the current code.
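As a sketch, the prefix-based grouping described above amounts to something like the following (variable names are made up for illustration):

```python
from collections import defaultdict

def group_by_sensor(varnames):
    # assume everything before the first underscore identifies the sensor,
    # e.g. FLNTU_CHL_COUNT and FLNTU_TURB_COUNT both belong to FLNTU;
    # NAV_ is treated as just another sensor
    sensors = defaultdict(list)
    for name in varnames:
        sensors[name.split('_', 1)[0]].append(name)
    return dict(sensors)
```

A payload with FLNTU, NAV and AROD variables would then yield three groups without any hardcoded sensor list.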
This is gonna majorly fail the CI, as it only handles the nc import stage; I'll add more commits to fix the downstream processing. This will be a bit of a monster PR, but I'll try to keep it focused.
@jklymak should I add some tests? I'm not sure what's the best way to make sure this doesn't break pre-existing capability
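One option for such tests: keep a small reference output in the repo and compare the processed variables against it. A hedged sketch of the comparison helper (the function and its dict-of-arrays interface are hypothetical, e.g. arrays pulled from the output and reference netcdfs):

```python
import numpy as np

def assert_datasets_close(result, reference, tol=1e-8):
    # hypothetical regression check: both arguments are dicts of
    # variable name -> numpy array; fail loudly if any variable
    # appears, disappears, or drifts numerically
    assert set(result) == set(reference), 'variable sets differ'
    for name in sorted(result):
        np.testing.assert_allclose(result[name], reference[name], atol=tol)
```

This style of test is what later caught the un-exercised oxygen coarsening branch: any change to intermediate processing that alters the final numbers fails immediately.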