The aim of this repo is to facilitate the launch of experiments assessing the impact of cyclogeostrophic corrections applied to geostrophic Sea Surface Currents (SSC) derived from Sea Surface Height (SSH) data.
Geostrophic and cyclogeostrophic currents are computed using the `jaxparrow` package (docs).
The impact is evaluated by computing, for both the geostrophic and cyclogeostrophic solutions:
- the differences in SSC, relative vorticity, and Eddy Kinetic Energy (EKE),
- the cyclogeostrophic imbalance,
- the Euclidean distance between the SSH-derived SSC and drifter velocities.
These results are stored as datasets, and some plots are produced automatically.
One objective of this repo was to make the choice of the SSC inversion parameters and of the input datasets as flexible as possible, while keeping a clear track of the values used for each experiment.
For this, it relies on the hydra and hydra-zen libraries (docs).
Assuming conda is installed, running:

```shell
./depencies.sh
```

will create a conda environment and install the required dependencies.
Thanks to hydra-zen, experiments are launched as a Python script.
Running:

```shell
python main.py --help
```

will provide hydra-zen formatted details about the experiment parameters:
experiment_data (ExperimentData): Object representing the file system and structure where experiment descriptions and
outputs are saved. It can be a local filesystem, but also an s3 bucket.
ssh_data (SSHData): Object representing the input SSH data. It can be simply a path to the data file(s) on a local
filesystem, but it can also describe a URL, an s3 bucket, or a Copernicus Marine Service dataset.
drifter_data (DrifterData): Object representing the input drifter data. It can be simply a path to the data file(s) on a
local filesystem, but it can also describe a URL, an s3 bucket, or a clouddrift dataset.
ssh_rename (dict, optional): Dictionary mapping original SSH dataset variable names to new ones.
Defaults to None, which means no variables will be renamed.
drifter_rename (dict, optional): Dictionary mapping original drifter dataset variable names to new ones.
Defaults to None, which means no variables will be renamed.
ssh_preproc (Callable, optional): Function(s) applied to preprocess the SSH data.
Defaults to ssh_lon_to_180_180_preproc_conf, which means longitudes are converted from [0, 360] to
[-180, 180].
drifter_preproc (Callable, optional): Function(s) applied to preprocess the drifter data.
Defaults to drifter_default_preproc_conf, which means default preprocessing is applied.
spatial_extent (list[float, float, float, float], optional): Spatial domain bounding box as [lon0, lon1, lat0, lat1].
Defaults to None, which means no spatial restriction.
temporal_extent (list[str, str], optional): Temporal domain window as a pair of date strings [start, end].
Defaults to None, which means no temporal restriction.
cyclogeostrophy_fun (Callable, optional): Function for cyclogeostrophy computation with predefined parameters.
Defaults to `cyclogeostrophy_conf`, which implies the use of the default `jaxparrow.cyclogeostrophy` parameters.
bin_size (int, optional): Bin size in degrees (°) for error computations versus drifter data.
Defaults to 1°.
do_plot (bool, optional): Whether to automatically produce plots.
Defaults to True.
do_plot_all_times (bool, optional): Whether to produce plots for each time step. Requires `do_plot=True`.
Defaults to False.
memory_per_device (int, optional): Available VRAM or RAM per device in gigabytes (GB).
Defaults to 30 GB.
logger_level (int, optional): Logging output level (e.g., `logging.DEBUG`, `logging.INFO`, etc.).
Defaults to `logging.DEBUG`.
See Examples for ready-to-use launch commands.
Experiment descriptions and results storage are represented by the class `experiment.io.experiement.ExperimentData`.
Its constructor takes as arguments:
- a `FileSystem` object,
- a `string` indicating the path, on this filesystem, to the root directory of the experiment outputs.
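For illustration, creating a local experiment store might look like the following sketch (the import paths follow the module names given in this README; the use of positional arguments is an assumption):

```python
from experiment.io._filesystem import LocalFileSystem
from experiment.io.experiement import ExperimentData

# Local filesystems take no constructor argument (see the FileSystem
# classes described below)
filesystem = LocalFileSystem()

# Root directory, on that filesystem, where experiment descriptions
# and outputs will be saved
experiment_data = ExperimentData(filesystem, "experiments/my_experiment")
```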
How to access input datasets is described by the class `experiment.io._resource.Resource` and its subclasses.
The subclasses specify where the resource is accessed: on a local filesystem (`LocalResource`), on the "cloud" (`URLResource`), on an s3 bucket (`S3Resource`), or through a specific provider (Copernicus Marine Service: `CopernicusResource`, or CloudDrift: `CloudDriftResource`).
`LocalResource` and `URLResource` take as argument a single `string` representing the path (or the URL) to the dataset(s).
`S3Resource` also takes a `path` argument, plus an `S3FileSystem` object allowing access to the s3 bucket.
`CopernicusResource` does not have a `path` argument, but several `string` arguments describing the targeted dataset and how to connect to the Copernicus Marine Service datastore.
See details in Pointing to input or output data.
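As a hedged illustration of the signatures just described, pointing to local or remote datasets could look like:

```python
from experiment.io._resource import LocalResource, URLResource

# Dataset(s) on the local filesystem: a single string path
ssh_resource = LocalResource("fullpath_to_ssh_file.zarr")

# Dataset(s) accessed over the network: a single URL string
drifter_resource = URLResource("https://url_to_drifter_file.zarr")
```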
Finally, a filesystem is represented by the `experiment.io._filesystem.FileSystem` class and its subclasses: `LocalFileSystem` for local filesystems, and `S3FileSystem` for s3 buckets.
The `LocalFileSystem` constructor does not take any argument, while the `S3FileSystem` constructor takes several arguments to authenticate and connect to the bucket (see details in Accessing s3 buckets).
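Putting the two together, reading from a public s3 bucket might be sketched as follows (the `anon` argument is introduced in Accessing s3 buckets below; the bucket path is illustrative):

```python
from experiment.io._filesystem import S3FileSystem
from experiment.io._resource import S3Resource

# Anonymous access to a public bucket; private buckets instead read
# credentials from environment variables (see Accessing s3 buckets)
s3_filesystem = S3FileSystem(anon=True)

# An S3Resource takes the path within the bucket plus the S3FileSystem
# giving access to it
ssh_resource = S3Resource("some-bucket/path_to_ssh_file.zarr", s3_filesystem)
```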
Besides creating YAML configuration files for the experiments, hydra-zen allows specifying custom values for the objects passed as arguments to the script with the syntax `object_group=entry_name`, but also for their parameters with `object_group.arg=value` (not limited to one tree level).
For example, using the `local` configuration (which is the default) for `experiment_data`, `ssh_data`, and `drifter_data` can be done with:

```shell
python main.py experiment_data=local ssh_data=local drifter_data=local
```
And prescribing the local `path` of the `ssh_data` or `drifter_data` `resource` is achieved with:

```shell
python main.py experiment_data=local ssh_data=local drifter_data=local \
  ssh_data.resource.path='fullpath_to_ssh_file.zarr' drifter_data.resource.path='fullpath_to_drifter_file.zarr'
```
It is also possible to access data from URLs as if the resource was stored locally:

```shell
python main.py experiment_data=local ssh_data=local drifter_data=local \
  ssh_data.resource.path='https://url_to_ssh_file.zarr' drifter_data.resource.path='fullpath_to_drifter_file.zarr'
```
s3 buckets can be read from and written to by using the `s3` entry of the `experiment_data`, `ssh_data`, and `drifter_data` groups:

```shell
python main.py experiment_data=s3 ssh_data=s3 drifter_data=s3
```
Public buckets can be accessed by setting the `anon` argument of the `S3FileSystem` constructor to `True`:

```shell
python main.py ssh_data=s3 drifter_data=s3 ssh_data.resource.s3_fs.anon=True drifter_data.resource.s3_fs.anon=True
```
For private buckets (`anon=False`, which is the default), the application assumes that the credentials used are stored in environment variables.
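For instance, assuming the environment-variable names follow the usual AWS conventions (the authoritative default names are defined in the `S3FileSystem` class):

```shell
export AWS_ACCESS_KEY_ID='<YOUR_KEY>'
export AWS_SECRET_ACCESS_KEY='<YOUR_SECRET>'
```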
The names of those environment variables can be changed, using for example:

```shell
python main.py experiment_data=s3 experiment_data.filesystem.anon=False \
  experiment_data.filesystem.key_env_var='<YOUR_KEY_ENV_VAR_NAME>'
```
s3 buckets have an endpoint (for example, `minio.lab.dive.edito.eu` for the EDITO s3) that should also be specified, either from the command line:

```shell
python main.py experiment_data=s3 experiment_data.filesystem.anon=False \
  experiment_data.filesystem.endpoint='<YOUR_S3_ENDPOINT>'
```
Or from an environment variable, named `AWS_S3_ENDPOINT` by default, but which you can override:

```shell
python main.py experiment_data=s3 experiment_data.filesystem.anon=False \
  experiment_data.filesystem.endpoint_env_var='<YOUR_ENDPOINT_ENV_VAR_NAME>'
```
See the `S3FileSystem` class definition in the `experiment.io._filesystem.py` module for details (the script parameters map directly to the constructor arguments).
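Gathering the overrides used above, a hedged sketch of those constructor arguments could be:

```python
from experiment.io._filesystem import S3FileSystem

s3_filesystem = S3FileSystem(
    anon=False,                              # anonymous (public) access or not
    key_env_var="<YOUR_KEY_ENV_VAR_NAME>",   # env var holding the access key
    endpoint="<YOUR_S3_ENDPOINT>",           # explicit endpoint, or...
    endpoint_env_var="AWS_S3_ENDPOINT",      # ...the env var it is read from
)
```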
`ssh_data` can be read using the `copernicusmarine` API:

```shell
python main.py ssh_data=copernicus ssh_data.resource.cms_dataset_id='<THE_DATASET_ID>'
```
Python objects such as `list` or `dict` can be passed as:

```shell
python main.py ssh_data.resource.path='fullpath_to_ssh_file.zarr' drifter_data.resource.path='fullpath_to_drifter_file.zarr' \
  spatial_extent='[-60, -50, 25, 35]' temporal_extent='["2023-08-01", "2023-08-02"]'
```
The SSH and drifter data are both expected to be file(s) readable by xarray (docs), such as the `zarr` or NetCDF formats.
By using the wildcard `*.nc`, several NetCDF files can be opened at once.
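Outside of the experiment script, this corresponds to standard xarray calls, for example:

```python
import xarray as xr

# A single zarr store
ds = xr.open_dataset("fullpath_to_ssh_file.zarr", engine="zarr")

# Several NetCDF files opened at once using a wildcard
ds = xr.open_mfdataset("path_to_ssh_files/*.nc")
```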
SSH datasets are then processed assuming that:
- dimension names are `time`, `latitude`, `longitude`,
- `time` is expressed in (nano)seconds since epoch,
- `latitude` and `longitude` are given in degrees,
- `longitude` ranges from -180 to 180,
- Absolute Dynamic Topography (ADT) and Sea Level Anomaly (SLA) are encoded in meters in the variables `adt` and `sla`.
To meet those requirements, one can use the parameter `ssh_rename` to rename the variables and dimensions, and add a preprocessing step in the `experiment.preproc` module.
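For example, a dataset exposing `lat`/`lon` dimensions and its ADT under another name could be adapted with an override such as (the left-hand names are purely illustrative):

```shell
python main.py ssh_rename='{lat: latitude, lon: longitude, adt_name: adt}'
```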
Note that longitudes are automatically projected to [-180, 180] if provided in [0, 360], using the transform `(longitude + 180) % 360 - 180`, which acts as the identity function if longitudes already range in [-180, 180].
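A quick numerical check of this behavior:

```python
import numpy as np

lon = np.array([-180.0, -60.0, 0.0, 60.0, 179.0])   # already in [-180, 180]
assert np.allclose((lon + 180) % 360 - 180, lon)    # identity

lon_360 = np.array([0.0, 190.0, 350.0])             # in [0, 360]
print((lon_360 + 180) % 360 - 180)                  # [   0. -170.  -10.]
```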
Drifter data are assumed to follow the clouddrift (docs) ragged-array format, meaning that:
- they have `traj` and `obs` dimensions,
- they contain at least the `lat`, `lon`, `time`, `ve`, and `vn` variables,
- `time` is expressed in (nano)seconds since epoch,
- `lat` and `lon` are given in degrees,
- `lon` ranges from -180 to 180,
- `ve` and `vn` are in m/s,
- for the default preprocessing steps to work, the variables `location_type`, `typebuoy`, and `drogue_status` must also be present.
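As a quick sanity check, these assumptions can be verified on a given dataset with a sketch along these lines:

```python
import xarray as xr

ds = xr.open_dataset("fullpath_to_drifter_file.zarr", engine="zarr")

# Ragged-array dimensions
assert "traj" in ds.dims and "obs" in ds.dims

# Variables required by the comparisons against drifters...
required = {"lat", "lon", "time", "ve", "vn"}
# ...and by the default preprocessing steps
required |= {"location_type", "typebuoy", "drogue_status"}
missing = required - set(ds.variables)
assert not missing, f"missing variables: {missing}"
```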