Skip to content

Commit

Permalink
Improve docs for defining new model
Browse files Browse the repository at this point in the history
  • Loading branch information
patnr committed Feb 28, 2024
1 parent 2a99a4c commit 17dc643
Showing 1 changed file with 74 additions and 67 deletions.
141 changes: 74 additions & 67 deletions dapper/mods/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,82 +8,33 @@ of the experiment results.

## Defining your own model

Follow the example of one of the models within the `dapper/mods` folder.
Essentially, you just need to define all of the attributes of a
`dapper.mods.HiddenMarkovModel`.
To make sure this is working, we suggest the following structure:

- Make a directory: `my_model`

- Make a file: `my_model/__init__.py` where you define the core
workings of the model.
- Typically, this culminates in a `step(x, t, dt)` function,
which defines the dynamical model/system mapping the state `x`
from one time `t` to another `t + dt`.
This model "operator" must support 2D-array (i.e. ensemble)
and 1D-array (single realization) input, and return output of the same
number of dimensions (as the input).
See

- `dapper.mods.Lorenz63`: use of `ens_compatible`.
- `dapper.mods.Lorenz96`: use of relatively clever slice notation.
- `dapper.mods.LorenzUV`: use of cleverer slice notation: `...` (ellipsis).
Consider pre-defining the slices like so:

iiX = (..., slice(None, Nx))
iiP = (..., slice(Nx, None))

to abbreviate the indexing elsewhere.

- `dapper.mods.QG`: use of parallelized for loop (map).

- Optional: define a suggested/example initial state, `x0`.
This facilitates the specification of initial conditions for different synthetic
experiments, as random variables centred on `x0`. It is also a
convenient way just to specify the system size as `len(x0)`. In many
experiments, the specific value of `x0` does not matter, because most
systems are chaotic, and the average of the stats are computed only for
`time > BurnIn > 0`, which will not depend on `x0` if the experiment is
long enough. Nevertheless, it's often convenient to pre-define a point
on the attractor, or basin, or at least ensure "physicality", for
quicker spin-up (burn-in).

- Optional: define a number called `Tplot` which defines
the (sliding) time window used by the liveplotting of diagnostics.

- Optional: To use the (extended) Kalman filter, or 4D-Var,
you will need to define the model linearization, typically called `dstep_dx`.
Note: this only needs to support 1D input (single realization).

- Most models are defined using a procedural and function-based style.
However, `dapper.mods.LorenzUV` and `dapper.mods.QG` use OOP.
This is more flexible & robust, and better suited when different
control-variable settings are to be investigated.
Below is a sugested structure you can use to define it.

.. note::
In parameter estimation problems, the parameters are treated as input
variables to the "forward model". This does not necessarily require
OOP. See `examples/param_estim.py`.
- Make a directory: `my_model`. It does not have to reside within the `dapper/mods` folder,
but make sure to look into some of the other dirs thereunder as examples,
for example `dapper/mods/DoublePendulum`.

- Make a file: `my_model/demo.py` to visually showcase
a simulation of the model, and verify it's working.
- Make a file: `my_model/__init__.py` to hold the core workings of the model.
Further details are given [below](#details-on-my_model__init__py), but the
main work lies in defining a `step(x, t, dt)` function
(you can name it however you like, but `step` is the convention),
to implement the dynamical model/system mapping the state `x`
from one time `t` to another `t + dt`.

.. note::
To begin with, test whether the model works on 1 realization,
before running it with several (simultaneously).
Also, start with a small integration time step,
before using more efficient/adventurous time steps.
Note that the time step might need to be shorter in assimilation,
because it may cause instabilities.
- Make a file: `my_model/demo.py` to run `step` and visually showcase
a simulation of the model, and verify it's working.

- Ideally, both `my_model/__init__.py` and `my_model/demo.py`
do not rely on components of DAPPER outside of `dapper.mods`.
.. note::
For the sake of modularity,
both `my_model/__init__.py` and `my_model/demo.py`
should ideally not rely on components of DAPPER outside of `dapper.mods`.

- Make a file: `my_model/my_settings_1.py` that defines
(or "configures", since there is usually little actual programming taking place)
a complete Hidden Markov Model ready for a synthetic experiment
(or "configures", since there is usually little programming logic and flow taking place)
a complete `dapper.mods.HiddenMarkovModel` ready for a synthetic experiment
(also called "twin experiment" or OSSE).
See `dapper.mods.HiddenMarkovModel` for details on what this requires.
Each existing model comes with several examples of model settings from the literature.
See, for example, `dapper.mods.Lorenz63.sakov2012`.

Expand All @@ -108,3 +59,59 @@ To make sure this is working, we suggest the following structure:
you obtained for those methods. You will find plenty of examples already in
DAPPER, used for cross-referenced with literature to verify the workings of DAPPER
(and the reproducibility of publications).


### Details on `my_model/__init__.py`

- The `step` function must support 2D-array (i.e. ensemble)
and 1D-array (single realization) input, and return output of the same
number of dimensions (as the input).
See

- `dapper.mods.Lorenz63`: use of `ens_compatible`.
- `dapper.mods.Lorenz96`: use of relatively clever slice notation.
- `dapper.mods.LorenzUV`: use of cleverer slice notation: `...` (ellipsis).
Consider pre-defining the slices like so:

iiX = (..., slice(None, Nx))
iiP = (..., slice(Nx, None))

to abbreviate the indexing elsewhere.

- `dapper.mods.QG`: use of parallelized for loop (map).

.. note::
To begin with, test whether the model works on 1 realization,
before running it with several (simultaneously).
Also, start with a small integration time step,
before using more efficient/adventurous time steps.
Note that the time step might need to be shorter in assimilation,
because it may cause instabilities.

.. note::
Most models are defined using a procedural and function-based style.
However, `dapper.mods.LorenzUV` and `dapper.mods.QG` use OOP,
which is perhaps more robust when different
control-variable settings are to be investigated.

In parameter estimation problems, the parameters are treated as input
variables to the "forward model". This does not *necessarily* require OOP.
See `examples/param_estim.py`.

- Optional: define a suggested/example initial state, `x0`.
This facilitates the specification of initial conditions for different synthetic
experiments, as random variables centred on `x0`. It is also a
convenient way just to specify the system size as `len(x0)`. In many
experiments, the specific value of `x0` does not matter, because most
systems are chaotic, and the average of the stats are computed only for
`time > BurnIn > 0`, which will not depend on `x0` if the experiment is
long enough. Nevertheless, it's often convenient to pre-define a point
on the attractor, or basin, or at least ensure "physicality", for
quicker spin-up (burn-in).

- Optional: define a number called `Tplot` which defines
the (sliding) time window used by the liveplotting of diagnostics.

- Optional: To use the (extended) Kalman filter, or 4D-Var,
you will need to define the model linearization, typically called `dstep_dx`.
Note: this only needs to support 1D input (single realization).

0 comments on commit 17dc643

Please sign in to comment.