From dc6fe2130c7d6c6ab62a97f8ebe7be889ff338c0 Mon Sep 17 00:00:00 2001
From: Janos Gabler
Date: Tue, 25 Jun 2024 12:35:07 +0200
Subject: [PATCH] Add section on numerical differentiation.

---
 docs/source/development/eep-02-typing.md | 112 ++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/docs/source/development/eep-02-typing.md b/docs/source/development/eep-02-typing.md
index 213f70d9e..bf0f446f9 100644
--- a/docs/source/development/eep-02-typing.md
+++ b/docs/source/development/eep-02-typing.md
@@ -1018,10 +1018,120 @@ class InternalOptimizeResult:

#### Current situation

The following proposal applies to the functions `first_derivative` and
`second_derivative`. Both functions have an interface that has grown over time and both
return a relatively complex result dictionary. Several arguments govern which entries
are stored in that result dictionary.

The functions `first_derivative` and `second_derivative` allow `params` to be arbitrary
pytrees. They work for scalar and vector-valued functions, and a `key` argument makes
sure that they also work for `criterion` functions that return a dict containing
`"value"`, `"contributions"`, and `"root_contributions"`.

In contrast to optimization, all pytree handling (for `params` and function outputs) is
mixed with the calculation of the numerical derivatives. This can produce more
informative error messages and save some memory. However, it increases complexity
enormously because we can make very few assumptions on types, which leads to many `if`
conditions throughout the code.

The interface is further complicated by supporting Richardson Extrapolation. This
feature was inspired by [numdifftools](https://numdifftools.readthedocs.io/en/latest/)
but has not produced convincing results in benchmarks.

**Things we want to keep**

- `params` and function values can be pytrees
- Support for estimagic `criterion` functions (now functions that return
  `CriterionValue`)
- Many optional arguments to influence the details of the numerical differentiation
- Rich output format that provides insight into the precision of the numerical
  differentiation
- Ability to optionally pass in a function evaluation at `params` or return a function
  evaluation at `params`

**Problems**

- We can make no assumptions on types inside the functions because pytree handling is
  mixed with calculations
- Support for Richardson Extrapolation complicates the interface but has not been
  convincing in benchmarks
- Pytree handling is actually incomplete (`base_steps`, `min_steps`, and `step_ratio`
  are assumed to be flat numpy arrays)
- Many users expect the output of numerical differentiation to be just the gradient,
  jacobian or hessian, not a more complex result object

#### Proposal

##### Separation of calculations and pytree handling

As in numerical optimization, we should implement the core functionality for first and
second derivatives for functions that map from 1-dimensional numpy arrays to
1-dimensional numpy arrays. All pytree handling or other handling of function outputs
(e.g. functions that return a `CriterionValue`) should be done outside of the core
functions. A minimal sketch of this separation is shown below.
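The following sketch only illustrates the idea and is not the proposed implementation:
the hypothetical core routine `_first_derivative_flat` (central differences with a
fixed step size) only ever sees 1-dimensional numpy arrays, while pytree flattening and
unflattening would live in a thin wrapper around it.

```python
import numpy as np


def _first_derivative_flat(func, x, step_size=1e-6):
    """Central-difference Jacobian of ``func`` at the 1d array ``x``.

    ``func`` maps a 1d numpy array to a scalar or 1d numpy array. No pytree
    handling happens here; that would live in a wrapper around this core.
    """
    f0 = np.atleast_1d(func(x))
    jac = np.empty((f0.size, x.size))
    for i in range(x.size):
        upper, lower = x.copy(), x.copy()
        upper[i] += step_size
        lower[i] -= step_size
        jac[:, i] = (np.atleast_1d(func(upper)) - np.atleast_1d(func(lower))) / (
            2 * step_size
        )
    return jac


# Example: Jacobian of a vector-valued function at a flat parameter array.
jac = _first_derivative_flat(lambda x: np.array([x @ x, x.sum()]), np.arange(3.0))
```

The real core functions would of course support the optional step-size arguments
mentioned above; the point is only that they can assume flat numpy arrays throughout.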
##### Deprecate Richardson Extrapolation (and prepare alternatives)

The goal of implementing Richardson Extrapolation was to get more precise estimates of
numerical derivatives when it is hard to find an optimal step size. Example use-cases we
had in mind were:

- Optimization of a function that is piecewise flat, e.g. the likelihood function of a
  naively implemented multinomial probit
- Optimization or standard error estimation of slightly noisy functions, e.g. functions
  of an MSM estimation problem
- Standard error estimation of wiggly functions, where the slope and curvature at the
  minimum do not yield reasonable standard errors and confidence intervals

Unfortunately, our practical experience with Richardson Extrapolation was not positive,
and it seems the method is not designed for our use-cases: it is a sequence acceleration
method that reduces roundoff error while shrinking a step size to zero, whereas in our
applications it is often better to take a larger step size or to aggregate derivative
estimates from multiple step sizes in a different way than Richardson Extrapolation
does.

We therefore propose to remove Richardson Extrapolation and open an issue to work on
alternatives. Examples of alternatives could be:

- [Moré and Wild (2010)](https://www.mcs.anl.gov/papers/P1785.pdf) propose an approach
  to calculate optimal step sizes for finite difference differentiation of noisy
  functions
- We could think about aggregating derivative estimates at multiple step sizes in a way
  that produces worst case standard errors and confidence intervals
- ...

```{note}
Richardson Extrapolation was only completed for first derivatives, even though it is
already prepared in the interface for second derivatives.
```

##### Better `NumdiffResult` object

The result dictionary will be replaced by a `NumdiffResult` object. All arguments that
govern which results are stored will be removed. If some of the formerly optional
results require extra computation that we wanted to avoid by making them optional, they
can be properties or methods of the result object.

##### JAX-inspired high-level interfaces

Since our `first_derivative` and `second_derivative` functions need to fulfill very
specific requirements for use during optimization, they need to return a complex result
object. However, this can be annoying in simple situations where users just want a
gradient, jacobian or hessian.

To cover these simple situations and provide a high-level interface to our numdiff
functions, we can provide a set of JAX-inspired decorators:

- `@grad`
- `@value_and_grad`
- `@jac` (no distinction between `@jacrev` and `@jacfwd` necessary)
- `@value_and_jac`
- `@hessian`
- `@value_and_hessian`

All of these will be very simple wrappers around `first_derivative` and
`second_derivative` with very low implementation and maintenance costs.

## Benchmarking

### Benchmark problems