NaN density for transformed distributions when value is outside of support region #97
Comments
I'd definitely prefer (2). While (1) is easier, I think it could cause unexpected issues with other distributions down the line.
Sure ;)
Something that might also help here is allowing the user to supply the derivative of the transformation, so that it doesn't have to be computed by numerical differentiation. In some situations the derivative is already available, e.g. link functions defined by
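As an illustration of derivatives that are already available: base R's `stats::make.link()` returns, alongside `linkfun` and `linkinv`, a `mu.eta` function giving the derivative of the inverse link. A minimal sketch, assuming Y = linkinv(X) with X ~ N(0, 1) and the logit link (so Y is logit-normal); `dens_logitnorm()` is a hypothetical name used only for illustration, and no numerical differentiation is involved.

```r
# Density of Y = plogis(X), X ~ N(0, 1), using the derivative that
# stats::make.link() already provides (hypothetical helper name).
link <- make.link("logit")

dens_logitnorm <- function(y) {
  out <- numeric(length(y))            # 0 outside the open support (0, 1)
  inside <- y > 0 & y < 1
  if (any(inside)) {                   # the link functions reject empty input
    eta <- link$linkfun(y[inside])     # x = g^{-1}(y) = logit(y)
    # |dx/dy| = 1 / |dmu/deta|, and mu.eta() returns dmu/deta
    out[inside] <- dnorm(eta) / abs(link$mu.eta(eta))
  }
  out
}

dens_logitnorm(c(-0.5, 0.25, 0.5, 1))
```

For the logit link, `mu.eta(eta)` equals y(1 - y), so this agrees with the closed-form logit-normal density `dnorm(qlogis(y)) / (y * (1 - y))`.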
I thought about the same thing. Either allow the user to supply the derivative function, or have a lookup table (or symbolic differentiation) similar to what is done for the inverse (or both, if the latter fails). But by itself that's still not sufficient. For example, exp(dist_wrap('norm')) has a Jacobian of 1/x. The other problem is that the inverse is then log(x), which is also calculated in the main density function, producing NaNs for negative values. So an explicit correction for the bounds is actually necessary. Symbolic derivatives will still help with some of the problematic behavior near the bounds.
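A minimal base-R sketch of the point above, using `exp()` of a standard normal (a lognormal) rather than the package's own code: even with the exact Jacobian 1/y, the naive formula still evaluates `log(y)` for y <= 0 and returns NaN there, so an explicit support check is what turns those values into 0. Both function names are hypothetical.

```r
# Y = exp(X), X ~ N(0, 1). Change of variables:
# f_Y(y) = f_X(log(y)) * |d log(y) / dy| = dnorm(log(y)) / y,  for y > 0
naive_density <- function(y) dnorm(log(y)) * abs(1 / y)

corrected_density <- function(y) {
  out <- numeric(length(y))        # density is 0 outside the support
  inside <- y > 0                  # support of exp(X) is the open interval (0, Inf)
  out[inside] <- dnorm(log(y[inside])) * abs(1 / y[inside])
  out
}

y <- c(-1, 0, 1, 5)
suppressWarnings(naive_density(y))  # NaN at -1 (log of a negative) and at 0 (0 * Inf)
corrected_density(y)                # 0 at -1 and 0; agrees with dlnorm(y)
```

With the bounds handled, `integrate(corrected_density, -Inf, Inf)` should return a value close to 1, whereas the naive version stops with a "non-finite function value" error.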
In the PR I just posted, it works quite well now. For example, this is the output of the following code:

```r
library(ggplot2)
library(distributional)
library(ggdist)

dist <- exp(dist_student_t(10))
ggplot() +
  stat_slab(aes(xdist = dist),
            limits = c(0, 10),
            fill = NA,
            color = 'red')
#> Warning messages:
#> 1: In .Primitive("log")(x) : NaNs produced
#> 2: In .Primitive("log")(x) : NaNs produced
#> 3: In .Primitive("log")(x) : NaNs produced
#> 4: In .Primitive("log")(x) : NaNs produced
```

And this is from the PR:

And `integrate` fails on master:

```r
dens <- function(x) density(dist, x)[[1]]
integrate(dens, -Inf, Inf)
#> Error in integrate(dens, -Inf, Inf) : non-finite function value
#> In addition: There were 50 or more warnings (use warnings() to see the first 50)
```

whereas now it works almost perfectly (I suspect the result is not exactly 1 because of the still-odd behavior very near 0; numDeriv gives NaNs for inputs like 0.00001):

```r
dens <- function(x) density(dist, x)[[1]]
integrate(dens, -Inf, Inf)
#> 0.9999998 with absolute error < 7.9e-05
```
Alright, I actually got symbolic differentiation working completely, for arbitrary levels of nested transformations. And it's a bit faster...

```r
newdist <- exp(dist_wrap('norm'))
x <- seq(0, 10, 0.01)
microbenchmark::microbenchmark(
  density(newdist, x, deriv_method = "symbolic"),
  density(newdist, x, deriv_method = "numeric")
)
#> Unit: microseconds
#>                                            expr       min        lq       mean  median        uq       max neval
#>  density(newdist, x, deriv_method = "symbolic")   175.234   185.238   222.9215   221.4   246.041   476.707   100
#>   density(newdist, x, deriv_method = "numeric") 28770.028 30157.468 31754.9719 30859.2 33455.303 43195.878   100
```

Methods agree for simple cases:

```r
> density(newdist, 1)
Using symbolic differentiation
[1] 0.3989423
> density(newdist, 1, deriv_method = "numeric")
Using numerical differentiation.
[1] 0.3989423
```

Numeric fails at the boundary, but symbolic works:

```r
> density(newdist, 0.00001)
Using symbolic differentiation
[1] 6.585616e-25
> density(newdist, 0.00001, deriv_method = "numeric")
Using numerical differentiation.
[1] NaN
```

Arbitrary nesting of functions and numeric operators:

```r
> d2 <- expm1(exp(dist_gamma(1,1)^0.2))
> density(d2, 5, deriv_method = "symbolic")
Using symbolic differentiation
[1] 0.05029254
> density(d2, 5, deriv_method = "numeric")
Using numerical differentiation.
[1] 0.05029254
> ggplot() + stat_slab(aes(xdist = d2))
Using symbolic differentiation
```

I will make a separate PR for this tomorrow after I test it a bit more.
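The symbolic approach can be illustrated in base R with `stats::D()`, which differentiates R expressions and, since R 3.4.0, also knows `log1p()` and `expm1()`. This is a sketch, not the PR's implementation: the inverse of `expm1(exp(X^0.2))` is written out by hand as `log(log1p(y))^5`, differentiated once, and plugged into the change-of-variables formula for X ~ Gamma(1, 1). `dens_d2()` is a hypothetical name.

```r
# Y = expm1(exp(X^0.2)) with X ~ Gamma(shape = 1, rate = 1).
# Undoing each step in reverse order gives the inverse x = (log(log1p(y)))^5.
inv_expr  <- quote(log(log1p(y))^5)
dinv_expr <- D(inv_expr, "y")   # symbolic d/dy of the inverse
print(dinv_expr)

dens_d2 <- function(y) {
  x    <- eval(inv_expr,  list(y = y))
  dxdy <- eval(dinv_expr, list(y = y))
  dgamma(x, shape = 1, rate = 1) * abs(dxdy)
}

dens_d2(5)   # should be close to the 0.05029254 reported in the thread
```

The derivative alone does not handle the support: for y <= 0, `log(log1p(y))` is NaN, so the explicit bounds check discussed earlier is still needed.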
Sweet! As part of this, would it be possible to allow the user to pass the derivatives if they are known? I believe that would be the last piece necessary for me to offload all the density-correction logic for scale transformations from ggdist to distributional...
Looks great, thanks! I agree with @mjskay that optionally storing known derivatives in transformed distributions (
Yes, supplying a user derivative function already works; that's really important to me too. BTW, great work on both of your packages! I knew of them before, but now decided to use them in a package we are building, and the integration is seamless. Thanks for being open to these changes!
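A minimal sketch of how an optional user-supplied derivative could sit alongside a symbolic fallback. `make_transformed_density()` and its arguments are hypothetical and are not distributional's API; the fallback uses base R's `stats::D()`, and the support bounds are treated as open.

```r
# Hypothetical constructor: density of Y = g(X), given the inverse g^{-1}
# as an R expression in `y`, the density of X, and the (open) support of Y.
# A user-supplied `dinv` (derivative of the inverse) is used as-is;
# otherwise the derivative is derived symbolically with stats::D().
make_transformed_density <- function(inv_expr, base_density,
                                     lower = -Inf, upper = Inf, dinv = NULL) {
  inv <- function(y) eval(inv_expr, list(y = y))
  if (is.null(dinv)) {
    dinv_expr <- D(inv_expr, "y")
    dinv <- function(y) eval(dinv_expr, list(y = y))
  }
  function(y) {
    out <- numeric(length(y))               # 0 outside the support
    inside <- y > lower & y < upper
    yi <- y[inside]
    out[inside] <- base_density(inv(yi)) * abs(dinv(yi))
    out
  }
}

# exp(N(0, 1)): derivative supplied by the user vs. derived symbolically
f_user <- make_transformed_density(quote(log(y)), dnorm, lower = 0,
                                   dinv = function(y) 1 / y)
f_sym  <- make_transformed_density(quote(log(y)), dnorm, lower = 0)
f_user(c(-1, 1, 2))
f_sym(c(-1, 1, 2))
```

Both calls should agree with `dlnorm()`; the user-supplied path simply skips the differentiation step.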
I suggest reviewing #98 first. The symbolic derivatives build on it only in the sense that I branched off that branch; otherwise they would work as-is on master, so the functionality is pretty much independent.
It is standard to return 0 for the density when the value is outside of the support region. For example:
However, the following produces NaN:
This is because the numeric calculation of the derivative produces a NaN value.
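To see how a numerical derivative can return NaN right next to a boundary, here is a minimal base-R sketch with a hand-written, fixed-step central difference. It is a hypothetical stand-in for illustration (numDeriv's actual scheme differs), but any step wider than the distance to the boundary has the same effect. For Y = exp(X) the inverse is log(y); at y = 1e-5 a step of h = 1e-4 evaluates `log()` at a negative number, while the symbolic derivative 1/y from `stats::D()` is fine.

```r
# Inverse of the exp() transform
inv <- function(y) log(y)

# Hypothetical fixed-step central difference, for illustration only
central_diff <- function(f, y, h = 1e-4) (f(y + h) - f(y - h)) / (2 * h)

y0 <- 1e-5
num <- suppressWarnings(central_diff(inv, y0))   # log(y0 - h) = log(-9e-5) -> NaN
sym <- eval(D(quote(log(y)), "y"), list(y = y0)) # D() gives 1/y, so about 1e5

num                        # NaN
dnorm(inv(y0)) * abs(sym)  # density of exp(N(0, 1)) at y0, no NaN
```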
There are two possible ways to solve this:
(1) Convert NaN to 0. This is quite easy, but I'm not sure whether there are cases where you wouldn't want it, i.e. cases where NaN is produced for some other reason and returning 0 doesn't make sense. I can't think of any, but that doesn't mean there aren't any.
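The first approach (replacing NaN with 0) amounts to a one-line post-processing step. A minimal sketch with a hypothetical `nan_to_zero()` helper; it also illustrates the concern above, since it cannot tell a NaN caused by leaving the support apart from a NaN caused by, say, an invalid parameter.

```r
# Hypothetical first approach: replace every NaN density with 0
nan_to_zero <- function(d) {
  d[is.nan(d)] <- 0   # NA (as opposed to NaN) is left untouched
  d
}

# Outside the support of exp(N(0, 1)): NaN -> 0, as intended
nan_to_zero(suppressWarnings(dnorm(log(-1)) * abs(1 / -1)))

# But a NaN from an invalid parameter (negative sd) is silently masked too
nan_to_zero(suppressWarnings(dnorm(1, sd = -1)))
```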
(2) Return 0 when the value is outside the support of the transformed distribution. This can also work nicely and is more robust, since the support region is already calculated for the transformed distribution. However, the support function currently does not distinguish between open and closed ranges (it assumes closed ranges). For this to fully work, the support function would need to differentiate between open and closed intervals.
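A minimal sketch of what an open/closed-aware support check could look like for the second approach (returning 0 outside the support). `in_support()`, `density_with_support()` and the `closed` argument are hypothetical, not distributional's API. The example shows why the distinction matters: exp(X) is supported on the open interval (0, Inf), and treating 0 as inside evaluates the formula at 0 and yields NaN, whereas for Gamma(1, 1) the closed endpoint is correct and the density at 0 is 1.

```r
# Hypothetical support check; `closed` flags whether the lower and upper
# endpoints belong to the support.
in_support <- function(y, lower, upper, closed = c(FALSE, FALSE)) {
  above <- if (closed[1]) y >= lower else y > lower
  below <- if (closed[2]) y <= upper else y < upper
  above & below
}

# Evaluate the density only inside the support, return 0 elsewhere
density_with_support <- function(y, dens_fun, lower, upper, closed) {
  out <- numeric(length(y))
  inside <- in_support(y, lower, upper, closed)
  out[inside] <- dens_fun(y[inside])
  out
}

y <- c(-1, 0, 1)
f_exp <- function(v) dnorm(log(v)) / v   # density of exp(N(0, 1)) for v > 0

density_with_support(y, f_exp, 0, Inf, closed = c(FALSE, FALSE))  # 0 at y = 0
density_with_support(y, f_exp, 0, Inf, closed = c(TRUE, FALSE))   # NaN at y = 0
# Gamma(1, 1): here the closed endpoint is correct, density 1 at y = 0
density_with_support(y, function(v) dgamma(v, 1, 1), 0, Inf, closed = c(TRUE, FALSE))
```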
This seems super minor, but it is currently causing me problems when trying to plot transformed distributions via ggdist.