Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: Function to get distributional from family #131

Open
mathiaslj opened this issue Nov 21, 2024 · 6 comments
Open

feature request: Function to get distributional from family #131

mathiaslj opened this issue Nov 21, 2024 · 6 comments

Comments

@mathiaslj
Copy link

Hi!

I found your package and have been enjoying it!

In my use case, I would like to be able to get a distributional object corresponding to a specified family. I have written a small piece of code (see below) within my package to enable this, but I wondered if it wouldn't be of interest to have implemented in the package.

What do you think?

get_distributional <- function(object) {
  UseMethod("get_distributional")
}

get_distributional.default <- function(object) {
  switch(object,
         gaussian = getFromNamespace("dist_normal", ns = "distributional"),
         binomial = getFromNamespace("dist_binomial", ns = "distributional"),
         poisson = getFromNamespace("dist_poisson", ns = "distributional"),
         stop("Unknown family: ", family))
}

get_distributional.family <- function(object) {
  object <- object$family
  NextMethod("get_distributional")
}```
@mitchelloharawild
Copy link
Owner

Are you trying to obtain the function corresponding to a given family, or characterise a complete distribution?
I can see some value in having an as_distributional(family(), parameters()) function to restructure many distributions when loading them from a file.

How do you use the get_distributional() function in your packages?

@mathiaslj
Copy link
Author

Thanks for the quick response.

I was messing around with doing some likelihood calculation "manually" while doing some MCMC sampling in connection with some bayesian GLM. In that task I have a function with usual formula, data, family arguments and wanted to use my specification of family to determine the density I should use for the likelihood calculation without needing a bunch of ifs for each family.

However, for this use case I still need to reparametrise each density to accomodate the evaluation of it for my likelihood, meaning I think it will actually be easier to just revert to simple if checks. And if I do go this route, I might have to create custom functions for the reparametrisation anyway and would not have use of the as_distributional function.

So to answer your question: Yes, I am trying to characterise the distribution, but from a bit more pondering I don't think I would end up using the implementation for my current purpose. That said, an as_distributional S3 generic could probably come in handy in some scenarios, so it's up to you if you want to close this issue or leave it for some time in the future :)

@mitchelloharawild
Copy link
Owner

Perhaps some conversion from family() -> distributional would be useful then, although you would still need the predicted values (or generally the parameters) to parameterise the distribution.

I'm not quite sure how reparameterising each density conflicts with an as_distribution() function - could you elaborate on how your use case works with if checks but not with as_distributional?

@mathiaslj
Copy link
Author

Hmm I guess an as_distributional function with a family method could still be useful - just in conjunction with some "custom" code. By reparametrisation I mean that if I want to write some code for my log-likelihood calculation that should work for both fx. a guassian density and a binomial density, I need the function that fetches the density based on the family to "have the same first argument". Meaning that when I do my linear prediction eta and transform it to mu using the inverse link function, I want some pseudocode to illustrate the idea to look something like so:

log_likelihood <- function(formula, data, family, extra_args = list(prior_sigma) {
  # Doing stuff to prepare data and calculate my `mu`
  dist <- as_distributional(family, mu, extra_args)
  return(sum(density(dist, Y, log = T)))
}

And for this to work, the binomial density needs to take as first argument the prop parameter, so prop will be equal to mu in the distribution, the size parameter should be set to 1 (it's not possible to input a vector of values mu to the bernoulli density), and it should be able to take in something like extra_args to allow for additional arguments for other families (like an sd for the gaussian case). So even with as_distributional I would have to do something like so:

density_parametrised_by_mu_glm <- function(family) {
  switch(family$family,
         gaussian = function(main_parameter, ...)
           as_distributional(family, mean = main_parameter, ...),
         binomial = function(main_parameter, ...)
           as_distributional(family, prob = main_parameter, size = 1, ...)
  )
}

So that's my "issue" with having to "reparametrise the densities" even if there was an as_distributional function - though it could still help. I'm still a novice R programmer and package developer, so I might be missing a much more elegant solution to my problem 😅

@mitchelloharawild
Copy link
Owner

You might like to use distributional::log_likelihood(), which essentially does the same as your implementation (sum(density(dist, Y, log = T))).

As for handling size = 1 for the binomial family, I feel like this is better handled in your package. Very similar to your second function here, I would suggest writing a function that gives the appropriate distributional object (that can then be used with log_likelihood() and other computation methods).

library(distributional)
pred_dist <- function(family, eta) {
  # Compute the predicted mean using the inverse link function
  mu <- family$linkinv(eta)
  
  # Switch for density calculation based on family
  switch(family$family,
         "gaussian" = dist_normal(mean = mu, sd = ???),
         "binomial" = dist_binomial(size = 1, prob = mu),
         "poisson" = dist_poisson(lambda = mu),
         "inverse.gaussian" = dist_inverse_gaussian(mu, ???)
         stop(paste("Density calculation not implemented for family:", family$family))
  )
}

I leave it to you to handle the variances / dispersion / etc.

@mathiaslj
Copy link
Author

Ah okay, I didn't notice the existence of the log_likelihood function, I'll use that together with something like the above.

Thanks for engaging with my problem!

I'll leave it up to you whether to close the issue or keep it open in case it could be interesting to look into an implementation of as_distributional in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants