Skip to content

Add simultaneous intervals on smooths in draw.gam() #314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gavinsimpson opened this issue Sep 9, 2024 Discussed in #313 · 8 comments
Open

Add simultaneous intervals on smooths in draw.gam() #314

gavinsimpson opened this issue Sep 9, 2024 Discussed in #313 · 8 comments

Comments

@gavinsimpson
Copy link
Owner

Discussed in #313

Originally posted by AlainZuur September 6, 2024
Hello...what would be the easiest way to get simultaneous CIs with draw()? I can get them manually with confint...but it would be much easier if draw() could plot them (I don't think it can??).

I can see how to do it manually:
https://gavinsimpson.github.io/gratia/articles/custom-plotting.html
This part:
sm <- smooth_estimates(m) |> add_confint()
And then replace by: confint and adjust the type argument. Is there any way of doing this faster?

Or change that line in draw.gam?
Alain

@louisebarbet
Copy link

louisebarbet commented Mar 3, 2025

Hello. I am new to the package and together with @EliotFr5 and under the supervision of [@katossky / @JMeyer31] we are willing to solve this issue, as part of an assignment at the ENSAI school of statistical engineering (France). We have read the general guidelines from Github and we will try to be as autonomous as possible but we may ask for guidance from now and then. We
have a solid background in statistics and a reasonable one in computer science.
If any maintainer thinks this issue is too ambitious for a beginner, please tell us so and - if you have time - guide us to a project / issue were our help may be more useful.

@EliotFr5
Copy link

EliotFr5 commented Mar 13, 2025

To solve this problem, we have considered two possible solutions :

  • Modify the function draw.gam() to include an option simultaneous=TRUE/FALSE, allowing users to choose between pointwise and simultaneous confidence intervals when visualizing the graph.

  • Create a function add_confint.gam() to compute both pointwise and simultaneous confidence intervals for GAM, and use this new function in draw.gam(). (An advantage of this approach is that the function could be reused in other contexts)

Let us know what you think.
Thank you !

@gavinsimpson
Copy link
Owner Author

Hi @louisebarbet and @EliotFr5 Thanks for the interest in gratia and apologies for the slow reply (I've just finished a block of teaching).

I'm not sure this Issue is the best option for you to jump into the package with:

  1. draw.gam() evaluates the smooths the user wants to plot, but doesn't compute any intervals,
  2. smooth_estimates() does the actual evaluation of the smooths, but doesn't compute a confidence interval (arguably, it could do this, and I should revisit that decision),
  3. draw.smooth_estimates is where the current confidence interval is added, but it doesn't have access to the fitted model at the point where it is called.

Further, all of this needs an overhaul for efficiency reasons (the evaluation of smooths and creating the ggplot objects is embarrassingly parallel) and to deduplicate code.

Modify the function draw.gam() to include an option simultaneous=TRUE/FALSE, allowing users to choose between pointwise and simultaneous confidence intervals when visualizing the graph.

Wherever we add the simultaneous interval code, the argument should follow the convention elsewhere in the package, which is interval = c("confidence", "simultaneous") (It actually uses type for the argument name in most places, but that is too generic a word and we should switch to using interval but that requires that the type argument be soft-deprecated using the lifecycle package.

Create a function add_confint.gam() to compute both pointwise and simultaneous confidence intervals for GAM, and use this new function in draw.gam(). (An advantage of this approach is that the function could be reused in other contexts)

gratia already has confint.gam which computes these intervals. There's nothing to add an interval too, so creating and add_confint.gam() method would not be very useful - to be honest, add_confint only got created because I didn't include interval computation in smooth_estimates().

There is already code in the package to compute simultaneous intervals in two separate places (maybe three because of the now-deprecated fderiv()), so as much of that should be pulled into a function that can be called from multiple places. Anything that gets added to fix this issue, should use that code rather than add another instance of the code.

An easier issue to get stuck into, but which is related to #314, is to add simultaneous intervals to the fitted/predicted values - i.e. a simultaneous interval over two or more model terms. See #4 and #67, and I have had a few emails about this recently. Implementing this for fitted_values() would be much simpler, allow the existing code in gratia for simultaneous intervals to be abstracted into a reusable function, and address a genuine need.

If you want to continue with implementing the wish of #314, then it needs to be done in smooth_estimates() or something that gets called from smooth_estimates().

@louisebarbet
Copy link

louisebarbet commented Mar 25, 2025

Hi @gavinsimpson,

Thank you for your help ! @EliotFr5 and I will work on the issue you highlighted, specifically add simultaneous intervals to the fitted/predicted values.
We noticed that the function fit_vals_default() computes pointwise CI. We can modify it to compute simultaneous CI as follows :

# Bonferroni adjustment for k points
crit_simul <- coverage_normal(1 - (1 - ci_level) / k)
# CI
fit <- mutate(fit,
    ".lower_ci" = .data[[".fitted"]] - (crit_simul * .data[[".se"]]),
    ".upper_ci" = .data[[".fitted"]] + (crit_simul * .data[[".se"]])
)

This means we need to add a k attribute to the fit_vals_default() function, as well as to fitted_value(), if we’re not mistaken?

You mentioned existing code for computing simultaneous intervals. Would you like us to use the confint() function within fitted_value() or fit_vals_default() ?
We could modify fit_vals_default() as follows :

ci_simul <- confint(object, parm = "fitted", level = ci_level, type = "simultaneous")
fit <- mutate(fit,
    ".lower_ci" = ci_simul[["lower"]],
    ".upper_ci" = ci_simul[["upper"]]
)

Let us know what you think.
Thank you !

@gavinsimpson
Copy link
Owner Author

@louisebarbet:

A Bonferroni adjustment would be far to conservative as k could be in the order of hundreds or even thousands of points at which a smooth could be evaluated.

What is needed is an interval that includes ci_level*100 % of all curves. The method used is described in this blog post. Code already exists to compute these intervals for a single smooth, so the modification needed is to do it for all (or a subset if we allow the user to use exclude) the parameters in the model. Essentially, rather than subset the vector or coefficients and the rows and columns of the model's covariance matrix to those corresponding to a single smooth, we'd use the full set (or a subset of them if the user wants to use exclude, for example to remove a random effect/smooth term from the model.)

@EliotFr5
Copy link

EliotFr5 commented Mar 26, 2025

Hi @gavinsimpson,

Thank you for your clarifications. @louisebarbet and I read your blog post and found that the code computing these intervals for a single smooth is in the confint-methods file.
We can update the section handling simultaneous intervals in confint.gam() by replacing sim_interval with sim_interval_multi as follows:

sim_interval_multi <- function(smooths, level, data) {
      para.seq <- unlist(lapply(smooths, function(smooth) smooth[["first.para"]]:smooth[["last.para"]]))
      Cg <- do.call(cbind, lapply(smooths, PredictMat, data = data))
      simDev <- Cg %*% t(buDiff[, para.seq])
      absDev <- abs(sweep(simDev, 1L, data[[".se"]], FUN = "/"))
      masd <- apply(absDev, 2L, max)
      unname(quantile(masd, probs = level, type = 8))

And update the following section:

if (is.null(by_levs)) { # not by variable smooth
        selected_smooths <- lapply(uS, function(s) get_smooth(object, s)) # get the specific smooths
        crit <- sim_interval_multi(selected_smooths, level = level, data = bind_rows(out))
      } else { # is a by variable smooth
        selected_smooths <- lapply(uS, function(s) old_get_smooth(object, s))
        crit <- sim_interval_multi(selected_smooths, level = level, data = bind_rows(out))
      }

Let us know if this approach aligns with your expectations!

@gavinsimpson
Copy link
Owner Author

@EliotFr5 You don't need to modify anything in confint.gam. What needs modification is fitted_values.gam() and more specifically fit_vals_default() which is defined in R/fitted_values.R. fitted_values.gam() needs a new argument for the type of interval, which need to be passed along to fit_vals_default().

The lines is particular that need changing are:

gratia/R/fitted_values.R

Lines 129 to 132 in 10fef60

fit <- predict(object,
newdata = data, ..., type = "link",
se.fit = TRUE
) |>

If the type of interval is simultaneous, you will instead need to use type = "lpmatrix" in the predict() call to get the $X_p$ matrix, which is called $\mathbf{C_g}$ in that blog post, then use mvnfast::mvn() to create BUdiff, etc. The main difference between what is needed here and what is needed in the post is that the post just did this for a specific smooth, while in fitted_values() we are doing this for the entire model, so there is no subsetting of coefficients or the VCov matrix at all.

@gavinsimpson
Copy link
Owner Author

Sorry, what I meant to say was, if a simultaneous interval is requested, then in addition to fit (this is pred from the blog post) as given already in L129-132, you'll need to call predict(object, newdata = data, ..., type = "lpmatrix") to generate Cg from the post, and then BUdiff and thence simDev, an eventually you'll be able to compute a value for crit, which right now uses coverage_normal() but you'll need to make this conditional upon the type of interval too, something like

crit <- if (interval == "simultaneous") { # these next three lines are also from the blog post
  absDev <- abs(sweep(simDev, 1, se.fit, FUN = "/"))
  masd <- apply(absDev, 2L, max)
  quantile(masd, prob = 0.95, type = 8)
} else {
  coverage_normal(ci_level)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants