R/Part_6_Statistical_Modelling_SpO2.qmd

---
title: "Preoperative Atelectasis"
subtitle: "Part 6: Statistical Modelling of SpO2"
author: "Javier Mancilla Galindo"
date: "`r Sys.Date()`"
execute: 
  echo: false
  warning: false
toc: true
toc_float: true
format:
  html:
    embed-resources: true
  pdf:
    documentclass: scrartcl
editor: visual
---

\pagebreak

# Setup

#### Packages used

```{r}
#| echo: true
if (!require("pacman", quietly = TRUE)) {
  install.packages("pacman")
}


pacman::p_load(
  tidyverse, # Used for basic data handling and visualization.
  RColorBrewer, #Color palettes for data visualization. 
  table1, #Used to add lables to variables.
  gridExtra, #Used to arrange multiple ggplots in a grid.  
  grid, #Used to arrange multiple ggplots in a grid.
  CBPS, #Used to calculate non-parametric propensity scores for IPW.  
  WeightIt, #Used to calculate weights from propensity scores for IPW.   
  mgcv, #Used to model non-linear relationships with a general additive model. 
  gt, #Used to present a summary of the results of tables. 
  gratia, #Used together with gglopt2 to create smooth partial effects plot 
        # from gam models.
  metR, # Used to plot predictions of SpO2.    
  report #Used to cite packages used in this session.
)
```

##### Session and package dependencies

```{r}
# Credits chunk of code: Alex Bossers, Utrecht University (a.bossers@uu.nl)

session <- sessionInfo()
# remove clutter
session$BLAS <- NULL
session$LAPACK <- NULL
session$loadedOnly <- NULL
# write log file
writeLines(
  capture.output(print(session, locale = FALSE)),
  paste0("sessions/",lubridate::today(), "_session_Part_6.txt")
)

session
```

```{r}
#| include: false

# Create directories for sub-folders 
figfolder <- "../results/output_figures"
tabfolder <- "../results/output_tables"
psfolder <- "../data/processed"
dir.create(figfolder, showWarnings = FALSE)
dir.create(tabfolder, showWarnings = FALSE)
dir.create(psfolder, showWarnings = FALSE)
```

```{r}
#| include: false

# Load dataset  
data <- read.csv("../data/processed/atelectasis_included.csv",
                 na.strings="NA", 
                 row.names = NULL)
# Recode variables 
source("scripts/variable_names.R")

# Recreate variables created in: 
## Part 2 (altitude category) and ## Part 4 (collapsed atelectasis percent): 
data <- data %>% 
  mutate(
    altitude_cat = cut(altitude,
                       breaks=c(0,1000,2500),
                       right=FALSE,
                       labels=c("Low altitude","Moderate altitude")
                       )
    )

data$atelectasis_percent_factor <- factor(
  data$atelectasis_percent, 
  levels=c(0,2.5,5,7.5,10,12.5,15,17.5,27.5)
  ) %>% 
  fct_collapse("17.5%" = c(17.5,27.5)) %>% 
  factor(labels = c("0%","2.5%","5%","7.5%","10%","12.5%","15%","17.5%"))

data$atelectasis_percent[data$atelectasis_percent==27.5] <- 17.5
```

\pagebreak

# Model SpO2

The SpO2 variable does not have a normal distribution. Furthermore, the distance between 1% increases in SpO2 cannot be considered equidistant increases since values are determined from the S-shaped curve of hemoglobin saturation. This is the reason why the distribution of SpO2 is negatively skewed, with upper values reaching the saturation point of the hemoglobin curve.

Therefore, modelling SpO2 as a linear term could be potentially misleading. Nonetheless, a model assuming a gaussian distribution for SpO2 may potentially be easier to understand and communicate.

Thus, I first created model SpO2 assuming a gaussian distribution and then applied a fractional regression model which is more appropriate for the distribution of this variable. The rationale for this was that if conclusions were not different with both models, presenting a model assuming a gaussian distribution would have been easier to understand and communicate. However, since conclusions were indeed different, I will present the results for the more appropriate fractional regression model.

As a last note, I first assessed the relationships between variables without removing any outliers. Examination of residuals showed that there were some influential outliers having an impact on the models. Thus, I decided to remove a total of 7 outliers only for the SpO2 models shown here (3 for the SpO2 \~ BMI relationship and 4 for SpO2 \~ atelectasis percent). This document presents the results of analyses after removing outliers. This code can be readapted to run all analyses without the removal of outliers.

## Fractional regression model

Convert SpO2 to fractional values between 0 and 1 to model.

```{r}
#| echo: true
data <- data %>% mutate(spo2_fraction = spo2_VPO/100)
```

#### Empty model

First, I will fit an empty model

```{r}
model<-gam(spo2_fraction~1,
           data=data, 
           family = quasibinomial(link = logit)
           )

R2_empty <- summary(model)$r.sq
dev_empty <- summary(model)$dev.expl
```

#### BMI smooth term and residuals

Model with a smooth BMI term as the only explanatory variable.

Since we are now using a different family function (quasibinomial with logit link) and we are no longer assuming a Gaussian distribution (which was done in figure 1 for an initial impression of the relationship between variables), it is important to determine the k value that offers the best representation of the change in the outcome variable with this function. I checked this by varying the value of k in the following code and ***k=8***\* offered the best visual representation with the largest increase in deviance explained and optimal k-index. While varying k, it will be noted that the highest edf occurs at k=8. This can be replicated by varying the value of k in the following code:

```{r}
model_BMI <- gam(spo2_fraction~s(BMI,k=8), 
                 data=data, 
                 family = quasibinomial(link = logit)
                 )

k.check(model_BMI)
```

```{r}
summary(model_BMI)
```

```{r}
plot(model_BMI)
```

```{r}
gam.check(model_BMI)
```

There are influential residuals. Will assess which of these could be removed according to Cook's distance.

```{r}
#| echo: true
data %>% 
  mutate(
  cooksd = cooks.distance(model_BMI),
  outlier = ifelse(cooksd < 4/nrow(data), "keep","delete")
) %>%
  filter(outlier=="delete") %>% 
  dplyr::select(ID,BMI,spo2_VPO,cooksd,outlier) %>% 
  arrange(desc(cooksd)) %>% 
  gt()
```

Now, I will examine atelectasis percentage:

#### Atelectasis percent smooth term and residuals

Using a smooth term for atelectasis percentage is comparable to having it as categorical, which can be checked by substituting the smooth term for the categorical term in the models. However, the smooth term will allow to have a visual representation of the partial effect of atelectasis percent on SpO2 and compare it to BMI, which is why I decided to keep the smooth term to model the effect of atelectasis.

I determined the optimal k value for atelectasis at k=5. A term with a lower k (i.e., k=3) lead to a decrease in explained deviance compared to the categorical term and was not a good representation of the trend in values for the variable, especially at the higher values as it curved upwards compared to the downward trend seen. Therefore, I kept k=5.

```{r}
model_atel_smooth <- gam(spo2_fraction ~ s(atelectasis_percent,k=5),
                         data=data,
                         family = quasibinomial(link = logit)
                         )
```

```{r}
plot(model_atel_smooth)
```

```{r}
summary(model_atel_smooth)
```

```{r}
gam.check(model_atel_smooth)
```

Outliers:

```{r}
#| echo: true
data %>% 
  mutate(
  cooksd = cooks.distance(model_atel_smooth),
  outlier = ifelse(cooksd < 4/nrow(data), "keep","delete")
) %>%
  filter(outlier=="delete") %>% 
  dplyr::select(ID,BMI,spo2_VPO,atelectasis_percent,cooksd,outlier) %>% 
  arrange(desc(cooksd)) %>% 
  gt()
```

I will remove influential outliers with cooks distance \>0.05 for BMI:

```{r}
#| include: false 
# Create backup before removing outliers 
data_original <- data

# Remove outliers: 
data <- data %>% filter(!ID %in% c("122","140","102"))
```

Likewise, remove influential outliers with cooks distance \>0.05 for atelectasis percent:

```{r}
#| include: false 
data <- data %>% filter(!ID %in% c("205","168","114","103"))  
```

\pagebreak

# Inverse probability weighting

In order to account for exposure-outcome confounding and also mediator-outcome confounding, inverse probability weighting is a good modelling option since the mediator-outcome confounding affected by the exposure induces collider-stratification bias. Thus, a propensity score will be calculated for both the exposure (BMI) and mediator (atelectasis percent). Inverse probability weights will be obtained from propensity scores and combined into a single weight that will be used to model SpO2.

## Propensity scores and weights

Non-Parametric Covariate Balancing Propensity Score (npcbps) will be obtained to avoid problems in the fitting of the models due to skewed distributions, thereby adopting a distribution-free method for obtaining the weights. Npcbps are directly interpretable as inverse probability weights ([see CBPS documentation](https://search.r-project.org/CRAN/refmans/CBPS/html/npCBPS.html)), reason why the additional step of obtaining inverse weights is not needed with this method.

Weights for exposure (BMI):

```{r}
#| echo: true 

data$weight1 <- weightit(BMI ~ age + sex + altitude_cat, 
                         data, 
                         method = "npcbps", 
                         over = FALSE)$weights
```

Weights for mediator (atelectasis percent):

```{r}
#| echo: true 

data$weight2 <- weightit(
  factor(atelectasis_percent, ordered = TRUE) ~ 
    BMI + age + sex + altitude_cat + asthma + sleep_apnea + COPD,
  data,
  method = "npcbps")$weights
```

Overall weight:

```{r}
#| echo: true 

data <- data %>% 
  mutate(
  weight = weight1*weight2
  )
```

Fit IPW model including both BMI and atelectasis percentage:

```{r}
model_plus<-gam(spo2_fraction ~ s(BMI, k=8) + 
                  s(atelectasis_percent,k=5),
                data = data, 
                weights = weight,
                family = quasibinomial(link = logit)
                )

R2_plus <- summary(model_plus)$r.sq
dev_plus <- summary(model_plus)$dev.expl
```

```{r}
summary(model_plus)
```

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.
plot(model_plus, all.terms=TRUE)
```

```{r}
gam.check(model_plus)
```

Very influential outlier.

```{r}
#| echo: true
data %>% 
  mutate(
  cooksd = cooks.distance(model_plus),
  outlier = ifelse(cooksd < 4/nrow(data), "keep","delete")
) %>%
  filter(outlier=="delete") %>% 
  dplyr::select(ID,BMI,spo2_VPO,atelectasis_percent,cooksd,outlier) %>% 
  arrange(desc(cooksd)) %>% 
  gt()
```

I will remove this extreme outlier:

```{r}
data <- data %>% filter(!ID %in% c("107"))
```

## BMI only model (unadjusted, unweighted)

```{r}
model_BMI <- gam(spo2_fraction~s(BMI,k=8),
                 data=data, 
                 family = quasibinomial(link = logit)
                 )

R2_BMI <- summary(model_BMI)$r.sq
dev_BMI <- summary(model_BMI)$dev.expl
```

```{r}
summary(model_BMI)
```

```{r}
plot(model_BMI)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.
gam.check(model_BMI)
```

## Atelectasis percent model (unadjusted, unweighted)

```{r}
model_atel_smooth <- gam(spo2_fraction ~ s(atelectasis_percent,k=5),
                     data = data,
                     family = quasibinomial(link = logit)
                     )

R2_atel_smooth <- summary(model_atel_smooth)$r.sq
dev_atel_smooth <- summary(model_atel_smooth)$dev.expl
```

```{r}
plot(model_atel_smooth)
```

```{r}
summary(model_atel_smooth)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.
gam.check(model_atel_smooth)
```

## s(BMI) + s(atelectasis percentage), unadjusted, unweighted

Fit model sBMI plus atelectasis percentage:

```{r}
model_atel<-gam(spo2_fraction ~ s(BMI,k=8) + 
                  s(atelectasis_percent,k=5),
                data = data, 
                family = quasibinomial(link = logit)
                )

R2_atel <- summary(model_atel)$r.sq
dev_atel <- summary(model_atel)$dev.expl
```

```{r}
summary(model_atel)
```

```{r}
plot(model_atel, all.terms=TRUE)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.  
gam.check(model_atel)
```

## Adjusted model BMI

Fit IPW model for BMI. This model accounts for confounders relevant to BMI, after having obtained a balanced pseudopopulation through propensity weighting as detailed earlier.

```{r}
model_adj_BMI <-gam(spo2_fraction ~ s(BMI,k=8),
                    data = data,
                    weights = weight1,
                    family = quasibinomial(link = logit)
                    )

R2_adj_BMI <- summary(model_adj_BMI)$r.sq
dev_adj_BMI <- summary(model_adj_BMI)$dev.expl
```

```{r}
summary(model_adj_BMI)
```

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.
plot(model_adj_BMI)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.
gam.check(model_adj_BMI)
```

## Adjusted model Atelectasis Percent

Fit IPS model for atelectasis percent:

```{r}
model_adj_atelectasis <-gam(spo2_fraction ~ s(atelectasis_percent,k=5),
                            data = data,
                            weights = weight2,
                            family = quasibinomial(link = logit),
                            )

R2_adj_atelectasis <- summary(model_adj_atelectasis)$r.sq
dev_adj_atelectasis <- summary(model_adj_atelectasis)$dev.expl
```

```{r}
summary(model_adj_atelectasis)
```

```{r}
plot(model_adj_atelectasis)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.
gam.check(model_adj_atelectasis)
```

## Fully adjusted model (IPW model)

```{r}
model_plus<-gam(spo2_fraction ~ s(BMI, k=8) + 
                  s(atelectasis_percent,k=5),
                data = data, 
                weights = weight,
                family = quasibinomial(link = logit)
                )

R2_plus <- summary(model_plus)$r.sq
dev_plus <- summary(model_plus)$dev.expl
```

```{r}
summary(model_plus)
```

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.
plot(model_plus, all.terms=TRUE)
```

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.
gam.check(model_plus)
```

Note that there is complete separation of residuals, resembling oxygen categories. This is highly suggestive of this model not being good at explaining variation at higher oxygen levels (i.e. greater than 95%), whereas residuals are nearly randomly distributed at lower oxygen values. I will re-assess this hypothesis by the end of the document and in **Part 8**.

## Test for interaction

```{r}
# This is to explore the effect of a combined smooth term between 
# BMI and atelectasis percentage, adjusted for confounders.  
model_exp<-gam(spo2_fraction ~ s(BMI, atelectasis_percent), 
               weights = weight,
                data=data,
                family = quasibinomial(link = logit)
                )

R2_interact <- summary(model_exp)$r.sq
dev_interact <- summary(model_exp)$dev.expl

summary(model_exp)
```

```{r}
gam.check(model_exp)
```

```{r}
plot(model_exp, all.terms = TRUE)
```

```{r}
#| warning: true
vis.gam(model_exp,
        view=c("BMI","atelectasis_percent"),
        color = "gray",
        type = "response",
        theta = -45,
        plot.type = "persp",
        ylab = "Atelectasis percent",
        xlab = "Body mass index",
        zlab = "SpO2",
        main = "Predicted SpO2"
        )
```

Not much improvement from a model with an interaction term. This model with an interaction term is very likely fitting noise.

Build a dataframe to compare models:

```{r}
models <- data.frame(
  Model = c("empty",
            "sBMI",
            "atel_smooth",
            "sBMI_atel",
            "adjusted_BMI",
            "adjusted_atelectasis",
            "Fully_adjusted",
            "Interaction_adjusted"
            ),
  aR2 = c(R2_empty,
          R2_BMI,
          R2_atel_smooth,
          R2_atel,
          R2_adj_BMI,
          R2_adj_atelectasis,
          R2_plus,
          R2_interact
          ),
  dev = c(dev_empty,
          dev_BMI,
          dev_atel_smooth,
          dev_atel,
          dev_adj_BMI,
          dev_adj_atelectasis,
          dev_plus,
          dev_interact
          )
  )
```

Models sorted by explained deviance (from higher to lower):

```{r}
models <- models %>% mutate(aR2 = round(aR2,3)*100,
                            dev = round(dev,3)*100
                            )
models %>% arrange(desc(dev)) %>% gt()
```

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.

# Models sorted by adjusted R2    
models %>% arrange(desc(aR2)) %>% gt()
```

\pagebreak

# Figure SpO2 models

#### Figure 2a: Total effect of BMI (adjusted)

Assessment of residuals. This was done for all models.

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.

# Check residuals:   
draw(model_adj_BMI,residuals=TRUE) 
```

Now, take the inverse logit function to assess partial effect on mean SpO2.

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.

# Now, take the inverse logit function to assess partial effect on mean SpO2:   
draw(model_adj_BMI, 
     constant = coef(model_adj_BMI)[1], 
     fun = inv_link(model_adj_BMI)
     ) 
```

Partial effect on mean SpO2:

```{r, fig.width=8, fig.height=6}
#Draw a personalized plot:  
plot2a <- draw(model_adj_BMI, 
     constant = coef(model_adj_BMI)[1], 
     fun = inv_link(model_adj_BMI), 
     smooth_col = "cadetblue4"
     ) +
  scale_y_continuous(labels = scales::percent, limits=c(0.78,1)) +
  labs(x="Body mass index (kg/m²)", 
       y = "mean SpO2", 
       title = "Total effect of BMI", 
       subtitle=paste0("Deviance explained:"," ",(round(dev_adj_BMI,3)*100),"%"),
       tag="A",
       caption=NULL) + 
  theme_bw() +
  theme(panel.border = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(size=rel(1.2)), 
        axis.text.y = element_text(size=rel(1.2))
        )

plot2a 
```

#### Figure 2b: Fully adjusted BMI

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.

# Check residuals:   
draw(model_plus,residuals=TRUE, select = "s(BMI)") 
```

Partial effect on mean SpO2:

```{r, fig.width=8, fig.height=6}
plot2b <- draw(model_plus, select = "s(BMI)", 
     constant = coef(model_plus)[1], 
     fun = inv_link(model_plus), 
     smooth_col = "cadetblue4"
     ) +
  scale_y_continuous(labels = scales::percent, limits=c(0.78,1)) +
  labs(x="Body mass index (kg/m²)", 
       y = "mean SpO2", 
       title = "Direct effect of BMI", 
       subtitle=paste0("Deviance explained:"," ",(round(dev_plus,3)*100),"%"),
       tag="B",
       caption=NULL) + 
  theme_bw() +
  theme(panel.border = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(size=rel(1.2)), 
        axis.text.y = element_text(size=rel(1.2))
        )

plot2b 
```

#### Figure 2c: sAtelectasis percent

Check residuals:

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.

# Check residuals:   
draw(model_adj_atelectasis, residuals=TRUE) 
```

Draw a personalized plot:

```{r, fig.width=8, fig.height=6}
plot2c <- draw(model_adj_atelectasis, 
     constant = coef(model_adj_atelectasis)[1], 
     fun = inv_link(model_adj_atelectasis), 
     smooth_col = "black"
     ) +
  scale_y_continuous(labels = scales::percent, limits=c(0.78,1)) +
  labs(x="Atelectasis percent (%)", 
       y = "mean SpO2", 
       title = "Total effect of atelectasis", 
       subtitle=paste0("Deviance explained:"," ",(round(dev_adj_atelectasis,3)*100),"%"),
       tag="C",
       caption=NULL) + 
  theme_bw() +
  theme(panel.border = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(size=rel(1.2)), 
        axis.text.y = element_text(size=rel(1.2))
        )

plot2c 
```

#### Figure 2d: Fully adjusted Atelectasis Percent

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.

# Check residuals:   
draw(model_plus,residuals=TRUE, select = "s(atelectasis_percent)") 
```

Partial effect on mean SpO2:

```{r, fig.width=8, fig.height=6}
plot2d <- draw(model_plus, select = "s(atelectasis_percent)", 
     constant = coef(model_plus)[1], 
     fun = inv_link(model_plus), 
     smooth_col = "black"
     ) +
  scale_y_continuous(labels = scales::percent, limits=c(0.78,1)) +
  labs(x="Atelectasis percent (%)", 
       y = "mean SpO2", 
       title = "Indirect effect of BMI (mediated by atelectasis)", 
       subtitle=paste0("Deviance explained:"," ",(round(dev_plus,3)*100),"%"),
       tag="D",
       caption=NULL) + 
  theme_bw() +
  theme(panel.border = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(size=rel(1.2)), 
        axis.text.y = element_text(size=rel(1.2))
        )

plot2d 
```

#### Figure 2

```{r, fig.width=10, fig.height=8}
figure2 <- grid.arrange(plot2a, plot2b, plot2c, plot2d, nrow = 2)
```

```{r}
#| include: false
ggsave(
  "Figure2.pdf",
  plot = figure2,
  path = figfolder, 
  width = 10,
  height = 8, 
  units = "in", 
  dpi = 1200
  )
```

\pagebreak

# Predictions SpO2

These are the predicted SpO2 values in the fully adjusted model (adjusted_plus):

```{r}
#| warning: false
#| echo: true
vis.gam(model_plus,
        view=c("BMI","atelectasis_percent"),
        color = "gray",
        type = "response",
        plot.type = "contour",
        contour.col = brewer.pal(9, "BuGn"),
        nlevels=10,
        ylab = "Atelectasis percent (%)",
        xlab = "Body mass index (kg/m²)",
        main = "Predicted SpO2"
        )
```

This figure shows that this model is not able to predict SpO2 values above 96%. The range of predicted values of SpO2 that can be predicted with the data and model created in this study are within 88-96%. Lines correspond to a level of SpO2, so it can be seen that most of these are almost perpendicular to the y axis, meaning that most of the decrease in SpO2 is driven by increasing atelectasis percentage. Nonetheless, lines are not perfectly horizontal, which reflects that there is some residual effect of BMI on SpO2. Furthermore, this model shows that drops in SpO2 are more accentuated at the lower part of atelectasis percentage extension (93% to 96% mostly occur at atelectasis percentage lower than 5%). At SpO2 92% and lower, jumps are not as accentuated and there is a greater effect of increasing BMI as the lines tend to be more inclined. A 3D plot could perhaps allow to visualize these patterns if this is not clear enough from the 2D plot:

```{r}
#| warning: false
vis.gam(model_plus,
        view=c("BMI","atelectasis_percent"),
        color = "gray",
        type = "response",
        plot.type = "persp",
        theta= -45,
        ylab = "Atelectasis percent",
        xlab = "Body mass index",
        zlab = "SpO2",
        main = "Predicted SpO2"
        )
```

### Figure 3

The 2D plot was recreated with the accompanying sourced script ***Figure3.R*** which also saves the 3D plot as FigureS5.

```{r}
#| include: false
source("scripts/Figure3.R", local = knitr::knit_global())
```

```{r}
predictions_plot
```

```{r}
observed_plot
```

\pagebreak

# Modelling subsets of low vs high SpO2

As mentioned earlier, there is complete separation of residuals, resembling oxygen categories. Perhaps including these in the model could allow to know the effect of atelectasis and BMI on SpO2 according to different SpO2 categories:

```{r}
#| include: false
data <- data %>% 
  mutate(spo2_cat = cut(spo2_VPO,
                        breaks=c(87,95,100),
                        right=TRUE,
                        labels=c("≤95",">95")
                        ) %>% fct_rev()
         )
```

```{r}
model_plus_oxygen_categories <-gam(spo2_fraction ~ s(BMI, k=8) + 
                  s(atelectasis_percent,k=5) + spo2_cat,
                data = data, 
                weights = weight,
                family = quasibinomial(link = logit)
                )

R2_plus <- summary(model_plus_oxygen_categories)$r.sq
dev_plus <- summary(model_plus_oxygen_categories)$dev.expl
```

```{r}
summary(model_plus_oxygen_categories)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plots.
plot(model_plus_oxygen_categories, all.terms=TRUE)
```

```{r}
#| include: true 
## Change 'false' for 'true' above to show plots.
gam.check(model_plus_oxygen_categories)
```

I will therefore model separately by splitting the dataset into participants with SpO2 lower than or equal to 95 vs those with SpO2 higher than 95. These analyses will be presented in **Part 8**.

```{r}
#Save dataset
write.csv(
  data,
  file = paste0(psfolder,"/atelectasis_processed.csv"),
  row.names=FALSE
  )
```

\pagebreak

# Package References

```{r}
#| include: false
report::cite_packages(session)
```

-   Auguie B (2017). *gridExtra: Miscellaneous Functions for "Grid" Graphics*. R package version 2.3, <https://CRAN.R-project.org/package=gridExtra>.
-   Bates D, Maechler M, Jagan M (2024). *Matrix: Sparse and Dense Matrix Classes and Methods*. R package version 1.6-5, <https://CRAN.R-project.org/package=Matrix>.
-   Campitelli E (2021). *metR: Tools for Easier Analysis of Meteorological Fields*. doi:10.5281/zenodo.2593516 <https://doi.org/10.5281/zenodo.2593516>, R package version 0.15.0, <https://eliocamp.github.io/metR/>.
-   Fong C, Ratkovic M, Imai K (2022). *CBPS: Covariate Balancing Propensity Score*. R package version 0.23, <https://CRAN.R-project.org/package=CBPS>.
-   Friedman J, Tibshirani R, Hastie T (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” *Journal of Statistical Software*, *33*(1), 1-22. doi:10.18637/jss.v033.i01 <https://doi.org/10.18637/jss.v033.i01>. Simon N, Friedman J, Tibshirani R, Hastie T (2011). “Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent.” *Journal of Statistical Software*, *39*(5), 1-13. doi:10.18637/jss.v039.i05 <https://doi.org/10.18637/jss.v039.i05>. Tay JK, Narasimhan B, Hastie T (2023). “Elastic Net Regularization Paths for All Generalized Linear Models.” *Journal of Statistical Software*, *106*(1), 1-31. doi:10.18637/jss.v106.i01 <https://doi.org/10.18637/jss.v106.i01>.
-   Gilbert P, Varadhan R (2019). *numDeriv: Accurate Numerical Derivatives*. R package version 2016.8-1.1, <https://CRAN.R-project.org/package=numDeriv>.
-   Greifer N (2024). *WeightIt: Weighting for Covariate Balance in Observational Studies*. R package version 1.0.0, <https://CRAN.R-project.org/package=WeightIt>.
-   Grolemund G, Wickham H (2011). “Dates and Times Made Easy with lubridate.” *Journal of Statistical Software*, *40*(3), 1-25. <https://www.jstatsoft.org/v40/i03/>.
-   Ho D, Imai K, King G, Stuart E (2011). “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” *Journal of Statistical Software*, *42*(8), 1-28. doi:10.18637/jss.v042.i08 <https://doi.org/10.18637/jss.v042.i08>.
-   Iannone R, Cheng J, Schloerke B, Hughes E, Lauer A, Seo J (2024). *gt: Easily Create Presentation-Ready Display Tables*. R package version 0.10.1, <https://CRAN.R-project.org/package=gt>.
-   Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023). “Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption.” *CRAN*. <https://easystats.github.io/report/>.
-   Müller K, Wickham H (2023). *tibble: Simple Data Frames*. R package version 3.2.1, <https://CRAN.R-project.org/package=tibble>.
-   Neuwirth E (2022). *RColorBrewer: ColorBrewer Palettes*. R package version 1.1-3, <https://CRAN.R-project.org/package=RColorBrewer>.
-   Pinheiro J, Bates D, R Core Team (2023). *nlme: Linear and Nonlinear Mixed Effects Models*. R package version 3.1-164, <https://CRAN.R-project.org/package=nlme>. Pinheiro JC, Bates DM (2000). *Mixed-Effects Models in S and S-PLUS*. Springer, New York. doi:10.1007/b98882 <https://doi.org/10.1007/b98882>.
-   R Core Team (2024). *R: A Language and Environment for Statistical Computing*. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>.
-   Rich B (2023). *table1: Tables of Descriptive Statistics in HTML*. R package version 1.4.3, <https://CRAN.R-project.org/package=table1>.
-   Rinker TW, Kurkiewicz D (2018). *pacman: Package Management for R*. version 0.5.0, <http://github.com/trinker/pacman>.
-   Simpson G (2024). *gratia: Graceful ggplot-Based Graphics and Other Functions for GAMs Fitted using mgcv*. R package version 0.8.2, <https://gavinsimpson.github.io/gratia/>.
-   Venables WN, Ripley BD (2002). *Modern Applied Statistics with S*, Fourth edition. Springer, New York. ISBN 0-387-95457-0, <https://www.stats.ox.ac.uk/pub/MASS4/>.
-   Venables WN, Ripley BD (2002). *Modern Applied Statistics with S*, Fourth edition. Springer, New York. ISBN 0-387-95457-0, <https://www.stats.ox.ac.uk/pub/MASS4/>.
-   Wickham H (2016). *ggplot2: Elegant Graphics for Data Analysis*. Springer-Verlag New York. ISBN 978-3-319-24277-4, <https://ggplot2.tidyverse.org>.
-   Wickham H (2023). *forcats: Tools for Working with Categorical Variables (Factors)*. R package version 1.0.0, <https://CRAN.R-project.org/package=forcats>.
-   Wickham H (2023). *stringr: Simple, Consistent Wrappers for Common String Operations*. R package version 1.5.1, <https://CRAN.R-project.org/package=stringr>.
-   Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” *Journal of Open Source Software*, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
-   Wickham H, François R, Henry L, Müller K, Vaughan D (2023). *dplyr: A Grammar of Data Manipulation*. R package version 1.1.4, <https://CRAN.R-project.org/package=dplyr>.
-   Wickham H, Henry L (2023). *purrr: Functional Programming Tools*. R package version 1.0.2, <https://CRAN.R-project.org/package=purrr>.
-   Wickham H, Hester J, Bryan J (2024). *readr: Read Rectangular Text Data*. R package version 2.1.5, <https://CRAN.R-project.org/package=readr>.
-   Wickham H, Vaughan D, Girlich M (2024). *tidyr: Tidy Messy Data*. R package version 1.3.1, <https://CRAN.R-project.org/package=tidyr>.
-   Wood SN (2011). “Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.” *Journal of the Royal Statistical Society (B)*, *73*(1), 3-36. Wood S, N., Pya, S"afken B (2016). “Smoothing parameter and model selection for general smooth models (with discussion).” *Journal of the American Statistical Association*, *111*, 1548-1575. Wood SN (2004). “Stable and efficient multiple smoothing parameter estimation for generalized additive models.” *Journal of the American Statistical Association*, *99*(467), 673-686. Wood S (2017). *Generalized Additive Models: An Introduction with R*, 2 edition. Chapman and Hall/CRC. Wood SN (2003). “Thin-plate regression splines.” *Journal of the Royal Statistical Society (B)*, *65*(1), 95-114.

```{r}
#| include: false

# Run this chunk if you wish to clear your environment and unload packages.

pacman::p_unload(negate = TRUE)

rm(data, df_pred, figfolder, tabfolder, figure2, Figure3, session)

rm(list=setdiff(ls(pattern = "plot"), lsf.str()))
rm(list=setdiff(ls(pattern = "^dev"), lsf.str()))
rm(list=setdiff(ls(pattern = "^model"), lsf.str()))
rm(list=setdiff(ls(pattern = "^R2"), lsf.str()))
rm(list=setdiff(ls(pattern = "^tab_"), lsf.str()))
```