---
title: 'Causal inference with observational data'
subtitle: '**using Propensity score matching**'
author: "[Jaron Arbet]{style='color: steelblue;'}"
date: '`r Sys.Date()`'
date-format: short
format: 
  revealjs: 
    output-file: causal_inference_matching.html
scrollable: TRUE
slide-number: c/t
bibliography: references.bib
embed-resources: true
---

# Background

```{r, include = FALSE}
library(MatchIt);
library(cobalt);
library(BoutrosLab.plotting.general);
seed <- 1234;

source('utilities.R')
```

## Why?

RCTs are **expensive**:

-   Median cost of Phase 3 trials: \$19 million (IQR: \$12.2M - \$33.1M)
-   Average cost of bringing a new drug to market: \~\$1 - 3 billion

RCTs also have other limitations:

-   Results may not generalize to the larger "real world" population
-   May not be practical or ethical to run

::: aside
[@phase3-cost; @new-drug-cost]
:::

## Real World

**Real World Data [RWD]{style='color: steelblue;'}:**

- Observational, collected in "real world" setting
- EHR database, hospital visit notes, wearable devices
- Medical claims/billing database, disease registries

**Real World Evidence [RWE]{style='color: steelblue;'}:**

- Clinical evidence about the benefits/harms of medical products derived from analyzing RWD [@franklin2021-synthetic-trial]
- Poor quality RWD: garbage-in-garbage-out
- [90 examples](https://www.fda.gov/media/146258/download) of RWE used by FDA


## Potential outcomes causal framework

- **[2 exposure groups]{style='color: steelblue;'}**: e.g. Treatment vs Control
- How does the exposure affect a given outcome $Y$?
- The $i$th subject has **[2 potential outcomes]{style='color: steelblue;'}**: 
$$Y_i(T) \text{ and } Y_i(C)$$
- For each subject, the **[treatment effect]{style='color: steelblue;'}** is defined as:
    $$Y_i(T) - Y_i(C)$$
    + Only 1 of these is observed in reality
    
[@little2000-potential-outcomes; @rubin2005causal]
  
## Example

<!--# table made using https://www.tablesgenerator.com/markdown_tables -->


| Subject | $Y_i(T)$                        | $Y_i(C)$                        | Trt. Effect: $Y_i(T) - Y_i(C)$                |
|---------|------------------------------|------------------------------|------------------------------|
| 1       | 14                           | [?]{style='color: steelblue;'} | [?]{style='color: steelblue;'} |
| 2       | 9                            | [?]{style='color: steelblue;'} | [?]{style='color: steelblue;'} |
| 3       | 8                            | [?]{style='color: steelblue;'} | [?]{style='color: steelblue;'} |
| 4       | [?]{style='color: steelblue;'} | 5                            | [?]{style='color: steelblue;'} |
| 5       | [?]{style='color: steelblue;'} | 10                           | [?]{style='color: steelblue;'} |
| 6       | [?]{style='color: steelblue;'} | 7                            | [?]{style='color: steelblue;'} |
| **Mean**    | **10.33**                        | **7.33**                         | **3**                            |
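
A quick check of the **Mean** row, as a minimal sketch in base R (the object names are illustrative, not part of the original analysis). `NA` marks the potential outcome that is never observed:

```r
# Observed outcomes from the table; NA = unobserved potential outcome
po <- data.frame(
    subject = 1:6,
    y.trt = c(14, 9, 8, NA, NA, NA),
    y.ctrl = c(NA, NA, NA, 5, 10, 7)
    );

# Subject-level effects cannot be computed: one term is always missing
po$effect <- po$y.trt - po$y.ctrl;

# Group means use only the observed outcomes
mean.trt <- mean(po$y.trt, na.rm = TRUE);
mean.ctrl <- mean(po$y.ctrl, na.rm = TRUE);
tau.hat <- mean.trt - mean.ctrl;

print(round(c(trt = mean.trt, ctrl = mean.ctrl, diff = tau.hat), 2));
```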

- In an RCT, the difference in observed group means, $\tau = \bar{Y}(T) - \bar{Y}(C)$, estimates a *causal effect*

- In general, $\tau$ is not causal for observational studies (OS)
   + Many methods try to change this [@austin2011-intro-PS; @varga2023-confounding-observational]
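
Why randomization makes the difference in means causal, sketched under the assumption that $X_i \in \{0,1\}$ denotes the observed treatment group (1 = Treatment) and $Y_i$ the observed outcome:

$$
\begin{aligned}
E[Y_i \mid X_i = 1] - E[Y_i \mid X_i = 0] &= E[Y_i(T) \mid X_i = 1] - E[Y_i(C) \mid X_i = 0] \\
&= E[Y_i(T)] - E[Y_i(C)]
\end{aligned}
$$

The second equality needs $X_i$ to be independent of $\{Y_i(T), Y_i(C)\}$, which randomization provides. In an OS it can fail, because patients who receive treatment may differ systematically from those who do not.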


## Propensity scores (PS)

> "Probability of treatment assignment based on
observed baseline covariates" [@original-PS-paper]

- Given treatment variable $X_i \in \{0,1\}$ and baseline covariates $\boldsymbol{Z}_i$, the PS is the conditional probability of treatment:

$$
PS_i = Pr(X_i = 1 \mid \boldsymbol{Z}_i) = f(\boldsymbol{Z}_i)
$${#eq-PS}

<br>
Use logistic regression or machine learning to estimate $f$ in @eq-PS
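
A minimal sketch of PS estimation with logistic regression in base R. The data are simulated and the variable names (`z1`, `z2`, `x`) are hypothetical:

```r
set.seed(1);
n <- 500;

# Simulated baseline covariates (hypothetical)
z1 <- rnorm(n);
z2 <- rbinom(n, size = 1, prob = 0.4);

# Treatment assignment depends on the covariates
x <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * z1 + 0.6 * z2));

# Logistic regression of treatment on baseline covariates
ps.model <- glm(x ~ z1 + z2, family = binomial());

# Estimated PS = fitted probability of treatment for each subject
ps <- predict(ps.model, type = 'response');
summary(ps);
```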
    
## Why model treatment assignment?

:::: {.columns}

::: {.column width="50%"}

Confounder $\boldsymbol{Z}$ causes trt $\boldsymbol{X}$ and outcome $\boldsymbol{Y}$

- This makes $\boldsymbol{X}$ correlated with $\boldsymbol{Y}$, but correlation $\neq$ causation

PS models the relation between $\boldsymbol{X}$ and $\boldsymbol{Z}$, and thus helps remove confounding by the *measured* $\boldsymbol{Z}$

:::

::: {.column width="50%"}
![](./figures/confounder.png)
<font size='2'> https://sixsigmadsi.com/glossary/confounding/</font>
:::

::::

> <font size='6'>
Without a model for how treatments are assigned to units, formal causal inference is impossible [@little2000-potential-outcomes] </font>
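
The confounding picture above can be illustrated with a small simulation. This is a sketch with made-up parameters: the treatment has **no** effect on the outcome, yet the naive comparison suggests one. Comparing within PS quintiles (one simple way to use the PS) removes most of the spurious difference:

```r
set.seed(123);
n <- 5000;

z <- rnorm(n);                                    # confounder
x <- rbinom(n, size = 1, prob = plogis(z));       # Z causes treatment
y <- 2 * z + rnorm(n);                            # Z causes outcome; X has no effect

# Naive difference in means is confounded by Z
naive <- mean(y[x == 1]) - mean(y[x == 0]);

# Estimate the PS, then compare groups within PS quintiles
ps <- fitted(glm(x ~ z, family = binomial()));
stratum <- cut(
    ps,
    breaks = quantile(ps, probs = seq(0, 1, by = 0.2)),
    include.lowest = TRUE
    );
strat.diff <- sapply(
    split(data.frame(x, y), stratum),
    function(d) mean(d$y[d$x == 1]) - mean(d$y[d$x == 0])
    );
stratified <- mean(strat.diff);

print(round(c(naive = naive, stratified = stratified), 2));
```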

## 4 methods of using PS

![](./figures/PS-4methods-table.png)


::: aside
[@deb2016-PS-review]
:::

# Matching

## 

> The PS is a **balancing score**: patients with similar PS should have similar baseline covariates [@austin2011-intro-PS]

:::: {.columns}

::: {.column width="50%"}
![](./figures/PS-matching.png)

<font size='3'> [@mcdonald2013-matching-figure]</font>
:::

::: {.column width="50%"}
- PS is a composite summary of all baseline covariates
- Match Trt-Control patients with similar PS values
:::

::::


## Matching: many choices

**[Recommendations]{style='color: steelblue;'}** from simulations of [@austin2014-compare-matching-methods]:

- Optimal vs **[greedy nearest neighbor matching]{style='color: steelblue;'}**?

:::: {.columns}

::: {.column width="60%"}
![](./figures/greedy-vs-optimal-matching.png)

::: aside
[@chen2022-greedy-vs-optimal-match-figure]
:::
:::

::: {.column width="40%"}
- **Greedy**: iteratively match to nearest neighbor
- **Optimal**: minimize the total distance between all pairs

:::

::::
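
A toy illustration of the difference, as a sketch with hypothetical PS values for 2 treated and 2 control subjects. Greedy matching takes the best available control for each treated subject in turn; the optimal solution is found here by simply checking both possible pairings:

```r
ps.trt <- c(0.50, 0.60);   # hypothetical PS of treated subjects
ps.ctrl <- c(0.55, 0.30);  # hypothetical PS of control subjects

# Absolute PS distance for every treated-control pair
dist.mat <- abs(outer(ps.trt, ps.ctrl, '-'));
trt.id <- seq_along(ps.trt);

# Greedy: match each treated subject (in data order) to the
# nearest control that has not been used yet
available <- rep(TRUE, length(ps.ctrl));
greedy.match <- integer(length(ps.trt));
for (i in trt.id) {
    d <- dist.mat[i, ];
    d[!available] <- Inf;
    greedy.match[i] <- which.min(d);
    available[greedy.match[i]] <- FALSE;
    }
greedy.total <- sum(dist.mat[cbind(trt.id, greedy.match)]);

# Optimal: with 2 x 2 subjects there are only two complete pairings
pairings <- list(c(1, 2), c(2, 1));
totals <- sapply(pairings, function(p) sum(dist.mat[cbind(trt.id, p)]));
optimal.total <- min(totals);

print(c(greedy = greedy.total, optimal = optimal.total));
```

Greedy grabs the closest control for the first treated subject and leaves a poor match for the second, so its total distance is larger.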

## **[Caliper]{style='color: steelblue;'}** or no caliper?

![](./figures/caliper.png)

<font size='2'> https://help.easymedstat.com/support/solutions/articles/77000538175-caliper-in-propensity-score-matching</font>

## 

- How wide should caliper be? **[0.2*SD of logit PS]{style='color: steelblue;'}** [@austin2011-optimal-caliper-width]

<br>

- Match with or **[without replacement]{style='color: steelblue;'}**?

<br>

- For greedy matching, in what order should you select the treated subjects?
    - Lowest to highest PS, highest to lowest, best match first, or **[random order]{style='color: steelblue;'}**
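
How the recommended caliper width translates into numbers, as a sketch using simulated PS values (`ps` and the helper `within.caliper` are hypothetical):

```r
set.seed(1);

# Simulated propensity scores (hypothetical)
ps <- plogis(rnorm(200));

# Caliper = 0.2 x SD of the logit of the PS
logit.ps <- qlogis(ps);
caliper.width <- 0.2 * sd(logit.ps);

# A treated-control pair is an eligible match only if their
# logit PS values differ by no more than the caliper
within.caliper <- function(ps.trt, ps.ctrl) {
    abs(qlogis(ps.trt) - qlogis(ps.ctrl)) <= caliper.width;
    }

within.caliper(0.50, 0.50);  # identical PS: eligible
within.caliper(0.50, 0.90);  # very different PS: not eligible
```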

## Types of treatment effects

**ATE**: $E\big[Y_i(T) - Y_i(C)\big]$

* Average effect for ALL patients in the population

**ATT**: $E\big[Y_i(T) - Y_i(C) \mid \text{Treated}\big]$

* Among patients who were Treated, how would their outcomes have changed if they had been Controls?

**ATC**: $E\big[Y_i(T) - Y_i(C) \mid \text{Control}\big]$

* Among patients who were Controls, how would their outcomes have changed if they were Treated?

| Method                        | ATE      | ATT/ATC  |
|-------------------------------|----------|----------|
| Matching                      | `r '\u274C'`         | `r '\u2705'` |
| Stratification                | `r '\u2705'` | `r '\u2705'` |
| Inverse probability weighting | `r '\u2705'` | `r '\u2705'` |
| Covariate adjustment          | `r '\u274C'`  | `r '\u274C'`|

<font size='3'> [@deb2016-PS-review]</font>
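
The three estimands differ whenever the treatment effect varies across patients and treatment uptake is related to that variation. A minimal simulation sketch (all parameters are made up; both potential outcomes are visible only because the data are simulated):

```r
set.seed(42);
n <- 10000;

z <- rnorm(n);                                # baseline covariate
y.ctrl <- 1 + z + rnorm(n);                   # potential outcome under Control
y.trt <- y.ctrl + 2 + z;                      # effect (2 + z) grows with z
x <- rbinom(n, size = 1, prob = plogis(z));   # high z: more likely Treated

effect <- y.trt - y.ctrl;

ate <- mean(effect);            # all patients
att <- mean(effect[x == 1]);    # Treated patients only
atc <- mean(effect[x == 0]);    # Control patients only

print(round(c(ATE = ate, ATT = att, ATC = atc), 2));
```

Here the Treated group has higher `z` on average, so ATT > ATE > ATC.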

# Example

## Employment training and income

- **Treatment** (N = 185): National Supported Work Demonstration (NSW) employment training program
- **Control** (N = 429): from the Panel Study of Income Dynamics (PSID)
- **Goal**: Does the training program increase mean income in 1978?
- **Baseline covariates**: age, education, race, marital status, pre-treatment income (1974/1975)

::: aside
[@dehejia1999]
:::

```{r prepare lalonde data}
#| include: false
data(lalonde, package = 'MatchIt');
dataset <- lalonde;
psvars <- c('age', 'educ', 'race', 'married', 'nodegree', 're74', 're75');
outcome <- 're78';
exposure <- 'treat';
# verify that all required columns exist before subsetting
stopifnot(all(c(psvars, outcome, exposure) %in% colnames(dataset)));
psdata <- dataset[, c(exposure, psvars)];

# check initial balance
match.null <- matchit(
    treat~.,
    data = psdata,
    method = NULL
    )
#summary(match.null);
```

## Nearest neighbor caliper matching

```{r match, echo = TRUE}
#| code-line-numbers: "1,3,7,8,10,11"
library(MatchIt);

set.seed(1234); # for reproducibility
match.nnc.logit <- matchit(
    treat ~.,
    data = psdata,
    method = 'nearest',
    distance = 'glm',
    replace = FALSE,
    caliper = 0.2,
    std.caliper = TRUE,
    ratio = 1 # 1:1 matching
    );
```

::: {style="font-size: 80%;"}
- MatchIt R package [@matchit]
- Default is logistic reg., but `?distance` gives many other options 
(e.g. LASSO, random forests, boosting, NNets)
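- [@austin2011-optimal-caliper-width] recommends caliper = 0.2 $\times$ SD of logit PS; `distance = 'glm'` applies the caliper to the PS itself, so add `link = 'linear.logit'` to put it on the logit scale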

:::

## PS distribution before/after matching

:::{.column-body-outset}

```{r PS distribution before and after matching}
#| echo: false
bal.plot(
    x = match.nnc.logit,
    var.name = 'distance',
    which = 'both',
    sample.names = c('Unmatched', 'Matched')
    );
```
:::

- Notice poor overlap before matching `r '\u2794'` won't match all Trt patients, unless $N_{control}$ is large

```{r}
#| echo: false
tab <- matchit.pct.matched(match.nnc.logit)$tab.counts.pct;
knitr::kable(tab);
```

- `r '\u2B06'` unmatched Trt patients = `r '\u2B06'` bias in $\widehat{\text{ATT}}$: the estimate instead targets the "Average treatment effect in the Overlap population (ATO)" [@varga2023-confounding-observational]


## Covariate balance

:::{.scrolling}

```{r check covariate balance, echo = FALSE}

plot(
    match.nnc.logit,
    type = 'density',
    interactive = FALSE
    );
```

:::

## Standardized differences

:::: {.columns}

::: {.column width="65%"}

```{r check standardized differences before and after match, fig.height = 8, echo = FALSE}
plot(summary(match.nnc.logit, un = TRUE), threshold = c(0.1, 0.25));
```

:::

::: {.column width="35%"}

![](./figures/standardized-diff.png)

::: {style="font-size: 70%;"}
Cutoffs: 

- $\le$ 0.1 [@austin2011-intro-PS]
- $\le$ 0.25 [@harder2010]
- No p-values [@austin2011-intro-PS]
:::
:::

::::
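
One common definition of the standardized difference for a continuous covariate divides the difference in means by the pooled SD. A sketch in base R with made-up ages (`std.diff` is a hypothetical helper; software may use a different denominator, e.g. the SD of the treated group):

```r
# Standardized mean difference with a pooled-SD denominator
std.diff <- function(covariate, treat) {
    v.trt <- covariate[treat == 1];
    v.ctrl <- covariate[treat == 0];
    pooled.sd <- sqrt((var(v.trt) + var(v.ctrl)) / 2);
    (mean(v.trt) - mean(v.ctrl)) / pooled.sd;
    }

# Hypothetical example: Treated patients are 10 years older on average
age <- c(30, 35, 40, 45, 20, 25, 30, 35);
treat <- c(1, 1, 1, 1, 0, 0, 0, 0);

d <- std.diff(age, treat);
round(d, 2);

# Compare against the two cutoffs
abs(d) <= c(0.1, 0.25);
```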

## Estimated treatment effect

```{r estimating the treatment effect, echo = FALSE}
mdata <- match.data(match.nnc.logit);
colnames(mdata)[colnames(mdata) == 'subclass'] <- 'match.id';
mdata$patient.id <- rownames(mdata);
odata <- dataset[, outcome, drop = FALSE];
odata$patient.id <- rownames(odata);

mdata <- merge(
    x = mdata,
    y = odata,
    by = 'patient.id',
    all.x = TRUE
    );
```


```{r, echo = FALSE}
create.densityplot(
    x = data.frame(mdata$re78[mdata$treat==1], mdata$re78[mdata$treat==0]),
    xlab.label = 'Income in 1978 (USD)',
    xlimits = c(0, 30000),
    xat = seq(0, 30000, by = 5000),
    col = default.colours(2),
    # Legend
    legend = list(
        inside = list(
            fun = draw.key,
            args = list(
                key = list(
                    points = list(
                        col = default.colours(2),
                        pch = 21,
                        cex = 1.5,
                        fill = default.colours(2)
                        ),
                    text = list(
                        lab = c('Treatment', 'Control')
                        ),
                    padding.text = c(0,5,0),
                    cex = 1
                    )
                ),
            x = 0.65,
            y = 0.97,
            draw = FALSE
            )
        )
    );
```

<font size='5.5'>
```{r, echo = FALSE}
#| tbl-colwidths: [1,1,1,1,1,1]
mean.diff <- t.test(re78 ~ factor(treat, levels = c(1, 0)), data = mdata);
mean.diff <- broom::tidy(mean.diff);
colnames(mean.diff)[colnames(mean.diff) == 'estimate'] <- 'diff'
colnames(mean.diff)[colnames(mean.diff) == 'estimate1'] <- 'trt.mean'
colnames(mean.diff)[colnames(mean.diff) == 'estimate2'] <- 'ctrl.mean'
mean.diff <- mean.diff[c('trt.mean', 'ctrl.mean', 'diff', 'conf.low', 'conf.high', 'p.value')];

knitr::kable(
    mean.diff,
    digits = c(rep(0, 5), 3),
    table.attr = "style='width:50%;'"
    );
```
</font>

- <font size='5'>
Interestingly, the PS estimate of the trt effect (\$`r round(mean.diff$diff)`) is close to the RCT estimate (\$1641) [@lalonde1986]</font>

## Optimal matching

:::{.column-body-outset} 

```{r optimal matching, echo = FALSE}
set.seed(seed);
match.opt <- matchit(
    treat~.,
    data = psdata,
    method = 'optimal',
    distance = 'glm',
    # estimand: set to ATT if N_control >> N_treat,
    #   ATC if N_treat >> N_control
    estimand = 'ATT',
    replace = FALSE,
    ratio = 1 # 1:1 matching
    );
plot(summary(match.opt, un = TRUE), threshold = c(0.1, 0.25));
```
::: 

- Matches 100% of Trt patients (unlike greedy caliper matching)
- *Minimizes* the total distance across all matched pairs
- However, here it resulted in much worse covariate balance

## Summary

- **Goal:** estimate causal effect of exposure $X$ on outcome $Y$, using observational data
- PS matching balances measured confounders between exposure groups (similar to RCT)
   + <font size='6.5'> Intuitive: show covariates are balanced after matching, like RCT </font>
   + <font size='6.5'> Of course, unmeasured confounders may remain </font>
- If estimating ATT (ATC), you need to match all Trt (Control) patients. If a high % are unmatched, consider other methods.

## Alternatives

- **Stratification on PS** uses all patients, but does not work well with survival data [@austin2014-PS-survival; @austin2016-PS-survival]

- **Inverse probability weighting** and **regression adjustment** are very flexible, but generally not accepted by FDA (unlike matching) [@lu2019]


- **Causal Random Forests** look promising:
[https://grf-labs.github.io/grf/](https://grf-labs.github.io/grf/)
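
For comparison with matching, the inverse probability weighting idea listed above can be sketched in a few lines of base R. The data are simulated with a known treatment effect of 2 and the weights target the ATE; all names and parameters are hypothetical:

```r
set.seed(2024);
n <- 20000;

z <- rnorm(n);                                      # measured confounder
x <- rbinom(n, size = 1, prob = plogis(0.8 * z));   # treatment depends on z
y <- 2 * x + 3 * z + rnorm(n);                      # true treatment effect = 2

# Naive difference in means is confounded by z
naive <- mean(y[x == 1]) - mean(y[x == 0]);

# Estimated PS and ATE weights: 1/PS for Treated, 1/(1 - PS) for Controls
ps <- fitted(glm(x ~ z, family = binomial()));
w <- ifelse(x == 1, 1 / ps, 1 / (1 - ps));

# Difference in weighted means (weights normalized within each group)
ipw <- weighted.mean(y[x == 1], w[x == 1]) -
    weighted.mean(y[x == 0], w[x == 0]);

print(round(c(naive = naive, ipw = ipw), 2));
```

Unlike matching, every patient contributes to the estimate, but patients with PS near 0 or 1 can receive very large weights.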

# References