---
title: "Supplemental Report for JRSSA-Apr-2024-0120"
author: "Sho Kawano | January 25th, 2025"
output:
pdf_document:
    toc: true             # include a table of contents
    toc_depth: 1          # depth of headings shown in the table of contents
    number_sections: true # number the section headings
---
```{r setup, include=F}
knitr::opts_chunk$set(echo = F, warning = F, message=F)
library(usmap)
library(tidyverse)
library(tidycensus)
library(patchwork)
```
```{r include=F}
load("data/data_NC/data.RDA")
load("sim_results_NC/results_by_sim.RDA")
pop <- readRDS("data/nc_data_w_geom.RDS")
tmp.df = errs_all_sims %>%
group_by(model, area) %>%
reframe(mse=mean(err^2))
ref_table = data.frame(area=1:nrow(all_data), GEOID=all_data$fips)
tmp.df = tmp.df %>% pivot_wider(names_from = model, values_from = mse) %>%
pivot_longer(cols=!c(D.Est, area), names_to = "model", values_to = "mse")
mse_pltDf = pop %>% left_join(ref_table, by = join_by(GEOID)) %>%
left_join(tmp.df, by = join_by(area)) %>%
filter(model!="CAR") %>%
mutate(model=ifelse(model=="New", "SSD", model))
```
\newpage
# Further Analysis on Simulation Study Results
## Comparing MSE
```{r fig.width=7, fig.height=5, fig.cap="Area-Specific MSE from Direct Estimates vs. Models"}
mse_pltDf %>%
ggplot(aes(x=log(D.Est), y=log(mse) )) +
facet_wrap(~model, nrow=2) +
geom_abline(slope=1) +
geom_tile(aes(x=-6, y=-5, width=2, height=2), fill=NA, color="blue")+
geom_point(alpha=0.5) +
coord_fixed() +
xlab("log(MSE) for D.Est") +
ylab("log(MSE) for Models") +
ggtitle("") +
theme(axis.text.x = element_text(angle = 45))
```
In Figure 1, we compare the area-specific MSE, which we define as:
$$\mbox{MSE for area } i = \frac{1}{G} \sum_{g=1}^G (\hat{z}^{(g)}_i-z_i)^2$$
where $G=300$ is the number of simulations in the study. Note that this *differs* from the average MSE compared in the main manuscript, which averages across all areas and simulations. If a model outperforms the direct estimate for a given area, the corresponding point lies *below* the line. We can see that the SSD model achieves MSE reductions across the board.
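For reference, the area-specific MSE plotted in Figure 1 is computed from the per-simulation errors in the (hidden) setup chunk; a minimal sketch of that calculation:
```{r eval=FALSE, echo=TRUE}
# Area-specific MSE: average squared error over the G = 300 simulations,
# computed separately for each model and area (mirrors the setup chunk above)
errs_all_sims %>%
  group_by(model, area) %>%
  reframe(mse = mean(err^2))
```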
There are a few areas where using a model increases the MSE relative to the direct estimate (enclosed in the blue rectangle). This increase in MSE is *smaller* for the SSD model than for all other models, which is a major factor in why the SSD model outperforms the others.
\newpage
## Comparing Coverage
```{r}
# Calculating coverage by area
covRate_df <- ci_all_sims %>% group_by(area, model) %>%
reframe(cov_rate=mean(covered))
cr_pltDf = pop %>%
left_join(ref_table, by = join_by(GEOID)) %>%
left_join(covRate_df, by = join_by(area)) %>%
filter(model!="CAR") %>%
mutate(model=ifelse(model=="New", "SSD", model))
```
```{r fig.width=8, fig.height=3.5, fig.cap="Undercoverage of a 90% credible interval by model"}
cr_pltDf %>%
ggplot(aes(fill = 0.9-cov_rate)) +
geom_sf() +
theme_void() + facet_wrap(~model, nrow=2) +
scale_fill_gradient(low = "light blue", high = "red")+
#scale_fill_viridis_c(option="E") +
labs(title = "",
fill = "Undercoverage= \n 0.9-Cov.Rate",
caption="Red indicates high undercoverage")
```
In Figure 2, we plot the area-specific undercoverage of a 90% credible interval. We define the area-specific coverage rate as:
$$\mbox{Coverage Rate for Area } i = \frac{1}{G} \sum_{g=1}^G \boldsymbol{I} \{\hat{l}_i^{(g)} < z_i\} \cdot \boldsymbol{I} \{z_i< \hat{u}_i^{(g)}\}.$$
We define undercoverage as $0.9-\textrm{Coverage Rate}$. If a model, on average, achieves 90% coverage in a given area, this value should be zero. Counties in **red** on the map indicate **high undercoverage**, meaning that the **interval estimate is poor** for that county and model.
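For reference, below is a minimal sketch of how the area-level coverage rate is computed. The column names `lower`, `upper`, and `truth` are hypothetical stand-ins for $\hat{l}_i^{(g)}$, $\hat{u}_i^{(g)}$, and $z_i$; the actual chunk above uses a precomputed `covered` indicator in `ci_all_sims`.
```{r eval=FALSE, echo=TRUE}
# Sketch: coverage indicator and area-level coverage rate
# (`lower`, `upper`, `truth` are hypothetical column names)
ci_all_sims %>%
  mutate(covered = (lower < truth) & (truth < upper)) %>%  # I{l < z} * I{z < u}
  group_by(area, model) %>%
  reframe(cov_rate = mean(covered))
```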
The counties where the other models struggle in terms of coverage are the same counties where they struggle in terms of accuracy. The models without random effect selection (the BYM and FH models) struggle mightily in these problematic counties. The DM model struggles somewhat less than the other models in these counties but has higher undercoverage in other parts of the state. This can also be seen in Figure 3, where we plot the log design SEs against the coverage rate of a 90% credible interval from each model. The dashed line marks 90% (the target coverage rate), and the red region indicates coverage below 70%. We can see that the coverage rate for many of the models starts to drop as the SE gets larger, and there is a clear difference when comparing the performance of the SSD model to the others.
```{r fig.width=6, fig.height=3, fig.cap="Log Design SEs vs. Coverage by model"}
tmp.df = data.frame(area=1:nrow(all_data),
SE = all_data$rentBurdenSE)
covRate_df %>% left_join(tmp.df) %>%
filter(model!="CAR") %>%
mutate(model=ifelse(model=="New", "SSD", model)) %>%
ggplot(aes(x=log(SE), y=cov_rate)) +
geom_point(alpha=0.7) +
facet_wrap(~model, nrow=2) +
geom_hline(yintercept = 0.9, lty=2, color="blue") +
theme_bw() +
geom_area(aes(y=0.7), fill="red", alpha=0.1) +
labs(title = "") +
ylab("Cov. Rate - 90% Cr.Interval")
```
\newpage
# Further Analysis on Random Effects from Data Analysis
```{r include=FALSE}
rm(list = ls()) # clear all objects
options(scipen=20)
# ---- Packages ----
library(bayesplot)
library(tidycensus)
library(spdep)
library(patchwork)
library(tidyverse)
# ---- Data ----
load("data/data_multi-state/data.RDA")
all_data <- all_data %>%
mutate(high_rentBurden = ifelse(rentBurden>0.4, T, F))
# the response
covs = c("degree", "assistance", "no_car", "povPerc",
"white", "black", "native", "asian", "hispanic")
X <- model.matrix(~., all_data[, covs, drop=F])
y <- all_data$rentBurden
D <- all_data$rentBurdenSE^2
y.star <- log(y)
D.star <- D / y^2
center <- mean(y.star)
scale <- sd(y.star)
y.scaled <- (y.star-center)/scale
D.scaled <- D.star/scale^2
# ---- Load Chains ----
load("data_analysis_chains/bym_res.RDA")
load("data_analysis_chains/dm_res.RDA")
load("data_analysis_chains/fh_res.RDA")
load("data_analysis_chains/new_res.RDA")
# ---- Load Map -----
pop <- readRDS("data/sa_data_w_geom.RDS")
```
## What kind of random effects are selected?
```{r fig.height=4, fig.width=8, fig.cap="Posterior Means of Selection Probabilities vs. Random Effects for DM and SSD Model"}
# create the dataframe with the necessary data
dm_reffects = (dm_res$v*dm_res$delta) %>% colMeans()
tmp.df = data.frame(area=1:nrow(all_data),
sel_prob = dm_res$p.delta %>% colMeans(),
reffects=dm_reffects,
theta=exp(new_res$theta*scale+center) %>% colMeans() )
# plot
plt1 = tmp.df %>% ggplot(aes(x=sel_prob, y=reffects)) +
geom_point(color="blue", alpha=0.5, shape=1) +
ylab("Random Effects") +
xlab("Selection Probabilities")+
ggtitle("DM Model")
# create the dataframe with the necessary data
new_reffects = scale*(new_res$delta*(new_res$v1+new_res$v2)) %>% colMeans()
tmp.df = data.frame(area=1:nrow(all_data),
sel_prob = new_res$p.delta %>% colMeans(),
reffects=new_reffects,
theta=exp(new_res$theta*scale+center) %>% colMeans() )
plt2 = tmp.df %>% ggplot(aes(x=sel_prob, y=reffects)) +
geom_point(color="blue", alpha=0.5, shape=1) +
ylab("Random Effects") +
xlab("Selection Probabilities") +
ggtitle("SSD Model")
plt1+ plt2
```
We can see from Figure 4 that the areas where the selection probability is high (close to 1) also have larger-magnitude random effects. The relationship is more clear-cut for the DM model. There is some "noise" for the SSD model resulting from the dependence between the random effects and the $p_i$ parameters, both of which impact the selection probability. See the discussion in Section 3.2 of the main manuscript for details.
## Comparing Random Effects from Different Models
We can see from the map below that the SSD model incorporates elements of the other models. Both models with selection (DM and SSD) have larger-magnitude effects than the other two, since they do not make a common-variance assumption. We can also see that all of the models share fairly similar spatial patterns, but the spatial dependence is, as expected, more explicit in the spatial models (BYM and SSD).
```{r}
fh_reffects = fh_res$v %>% colMeans()
dm_reffects = (dm_res$v*dm_res$delta) %>% colMeans()
bym_reffects = (bym_res$v1+bym_res$v2) %>% colMeans()
new_reffects = scale*(new_res$delta*(new_res$v1+new_res$v2)) %>% colMeans()
tmp.df = data.frame(area=1:nrow(all_data),
FH=fh_reffects,
DM=dm_reffects,
BYM=bym_reffects,
New=new_reffects)
ref_table = data.frame(area=1:nrow(all_data), GEOID=all_data$fips)
plt_df = pop %>%
left_join(ref_table, by = join_by(GEOID)) %>%
left_join(tmp.df, by = join_by(area)) %>%
arrange(area)
plt_df = plt_df %>%
pivot_longer(cols=FH:New, names_to="model", values_to = "reffects")
```
```{r big_reffect_map, fig.dim=c(7, 6.5), fig.cap="Random Effect Posterior Means from 4 Different Models", cache=TRUE}
plt_df %>%
# filter(model %in% c("DM", "New")) %>%
mutate(model=ifelse(model=="New", "SSD", model)) %>%
mutate(model = factor(model, levels=c("FH", "DM", "BYM", "SSD"))) %>%
ggplot(aes(fill = reffects )) +
geom_sf() +
# scale_fill_viridis_b(breaks=c(-0.15, -0.01, -0.005, 0.005, 0.01, 0.15)) +
# scale_fill_distiller(palette = "BuPu") +
scale_fill_distiller(palette = "PuOr", direction = 1) +
labs(#title = "Random Effects from Each Model",
#caption="Comparing Posterior Means",
fill = "Random Effects") +
theme_void() + facet_wrap(~model, nrow=2)
```
\newpage
# Prior Sensitivity Analysis - Random Effect Variances
```{r include=FALSE}
rm(list = ls())
#NOTE: This section takes a while to run (~5 minutes)
knitr::opts_chunk$set(echo = FALSE)
library(LaplacesDemon)
library(BayesLogit)
library(Matrix)
library(latex2exp)
library(tidyverse)
```
```{r warning=FALSE}
# North Carolina county data
load("data/data_NC/data.RDA")
#data
covs = c("degree", "assistance", "no_car", "povPerc",
"white", "black", "native", "asian", "hispanic")
X <- model.matrix(~., all_data[, covs, drop=F])
n <- nrow(X); j <- ncol(X) # number of covariates INCLUDES intercept
#response (with transformation)
y <- all_data$rentBurden
d.var <- all_data$rentBurdenSE^2
y.star <- log(y)
d.star <- d.var / y^2
d.scl <- d.star / sd(y.star)^2
```
For the proposed Spatially Selected and Dependent (SSD) model, the data model is $[\boldsymbol{y} | \boldsymbol{\theta} ] \sim N_n(\boldsymbol{\theta}, \boldsymbol{D})$ where $\boldsymbol{y}$ are the direct estimates from the survey and the covariance matrix $\boldsymbol{D}$ is a diagonal matrix of the design variances. The small area means are modeled with $\boldsymbol{\theta} = \boldsymbol{X} \boldsymbol{\beta} \, + \boldsymbol{u}$.
Each random effect is modeled as $u_i = \delta_i \cdot (v_{1i} +v_{2i})$, where $[\boldsymbol{v}_1 | \sigma_{1}^2] \sim N_n(\boldsymbol{0}, \sigma_{1}^2 \, \boldsymbol{I})$ and $[\boldsymbol{v}_2 | \sigma_{2}^2] \sim N_n(\boldsymbol{0}, \sigma_{2}^2 \, \mathbf{Q}^-)$, with $\mathbf{Q}^-$ the generalized inverse of the scaled ICAR precision matrix. The selection indicators are modeled as $[\delta_i|p_i] \overset{ind}\sim Bernoulli(p_i)$, where the $p_i$ are the selection probabilities. We model these selection probabilities via $\mathrm{logit}(\boldsymbol{p}) = \boldsymbol{\psi}_1 + \boldsymbol{\psi}_2$ through two logit effects, $[\boldsymbol{\psi}_1|s_1^2] \sim N_n(\boldsymbol{0}, s_{1}^2 \, \boldsymbol{I})$ and $[\boldsymbol{\psi}_2|s_2^2] \sim N_n(\boldsymbol{0}, s_{2}^2 \, \boldsymbol{Q}^-)$. For the full details of our model, please see the main manuscript.
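To make this hierarchy concrete, below is a minimal prior-simulation sketch of the random-effect structure. The chain-graph adjacency, the variance values, and the use of a plain generalized inverse in place of the scaled $\mathbf{Q}^-$ are purely illustrative assumptions, not the implementation used in the paper.
```{r eval=FALSE, echo=TRUE}
# Illustrative prior simulation of the SSD random-effect structure
# (adjacency and all variance values below are hypothetical)
library(MASS)  # mvrnorm(), ginv()
set.seed(1)
n <- 10
A <- matrix(0, n, n)                      # adjacency: a simple chain of n areas
for (i in 1:(n - 1)) { A[i, i + 1] <- 1; A[i + 1, i] <- 1 }
Q <- diag(rowSums(A)) - A                 # ICAR precision (unscaled, for illustration)
Q_minus <- ginv(Q)                        # generalized inverse, standing in for Q^-
Q_minus <- (Q_minus + t(Q_minus)) / 2     # symmetrize for mvrnorm()
sigma1_sq <- 0.5; sigma2_sq <- 0.5        # random-effect variances (assumed values)
s1_sq <- 1; s2_sq <- 1                    # logit-effect variances (assumed values)
v1    <- rnorm(n, 0, sqrt(sigma1_sq))                 # iid component v_1
v2    <- mvrnorm(1, rep(0, n), sigma2_sq * Q_minus)   # spatial component v_2
psi1  <- rnorm(n, 0, sqrt(s1_sq))                     # iid logit effect
psi2  <- mvrnorm(1, rep(0, n), s2_sq * Q_minus)       # spatial logit effect
p     <- plogis(psi1 + psi2)                          # selection probabilities
delta <- rbinom(n, 1, p)                              # selection indicators
u     <- delta * (v1 + v2)                            # SSD random effects
```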
The parameters that require priors are the regression coefficients $\boldsymbol{\beta}$ and the two pairs of variance parameters, $\sigma_1^2, \sigma_2^2$ and $s_1^2, s_2^2$. The prior for the regression coefficients is set non-informatively, $\boldsymbol{\beta} \sim N_j(\boldsymbol{0}, \ 100^2 \boldsymbol{I})$, and therefore has little impact on the estimates. The choice of prior for $s_1^2, s_2^2$ can affect convergence of the MCMC chains, but it has minimal impact on the estimates themselves.
We have observed that the choice of prior on the random effect variances $\sigma_1^2, \sigma_2^2$ has the biggest impact on the estimates, as these variances influence the inclusion probabilities $p_i$. Thus, our prior **sensitivity analysis focuses entirely on these random effect variances**. We begin by discussing a key feature of the inverse-gamma distribution: the gap near the origin.
## The Gap Near the Origin for Different Inverse-Gamma Priors
Let $X$ be an inverse-gamma random variable with density
$$f(x | c, d) = \frac{d^c}{\Gamma(c)} (1/x)^{c+1} \exp(- d / x).$$ We denote this as $X \sim IG(c, d).$
For every inverse-gamma distribution, there is a threshold value $t$ at which $P(X < t)= \alpha$
for some small probability $\alpha$. For example, for $c=3$, $d=3$, and $\alpha=0.001$ the threshold is $t \approx 0.27$. In the figure below, we can see the gap near zero widen as we alter the shape $c$ and scale $d$ parameters.
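Because $1/X \sim \mathrm{Gamma}(\mathrm{shape}=c, \mathrm{rate}=d)$, the threshold can be computed directly from the gamma quantile function; a minimal sketch (the helper name `ig_threshold` is ours, not part of the paper's code):
```{r eval=FALSE, echo=TRUE}
# t solves P(X < t) = alpha for X ~ IG(c, d), i.e. t = 1 / qgamma(1 - alpha, c, rate = d)
ig_threshold <- function(c, d, alpha = 0.001) 1 / qgamma(1 - alpha, shape = c, rate = d)
ig_threshold(3, 3)  # approximately 0.27
ig_threshold(5, 5)  # approximately 0.34 (the prior used in the main manuscript)
```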
```{r fig.height=2, fig.width=6, fig.cap="Gaps Near Zero for Different Inverse-Gamma Priors"}
par(mfrow=c(1, 3), mar=c(4, 3, 2, 3))
c=1; d=1
curve(dinvgamma(x, c, d), from=0, to=0.8, n=1000,
ylab="density", main=paste0("IG(c=", c, " d=", d, ")"),
xlab=TeX("$\\sigma^2$"), col="blue", cex.main=1)
abline(h=0, lty=2)
c=5; d=5
curve(dinvgamma(x, c, d), from=0, to=0.8, n=1000,
ylab="density", main=paste0("IG(c=", c, " d=", d, ")"),
xlab=TeX("$\\sigma^2$"), col="blue", cex.main=1)
abline(h=0, lty=2)
c=5; d=10
curve(dinvgamma(x, c, d), from=0, to=0.8, n=1000,
ylab="density", main=paste0("IG(c=", c, " d=", d, ")"),
xlab=TeX("$\\sigma^2$"), col="blue", cex.main=1)
abline(h=0, lty=2)
```
When considering inverse-gamma priors for the random effect variance parameters in models like the Datta-Mandal (DM) or SSD model, this gap is important for making sure that the "spike" and the "slab" are distinguishable.
For the SSD model, we would like a prior that is non-informative but still leaves enough space near zero. Making the inverse-gamma hyperparameters in $IG(c, d)$ smaller yields a less informative prior; however, doing so also produces a very small threshold and not enough of a gap near the origin. So what threshold best balances these two factors? In our experience, what makes a suitable prior depends on the scale of the data. This is why we recommend scaling the direct estimates as $(y_i - \bar{y}) / s_{\boldsymbol{y}}$, where $\bar{y}$ and $s_{\boldsymbol{y}}$ are the sample mean and standard deviation of the direct estimates, to aid prior specification.
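For reference, below is a minimal sketch of this standardization, mirroring the data-analysis setup chunk earlier in this report (where `y` holds the direct estimates and `D` their design variances; the log transform is specific to the rent burden application):
```{r eval=FALSE, echo=TRUE}
# Standardizing the log-transformed direct estimates before prior specification
# (mirrors the setup chunk used for the data analysis)
y.star   <- log(y)                      # log of the direct estimates
D.star   <- D / y^2                     # delta-method design variances on the log scale
center   <- mean(y.star)
scale    <- sd(y.star)
y.scaled <- (y.star - center) / scale   # scaled response
D.scaled <- D.star / scale^2            # design variances on the scaled scale
```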
```{r }
find_t = function(k){
c=hyp_choices$shp[k]
d=hyp_choices$scl[k]
# since P(X<t)=P(1/X > 1/u) where 1/X is gamma(a, b)
threshCDF <- function(t, alpha=0.001){
abs((1-pgamma(1/t, c, rate=d))-alpha)
}
optimize(threshCDF, interval = c(0, 1))$minimum
}
# load prior choices
C = c(0.5, 1, 3, 5, 7)
hyp_choices = data.frame(shp=rep(C, 2), scl=c(C, 2*C)) %>% arrange(shp)
num_priors = nrow(hyp_choices)
hyp_choices = hyp_choices %>% mutate(t = sapply(1:num_priors, find_t)) %>% arrange(t)
```
## Assessing Impact on Random Effects Shrinkage
```{r mcmc_chains_nc, warning=F, message=FALSE, cache=TRUE}
# load sampler
source("samplers/ssd_fit.R")
set.seed(7)
# function to iterate over
fit_chain <- function(k){
# note that the function automatically handles the scaling
chain = ssd_fit(X, y.star, d.star, A, ndesired=2000, nburn=1000, nthin=1,
hyp=list(c1=hyp_choices$shp[k], d1=hyp_choices$scl[k],
c2=hyp_choices$shp[k], d2=hyp_choices$scl[k]))
return(chain)
}
# fit chains in parallel
library(parallel); library(doParallel)
options(mc.cores = 4)
cl <- parallel::makeForkCluster(4, setup_strategy = "sequential")
doParallel::registerDoParallel(cl)
list_of_chains = parLapply(cl, 1:num_priors, fit_chain)
stopCluster(cl)
# done!
# calculate average selection prob.
avg_sel_prob = sapply(1:num_priors, function(i){ list_of_chains[[i]]$p.delta %>% colMeans() %>% mean()})
```
The *posterior selection probabilities* $\tilde p_{i}$ allow us to assess shrinkage in the random effects (see Eq. 2 in Section 3.2). To get a sense of the overall level of shrinkage for a given dataset, we examine the average posterior selection probability across all areas, i.e. $\tfrac{1}{n} \sum_{i=1}^n \tilde p_{i}$.
Below is a plot of the threshold values $t$ and the resulting average *posterior* selection probabilities for different inverse-gamma priors, using the rent burden dataset for North Carolina. We can see that larger values of $t$ lead to more random effects being shrunk to zero (lower selection probabilities). Note that the drop in the average selection probability is quite large for $t< 0.3$, with an inflection point near $0.3$. This is a pattern we have observed in other datasets as well. Thus *we recommend any inverse-gamma prior with $t>0.3$ at $\alpha=0.001$*.
```{r message=FALSE, fig.height=3, fig.width=5, fig.cap="Impact on Random Effects Shrinkage of Varying Priors" }
hyp_choices %>% mutate(avg_selection_prob = avg_sel_prob ) %>%
ggplot(aes(x=t, y=avg_selection_prob)) + geom_point() +
geom_line(linewidth=0.5, lty=2) +
xlab("t (for different priors)") +
ylab("Avg. Posterior Selection Probability")+
ggtitle("Variability of Shrinkage for IG Priors - NC Data")+
geom_vline(xintercept = 0.3, lty=3, color="blue")
```
## Assessing Impact on Estimates (North Carolina)
Here we show the variability of the point estimates of rent burden for inverse-gamma priors with $t>0.3$ for the North Carolina dataset.
```{r fig.height=8, fig.width=6, fig.cap="Prior Impact on Point Estimates for IG Priors (t>0.3) - North Carolina Rent Burden Data"}
estim = sapply(5:num_priors,
function(i){
theta.orig.scl = exp(list_of_chains[[i]]$theta*list_of_chains[[i]]$scale + list_of_chains[[i]]$center)
# point estimates
theta.orig.scl %>% colMeans()})
estim %>% t() %>% as.data.frame() %>%
pivot_longer(cols = starts_with("theta"), names_to = "area") %>%
mutate(area = str_remove(area, "theta_") %>% factor(levels=as.character(1:100))) %>%
ggplot(aes(x=area, y=value)) + geom_boxplot() +
ylab("Poverty Rate Estimate") + xlab("County")+
coord_flip() +
geom_vline(xintercept = c(94, 72), lty=2, linewidth=0.1) +
theme(axis.text=element_text(size=7))
```
We can see that the prior choice has minimal impact outside of a few areas. There are two areas with noticeable variability (areas 72 and 94), corresponding to the estimates for Perquimans and Washington counties, which have the highest design variances. For these counties, the prior choice does have a meaningful impact on the estimates. Note that this is not an issue for all datasets, as the next example shows.
## Assessing Impact on Estimates (Illinois)
```{r mcmc_chains_il, warning=FALSE, cache=TRUE}
# Illinois county data
load("data/data_IL/data.RDA")
#data
covs = c("degree", "assistance")
X <- model.matrix(~., all_data[, covs, drop=F])
n <- nrow(X); j <- ncol(X) # number of covariates INCLUDES intercept
#response (with transformation)
y <- all_data$povPerc
d.var <- all_data$povPercSE^2
y.star <- log(y)
d.star <- d.var / y^2
# load sampler
setwd("~/coding/spike_slab_fh/ssd_paper_code")
source("samplers/ssd_fit.R")
# function to iterate over
fit_chain <- function(k){
# note that the function automatically handles the scaling
chain = ssd_fit(X, y.star, d.star, A, ndesired=2000, nburn=1000, nthin=1,
hyp=list(c1=hyp_choices$shp[k], d1=hyp_choices$scl[k],
c2=hyp_choices$shp[k], d2=hyp_choices$scl[k]))
return(chain)
}
set.seed(7)
# fit chains in parallel
library(parallel); library(doParallel)
options(mc.cores = 4)
cl <- parallel::makeForkCluster(4, setup_strategy = "sequential")
doParallel::registerDoParallel(cl)
list_of_chains = parLapply(cl, 1:num_priors, fit_chain)
stopCluster(cl)
# done!
# calculate average selection prob.
avg_sel_prob = sapply(1:num_priors, function(i){ list_of_chains[[i]]$p.delta %>% colMeans() %>% mean()})
```
We repeat this exercise with Illinois ($n=102$), this time estimating poverty rates. Here we show the variability of the point estimates for inverse-gamma priors with $t>0.3$.
```{r fig.height=9, fig.width=6, fig.cap="Prior Impact on Point Estimates for IG Priors (t>0.3) - Illinois Poverty Data"}
estim = sapply(5:num_priors,
function(i){
theta.orig.scl = exp(list_of_chains[[i]]$theta*list_of_chains[[i]]$scale + list_of_chains[[i]]$center)
# point estimates
theta.orig.scl %>% colMeans()})
estim %>% t() %>% as.data.frame() %>%
pivot_longer(cols = starts_with("theta"), names_to = "area") %>%
mutate(area = str_remove(area, "theta_") %>% factor(levels=as.character(1:102))) %>%
ggplot(aes(x=area, y=value)) + geom_boxplot() +
ylab("Poverty Rate Estimate") + xlab("County")+
coord_flip() + theme(axis.text=element_text(size=7))
```
## Conclusion & Recommendation
We conducted a prior sensitivity analysis for the random effect variance parameters of the SSD model. The choice of prior for these parameters can influence the level of shrinkage and the resulting estimates, especially in areas with high design variance.
We tested the model on two different datasets. We saw that choosing an inverse-gamma prior with a $t$ value that is too small may force the random effects for most areas to be selected. On the other hand, a very large $t$ can force higher shrinkage and lower selection probabilities. We recommend choosing an inverse-gamma prior with $t>0.3$ for $\alpha=0.001$, which balances both of these factors. For the work in our main manuscript, we chose the prior $IG(5, 5)$, for which $t\approx 0.34$.
# Full Conditionals Derivation