sewage through May.Rmd

---
title: "Coronavirus in New Haven Sewage"
author: "Additional analyses for Peccia et al."
date: "6/1/2020"
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
  pdf_document: default
params:
  rerun_model: FALSE
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(readxl)
library(reshape2)
library(HDInterval)
library(rjags)
library(xtable)
library(sas7bdat)
library(pbapply)

source('./Models/mod1.deriv.ar1.R')
source('./Models/mod2.dist.lag.alt2.R')
source('./Models/mod2.indiv.lags.R')


```

## Overview
These are additional analyses of data examining increases in SARS-CoV-2 RNA concentrations in primary sewage sludge in relation to COVID-19 hospitalizations and cases. The original preprint, led by Jordan Peccia, et al can be found here: https://www.medrxiv.org/content/10.1101/2020.05.19.20105999v1 
The original pre-print included a correlation of smoothed versions of the viral RNA data and the epidemiological indicators. In the analyses presented here, we use an error-in-variables time series model to get an estimate of the underlying viral dynamics in sewage and the relationship of this with hospitalizations. This analysis is carried out in the Bayesian framework, allowing us to correctly quantify uncertainty in the estimated associations. These additional analyses were performed by Dan Weinberger (Epidemiology of Microbial Diseases, Yale School of Public Health), with input from Josh Warren (Biostatistics, Yale School of Public Health) and the rest of the original study team.


## Explanation of the data and analyses

We have up to 4 measurements of viral RNA in sewage sludge daily (2 targets, tested in duplicate). We also have the number of COVID-19 hospitalization per day in the catchment area. 

We have five ways of measuring cases detected by laboratory test.

1a,b) Number of positive tests based on date when sample was collected; with or with adjustment for total number of tests performed

2a,b) Number of positive tests based on date when result was reported to DPH; with or with adjustment for total number of tests performed

3) Number of cases reported on the DPH website.

The number of cases reported on the DPH website is used for the main analyses here. This is because this is the relevant comparison when the goal is to identify a lead indicator to gain an understanding of the epidemic.cThe relationship among these different measure changes over the course of the epidemic as testing delays shrank. T

In the analyses, we assume that the observed sewage testing data (W) are drawn from an underlying, unobserved trajectory of viral concentration in the sewage, which follows an AR(1) process (X). We then evaluate the association between X and the number of hospitalization/cases/positive tests in Poisson regression models. This count model includes an AR(1) random effect for time. The models are fit using JAGS (see Models/mod2.indiv.lags.R). We test lags of the X variable of 0-7 days in a distributed lags framework. We also present results from each lag tested individually for comparison.


```{r, eval=F, include=F}
## Combine data (only need to do this once)

# ds1 <- read_excel('./Data/raw RNA data.xlsx')
# ds1.m <- melt(ds1, id.vars='DATE')
# names(ds1.m) <- c('date', 'test','conc')

ds1.m <- read_excel('./Data/data through June 1.xlsx')
names(ds1.m) <- c('date', 'conc', 'new.cases', 'test')
ds1.m$date <- as.Date(ds1.m$date)

ds2 <- read.sas7bdat('./Data/dw1stpostest2020_6_15.sas7bdat')
ds2$date <-format(as.Date(ds2$newspecdate, origin="1960-01-01"),"%Y-%m-%d")

ds2.nh <- ds2[ds2$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
ds2.nh$pos.tests <-1
ds2.nh.agg <- aggregate(ds2.nh[,'pos.tests', drop=F],by=list('date'=ds2.nh$date), FUN=sum)
ds2.nh.agg$date <- as.Date(ds2.nh.agg$date)

tests.nh <- tests[tests$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
tests.nh$cov.tests <-1
tests.nh.agg <- aggregate(tests.nh[,'cov.tests', drop=F],by=list('date'=tests.nh$date), FUN=sum)
tests.nh.agg$date <- as.Date(tests.nh.agg$date)

#Info on date of report and test from DPH
tests.report <-
  read.sas7bdat('./Data/dw_1stcovtest_wdates.sas7bdat')

tests.report$DateReported <-
  format(as.Date(tests.report$DateReported,                               origin="1960-01-01"),"%Y-%m-%d")

tests.report$newspecdate <-
  format(as.Date(tests.report$newspecdate, origin="1960-01-01"),"%Y-%m-%d")

tests.report$test.time <- 
  as.numeric(as.Date(tests.report$DateReported) -
  as.Date(tests.report$newspecdate))

test.nh <-  
  tests.report[tests.report$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]

test.nh$pos.tests <- 0
test.nh$pos.tests[test.nh$TestResult %in% c('DETECTED', 'POSITIVE')] <- 1

test.nh$tests <- 1

test.nh.agg.report <- aggregate(test.nh[,c('tests','pos.tests'), drop=F],by=list('date_reported'=test.nh$DateReported), FUN=sum)

test.nh.agg.report$date_reported <-
  as.Date(test.nh.agg$date_reported)

names(test.nh.agg.report) <- c('date', 'N_tests_reported','N_positive_reported')

test.nh.agg.rec <- aggregate(test.nh[,c('tests','pos.tests'), drop=F],by=list('date_received'=test.nh$newspecdate), FUN=sum)

test.nh.agg.rec$date_received <-
  as.Date(test.nh.agg.rec$date_received)

names(test.nh.agg.rec) <- c('date', 'N_tests_received','N_positive_recieved')

test.nh.combined <-
  merge(test.nh.agg.rec,test.nh.agg.report, by='date', all=T)

test.nh.combined <- test.nh.combined[,-6]

test.nh.combined <- 
  test.nh.combined[test.nh.combined$date <= as.Date('2020-06-30'),]

test.nh.combined[,-1] <- apply(test.nh.combined[,-1],2, function(x){
  x[is.na(x)] <- 0
  return(x)
})

ds.hosp <- read_excel('./Data/hospitalization2020_07_08.xlsx')
ds.hosp <- as.data.frame(ds.hosp)
ds.hosp$date <- as.Date(ds.hosp[,1])
ds.hosp$hospt_admit <- ds.hosp$`# ADMISSIONS`
ds.hosp <- ds.hosp[,c('date','hospt_admit')]

#ds1.m$date <- as.Date(ds1.m$date)


#ds.cases <- read_excel('./Data/updated case data by reporting date.xlsx')
#names(ds.cases) <- c('date','new.cases')
#ds.cases$date <- as.Date(ds.cases$date)

ds3 <- merge(test.nh.combined, ds.hosp, by='date', all=T)
ds3 <- merge(ds3, ds1.m, by='date', all=T)

ds3$prop.pos_report <-
  ds3$N_positive_reported/ds3$N_tests_reported

ds3$prop.pos_rec <-
  ds3$N_positive_recieved/ds3$N_tests_received

ds3$conc <- as.numeric(ds3$conc)
ds3$target <- 'N1'
ds3$target[grep('N2', ds3$test)] <- 'N2'


write.csv(ds3,'./Data/combined.data.csv')
```


```{r}
## read in combined data

ds3 <- read.csv('./Data/combined.data.csv')
ds3$date <- as.Date(ds3$date)
```

```{r, eval=F}
#Compare "new.case' with N_positive_reported
plot(ds3$new.cases, ds3$N_positive_reported)
abline(a=0, b=1)

plot(ds3$date, ds3$N_positive_reported, col='black', type='l')
points(ds3$date, ds3$new.cases, col='red', type='l')
abline(a=0, b=1)
```


## Plots of raw data

This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale. 
```{r, fig.width=8, fig.height=13}
par(mfrow=c(4,2))
plot(ds3$date, (ds3$conc), main='Viral RNA in Sewage', col=as.factor(ds3$target), xlab='Date', ylab=' Concentration', bty='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))

plot(ds3$date, log(ds3$conc), main='Viral RNA in Sewage', col=as.factor(ds3$target), xlab='Date', ylab='Log Concentration', bty='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))

#Hospitalization
plot(ds3$date, (ds3$hospt_admit), type='l', bty='l', main='Hospitalizations', ylab=('Hospitalizations (N)'))
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))

plot(ds3$date, log(ds3$hospt_admit), type='l', bty='l', main='Log Hospitalizations', ylab=('log(Hospitalizations (N))') )
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))

# Cases based on date of report
# plot(ds3$date, ds3$new.cases, type='l', main='New reported cases')
plot(ds3$date, (ds3$N_positive_reported), main='New cases, based on report date', xlab='Date', ylab='Reported cases', bty='l', type='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))

plot(ds3$date, log((ds3$N_positive_reported+0.5)), main='log(New cases, based on report date)', type='l', xlab='Date', ylab='Log Reported cases', bty='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))


#Pos tests ased on date of sample collection
plot(ds3$date, ds3$N_positive_recieved, main='New cases, based on sample receipt date', xlab='Date', ylab='Cases', bty='l' , type='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))

plot(ds3$date, log(ds3$N_positive_recieved+0.5), main='Log(New cases, based on sample receipt date)', xlab='Date', ylab='Log(Reported Cases)', bty='l', type='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))


```


Comparing the new cases of COVID-19 based on sample receipt date or date of report, we can see a long lag early in the epidemic between sample collection date and date of report to DPH. This gap seems to narrow as the epidemic progresses, perhaps due to improvements in testing
```{r}
plot(ds3$date, (ds3$N_positive_reported), main='Sample receipt date (red) vs report date (black)', xlab='Date', ylab='(Reported cases)', bty='l', type='l')
points(ds3$date, (ds3$N_positive_recieved), col='red', type='l')
```

Comparing the new cases of COVID-19 based on date of report to DPH (black) and when the data were reported on the DPH website (red).
```{r}
plot(ds3$date, (ds3$N_positive_reported), main='Web report date (red) vs report date to DPH (black)', xlab='Date', ylab='(Reported cases)', bty='l', type='l')
points(ds3$date, (ds3$new.cases), col='red', type='l')
```


```{r}
## Prepare data for JAGS

ds.mod <- ds3[!(is.nan(ds3$conc)),]
ds.mod <- ds.mod[!(is.na(ds.mod$conc)),]

ds.mod$log.conc <- as.vector(log(ds.mod$conc))

ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','hospt_admit')], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W1 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))] 
W1 <- apply(W1, 2, function(x){
  x[is.nan(x)] <-NA
  return(x)
} )

scale.W1 <- apply(W1,2, scale)
Y1 <- ds.mod.c[,'hospt_admit_R1, N1']
Y1[is.nan(Y1)] <-NA

W1.scale <- apply(W1,2,scale)

log.offset.y1 <- rep(0, length(Y1))

```


## Association between sewage concentration and epidemiological indicators (Distributed lags)
We can model the epidemiological time series as a distributed lag of the sewage data.

-We evaluate three epidemiological time series: COVID-19 hospitalizations, number of cases of covid-19 (based on date of sample receipt), and number of cases of covid-19 (based on date of report). 

The plots of the distributed lag model show 90% credible intervals

```{r,eval=params$rerun_model, include=F}
mod2 <- mod2.func(W=W1.scale, Y=Y1,log.offset=log.offset.y1)
saveRDS(mod2,'./results/dl.mod2.rds')
```

### Distributed lags for the association between viral RNA in sewage and COVID hospitalizations. 
This shows that lags of 1-4 days are associated with hospitalization. +/-90% Credible Intervals. From the plots,  the sewage sludge RNA concentration increase preceded the increase in hospital admissions.

```{r}
mod2 <- readRDS('./results/dl.mod2.rds')
mod2.hosp.beta <- mod2[c('beta1', 'beta1.cum')]
saveRDS(mod2.hosp.beta,'./results/dl.betas.hosp.rds')

yrange <- range(mod2$beta1)
plot(-1:8, mod2$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
abline(h=0)
```

Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 1-4 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.

```{r}
yrange <- range(mod2$beta1.cum)
plot(-1:8, mod2$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod2$beta1.cum$lower, y1=mod2$beta1.cum$upper, length=0)
abline(h=0)
```

### Distributed lags for the association between viral RNA in sewage and new reported COVID-19 cases, based on reporting date on the CT DPH website

```{r, eval=T}

#Note data on new cases are missing
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W3 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))] 
W3 <- apply(W3, 2, function(x){
  x[is.nan(x)] <-NA
  return(x)
} )
Y3 <- ds.mod.c[,'new.cases_R1, N1']
Y3[is.nan(Y3)] <-NA
#Y3[is.na(Y3)] <- 0

log.offset3.null <- rep(0, length(Y3))

W3.scale <- apply(W3,2, scale)

```


```{r, eval=params$rerun_model, include=F}
mod3 <- mod2.func(W=W3.scale, log.offset=log.offset3.null,Y=Y3)
saveRDS(mod3,'./results/dl.mod3.cases.web.rds')
```

This shows that lags of sewage data at longer time lags (6-8 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
```{r, eval=T}
mod3 <- readRDS('./results/dl.mod3.cases.web.rds')
mod2.case.web.beta <- mod3[c('beta1', 'beta1.cum')]
saveRDS(mod2.case.web.beta,'./results/dl.betas.case.web.rds')

yrange <- range(mod3$beta1)
plot(-1:8, mod3$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod3$beta1$lower, y1=mod3$beta1$upper, length=0)
abline(h=0)
```


Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 6-8 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.

```{r}
yrange <- range(mod3$beta1.cum)
plot(-1:8, mod3$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod3$beta1.cum$lower, y1=mod3$beta1.cum$upper, length=0)
abline(h=0)
```


### Distributed lags for the association between viral RNA in sewage and new COVID-19 cases, based on date of report to DPH

```{r, eval=T}

#Note data on new cases are missing
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','N_tests_reported','N_positive_reported')], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W3 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))] 
W3 <- apply(W3, 2, function(x){
  x[is.nan(x)] <-NA
  return(x)
} )
Y3 <- ds.mod.c[,'N_positive_reported_R1, N1']
Y3[is.nan(Y3)] <-NA
#Y3[is.na(Y3)] <- 0

log.offset3 <- log(ds.mod.c[,'N_tests_reported_R1, N1'])
log.offset3[is.nan(log.offset3)] <-0
log.offset3[is.na(log.offset3)] <-0

log.offset3.null <- rep(0, length(log.offset3))

W3.scale <- apply(W3,2, scale)

```


```{r, eval=params$rerun_model, include=F}
mod3 <- mod2.func(W=W3.scale, log.offset=log.offset3,Y=Y3)
saveRDS(mod3,'./results/dl.mod3.cases.rds')
```

This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
```{r, eval=T}
mod3 <- readRDS('./results/dl.mod3.cases.rds')
mod3.betas <- mod3[c('beta1','beta1.cum')]
saveRDS(mod3.betas,'./results/dl.mod3.betas.report.to.dph.denom.rds')

plot(-1:8, mod3$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
yrange <- range(mod3$beta1)
arrows(x0=-1:8, y0=mod3$beta1$lower, y1=mod3$beta1$upper, length=0)
abline(h=0)
```

Cumulative relationship between viral RNA in sewage and new cases. This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.

```{r, eval=T}
yrange <- range(mod3$beta1.cum)
plot(-1:8, mod3$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod3$beta1.cum$lower, y1=mod3$beta1.cum$upper, length=0)
abline(h=0)
```

We can also repeat this analysis without adjusting for testing volume

```{r, eval=params$rerun_model, include=F}
mod3.no.offset <- 
  mod2.func(W=W3.scale, log.offset=log.offset3.null,Y=Y3)

saveRDS(mod3.no.offset,'./results/dl.mod3.cases.no.offset.rds')
```


This shows largely the same pattern as when adjusting for testing volume. (1-4 day lags) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
```{r, eval=T}
mod3.no.offset <-
  readRDS('./results/dl.mod3.cases.no.offset.rds')

mod3.betas.no.offset <- mod3.no.offset[c('beta1','beta1.cum')]
saveRDS(mod3.betas.no.offset,'./results/dl.mod3.betas.report.to.dph.no.offset.rds')

yrange <- range(mod3.no.offset$beta1)

plot(-1:8, mod3.no.offset$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod3.no.offset$beta1$lower, y1=mod3.no.offset$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and new cases (not adjusting for testing volume). This shows that lags of sewage data at longer time lags (1-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
```{r, eval=T}
yrange <- range(mod3.no.offset$beta1.cum)
plot(-1:8, mod3.no.offset$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod3.no.offset$beta1.cum$lower, y1=mod3.no.offset$beta1.cum$upper, length=0)
abline(h=0)
```

### Distributed lags for the association between viral RNA in sewage and number or percent positive, based on date when the sample was collected. 

```{r, eval=T}
#Note data on new cases are missing
#ds.mod.m <- melt(ds.mod[!is.na(ds.mod$new.cases),c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc',"N_positive_recieved", "N_tests_received" )], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W2 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))] 
W2 <- apply(W2, 2, function(x){
 x[is.nan(x)] <-NA
 return(x)
} )
scale.W2 <- apply(W2, 2, scale)
Y2 <- ds.mod.c[,"N_positive_recieved_R1, N1"]
Y2[is.nan(Y2)] <-NA

log.offset2 <- log(ds.mod.c[,"N_tests_received_R1, N1"])
log.offset2[is.nan(log.offset2)] <-NA
log.offset2[is.na(log.offset2)] <-0

#log.offset2 <- rep(0,length(Y2))

log.offset2.null <- rep(0, length(log.offset2))

W2.scale <- apply(W2,2, scale)
```

```{r, eval=params$rerun_model, include=F}
mod2 <- mod2.func(W=W2.scale, Y=Y2, log.offset=log.offset2)
saveRDS(mod2, './results/dl.mod2.pct.pos.rds')

mod2.no.offset <- mod2.func(W=W2.scale, Y=Y2, log.offset=log.offset2.null)
saveRDS(mod2.no.offset, './results/dl.mod2.N.pos.specdate.rds')
```

This shows that lags of 0-2 days are associated with the case time series, based on date of test, (on the cumulative plot, beta does not increase after 2 days) +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in number of cases, based on sample collection date.

```{r, eval=T}
mod2 <- readRDS('./results/dl.mod2.pct.pos.rds')
mod2.betas <- mod2[c('beta1', 'beta1.cum')]
saveRDS(mod2.betas,'./results/dl.mod2.betas.sample.date.pct.post.rds')

yrange <- range(mod2$beta1)
plot(-1:8, mod2$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
abline(h=0)
```

Cumulative relationship between viral RNA in sewage and percent positive based on sample collection date. +/-90% Credible Intervals.

```{r, eval=T}
yrange <- range(mod2$beta1.cum)
plot(-1:8, mod2$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod2$beta1.cum$lower, y1=mod2$beta1.cum$upper, length=0)
abline(h=0)
```


and same thing with no offset term

```{r, eval=T}
mod2.no.offset <- readRDS('./results/dl.mod2.N.pos.specdate.rds')

mod2.betas.no.offset <- mod2.no.offset[c('beta1', 'beta1.cum')]
saveRDS(mod2.betas.no.offset,'./results/dl.mod2.betas.sample.date.N.pos.rds')

yrange <- range(mod2.no.offset$beta1)
plot(-1:8, mod2.no.offset$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod2.no.offset$beta1$lower, y1=mod2.no.offset$beta1$upper, length=0)
abline(h=0)
```


Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.

```{r, eval=T}
yrange <- range(mod2.no.offset$beta1.cum)
plot(-1:8, mod2.no.offset$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod2.no.offset$beta1.cum$lower, y1=mod2.no.offset$beta1.cum$upper, length=0)
abline(h=0)
```

## Association between sewage concentration and epidemiological indicators (Individual lags)
-All of the plots of individual lags show 98.75% credible intervals, which represent an alpha of 0.1, adjusted for multiple testing.

### Association of sewage viral concentration at different lags with hospitalizations  

Test lags of 1-7 days, and leads of 1 days one at a time.

```{r, include=F, eval=params$rerun_model}
mods.hosp <- pblapply(c(-1:7), function(zz) mod2.indiv.lag.func( Y=Y1, log.offset=log.offset.y1, W=W1.scale, lag.n=zz))
saveRDS(mods.hosp,'./results/mods.hosp.rds')
```
Rate ratio (+/- 98.75% Credible intervals)  showing the relative increase in positive tests associated with a 1-log increase of sewage RNA concentration at various lags (based on date of test). This shows the strongest association with a lag of 1 day, with the effect trailing off with longer or shorter lags. From the plots,  the sewage sludge RNA concentration increase preceded the increase in hospital admissions. 

```{r}
mods.hosp <-
  readRDS('./results/mods.hosp.rds')

#Extract the betas
beta.hospt <- lapply(mods.hosp,'[[','beta1')
beta.hospt <- do.call('rbind.data.frame',beta.hospt)
irr.hospt <- exp(beta.hospt)
yrange <- range(irr.hospt)
plot(-1:7, irr.hospt[,"post_means"], ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
arrows(x0=-1:7, y0=irr.hospt[,"lower"], y1=irr.hospt[,"upper"], length=0)
abline(h=1, lty=2, col=rgb(0,0,0,alpha=0.1))
```

### Association of sewage viral concentration at different lags with new cases (defined by date of test)

```{r, eval=T}

#Note data on new cases are missing
#ds.mod.m <- melt(ds.mod[!is.na(ds.mod$new.cases),c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc',"N_positive_recieved", "N_tests_received" )], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W2 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))] 
W2 <- apply(W2, 2, function(x){
  x[is.nan(x)] <-NA
  return(x)
} )
scale.W2 <- apply(W2, 2, scale)
Y2 <- ds.mod.c[,"N_positive_recieved_R1, N1"]
Y2[is.nan(Y2)] <-NA

log.offset2 <- log(ds.mod.c[,"N_tests_received_R1, N1"])
log.offset2[is.nan(log.offset2)] <-NA
log.offset2[is.na(log.offset2)] <-0
W2.scale <- apply(W2,2, scale)

```

```{r, eval=T, include=F, eval=params$rerun_model}
mods.cases.dx <- pblapply(c(-1:7), function(zz) mod2.indiv.lag.func( Y=Y2, log.offset=log.offset2, W=W2.scale, lag.n=zz))
saveRDS(mods.cases.dx,'./results/mods.cases.dx.rds')

```
Rate ratio (+/- 98.75% Credible) showing the relative increase in positive tests associated with a 1-log increase of sewage RNA concentration at various lags (based on date of test). This shows the strongest association with a lag of 4 days, with uncertainty in the estimates and in the length of the lag. From the plots,  the sewage sludge RNA concentration slightly preceded the increase in the cases (based on sample collection date).

```{r}
mods.cases.dx <-
  readRDS('./results/mods.cases.dx.rds')


#Extract the betas
beta.dx <- lapply(mods.cases.dx,'[[','beta1')
beta.dx <- do.call('rbind.data.frame',beta.dx)
irr.dx <- exp(beta.dx)

yrange <- range(irr.dx)
plot(-1:7, irr.dx[,"post_means"], ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
arrows(x0=-1:7, y0=irr.dx[,"lower"], y1=irr.dx[,"upper"], length=0)
abline(h=1, lty=2, col='gray')
```

### Association of sewage viral concentration at different lags with new cases (defined by date of report)


```{r, eval=T, include=F, eval=params$rerun_model}
mods.cases.rep <- pblapply(c(-1:7), function(zz) mod2.indiv.lag.func( Y=Y3, log.offset=log.offset3, W=W3.scale, lag.n=zz))
saveRDS(mods.cases.rep,'./results/mods.cases.rep.rds')

```

Rate ratio (+/- 98.75% Credible intervals) showing the relative increase in reported cases associated with a 1-log increase of sewage RNA concentration at various lags.  Sewage sludge RNA  increases first, then reported cases lags 1 or 6 days later, with considerable uncertainty in the estimates and in the length of the lag.  From the plots,  the sewage sludge RNA concentration increase preceded the increase in reported cases
```{r}
mods.cases.rep <-
  readRDS('./results/mods.cases.rep.rds')

#Extract the betas
beta.rep <- lapply(mods.cases.rep,'[[','beta1')
beta.rep <- do.call('rbind.data.frame',beta.rep)
irr.rep <- exp(beta.rep)

yrange <- range(irr.rep)
plot(-1:7, (irr.rep[,"post_means"]), ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
arrows(x0=-1:7, y0=irr.rep[,"lower"], y1=irr.rep[,"upper"], length=0)
abline(h=1, lty=2, col='gray')
```