diff --git a/docs/index.html b/docs/index.html
index d22eac7..3a10720 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -1528,7 +1528,7 @@
Explanation of the data and analyses
Plots of raw data
-
This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale.
+
This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale.
Comparing the new cases of COVID-19 based on sample receipt date or date of report, we can see a long lag early in the epidemic between sample collection date and date of report to DPH. This gap seems to narrow as the epidemic progresses, perhaps due to improvements in testing
Comparing the new cases of COVID-19 based on date of report to DPH (black) and when the data were reported on the DPH website (red).
@@ -1550,27 +1550,24 @@ Distributed lags for the association between viral RNA in sewage and new rep
Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 6-8 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
-
-
Distributed lags for the association between viral RNA in sewage and COVID hospitalizations.
-
This shows that lags of 1-4 days are associated with hospitalization. +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in hospital admissions.
-
-
Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 1-4 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
-
-
Distributed lags for the association between viral RNA in sewage and new COVID-19 cases, based on date of report to DPH
-
This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
+
This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
Cumulative relationship between viral RNA in sewage and new cases. This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
We can also repeat this analysis without adjusting for testing volume
This shows largely the same pattern as when adjusting for testing volume. (1-4 day lags) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Cumulative relationship between viral RNA in sewage and new cases (not adjusting for testing volume). This shows that lags of sewage data at longer time lags (1-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
-
-
Distributed lags for the association between viral RNA in sewage and number positive, based on date when the sample was collected.
+
+
Distributed lags for the association between viral RNA in sewage and number or percent positive, based on date when the sample was collected.
This shows that lags of 0-2 days are associated with the case time series, based on date of test, (on the cumulative plot, beta does not increase after 2 days) +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in number of cases, based on sample collection date.
-
Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
+
Cumulative relationship between viral RNA in sewage and percent positive based on sample collection date. +/-90% Credible Intervals.
+
and same thing with no offset term
+
+
Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
+
diff --git a/sewage through May.Rmd b/sewage through May.Rmd
index 80a1fe9..4808e15 100644
--- a/sewage through May.Rmd
+++ b/sewage through May.Rmd
@@ -435,7 +435,7 @@ arrows(x0=-1:8, y0=mod3.no.offset$beta1.cum$lower, y1=mod3.no.offset$beta1.cum$u
abline(h=0)
```
-### Distributed lags for the association between viral RNA in sewage and number positive, based on date when the sample was collected.
+### Distributed lags for the association between viral RNA in sewage and number or percent positive, based on date when the sample was collected.
```{r, eval=T}
#Note data on new cases are missing
@@ -457,12 +457,17 @@ log.offset2[is.na(log.offset2)] <-0
#log.offset2 <- rep(0,length(Y2))
+log.offset2.null <- rep(0, length(log.offset2))
+
W2.scale <- apply(W2,2, scale)
```
```{r, eval=params$rerun_model, include=F}
mod2 <- mod2.func(W=W2.scale, Y=Y2, log.offset=log.offset2)
saveRDS(mod2, './results/dl.mod2.pct.pos.rds')
+
+mod2.no.offset <- mod2.func(W=W2.scale, Y=Y2, log.offset=log.offset2.null)
+saveRDS(mod2.no.offset, './results/dl.mod2.N.pos.specdate.rds')
```
This shows that lags of 0-2 days are associated with the case time series, based on date of test, (on the cumulative plot, beta does not increase after 2 days) +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in number of cases, based on sample collection date.
@@ -478,7 +483,7 @@ arrows(x0=-1:8, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
abline(h=0)
```
-Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
+Cumulative relationship between viral RNA in sewage and percent positive based on sample collection date. +/-90% Credible Intervals.
```{r, eval=T}
yrange <- range(mod2$beta1.cum)
@@ -488,6 +493,30 @@ abline(h=0)
```
+and same thing with no offset term
+
+```{r, eval=T}
+mod2.no.offset <- readRDS('./results/dl.mod2.N.pos.specdate.rds')
+
+mod2.betas.no.offset <- mod2.no.offset[c('beta1', 'beta1.cum')]
+saveRDS(mod2.betas.no.offset,'./results/dl.mod2.betas.sample.date.N.pos.rds')
+
+yrange <- range(mod2.no.offset$beta1)
+plot(-1:8, mod2.no.offset$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
+arrows(x0=-1:8, y0=mod2.no.offset$beta1$lower, y1=mod2.no.offset$beta1$upper, length=0)
+abline(h=0)
+```
+
+
+Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
+
+```{r, eval=T}
+yrange <- range(mod2.no.offset$beta1.cum)
+plot(-1:8, mod2.no.offset$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
+arrows(x0=-1:8, y0=mod2.no.offset$beta1.cum$lower, y1=mod2.no.offset$beta1.cum$upper, length=0)
+abline(h=0)
+```
+
## Association between sewage concentration and epidemiological indicators (Individual lags)
-All of the plots of individual lags show 98.75% credible intervals, which represent an alpha of 0.1, adjusted for multiple testing.
diff --git a/sewage-through-May.html b/sewage-through-May.html
index d22eac7..3a10720 100644
--- a/sewage-through-May.html
+++ b/sewage-through-May.html
@@ -1528,7 +1528,7 @@
Explanation of the data and analyses
Plots of raw data
-
This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale.
+
This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale.
Comparing the new cases of COVID-19 based on sample receipt date or date of report, we can see a long lag early in the epidemic between sample collection date and date of report to DPH. This gap seems to narrow as the epidemic progresses, perhaps due to improvements in testing
Comparing the new cases of COVID-19 based on date of report to DPH (black) and when the data were reported on the DPH website (red).
@@ -1550,27 +1550,24 @@ Distributed lags for the association between viral RNA in sewage and new rep
Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 6-8 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
-
-
Distributed lags for the association between viral RNA in sewage and COVID hospitalizations.
-
This shows that lags of 1-4 days are associated with hospitalization. +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in hospital admissions.
-
-
Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 1-4 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
-
-
Distributed lags for the association between viral RNA in sewage and new COVID-19 cases, based on date of report to DPH
-
This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
+
This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
Cumulative relationship between viral RNA in sewage and new cases. This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
We can also repeat this analysis without adjusting for testing volume
This shows largely the same pattern as when adjusting for testing volume. (1-4 day lags) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Cumulative relationship between viral RNA in sewage and new cases (not adjusting for testing volume). This shows that lags of sewage data at longer time lags (1-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
-
-
Distributed lags for the association between viral RNA in sewage and number positive, based on date when the sample was collected.
+
+
Distributed lags for the association between viral RNA in sewage and number or percent positive, based on date when the sample was collected.
This shows that lags of 0-2 days are associated with the case time series, based on date of test, (on the cumulative plot, beta does not increase after 2 days) +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in number of cases, based on sample collection date.
-
Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
+
Cumulative relationship between viral RNA in sewage and percent positive based on sample collection date. +/-90% Credible Intervals.
+
and same thing with no offset term
+
+
Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
+
diff --git a/sewage.Rmd b/sewage.Rmd
deleted file mode 100644
index 7b3b6d7..0000000
--- a/sewage.Rmd
+++ /dev/null
@@ -1,417 +0,0 @@
----
-title: "Coronavirus in New Haven Sewage"
-author: "Additional analyses for Peccia et al."
-date: "6/1/2020"
-output:
- html_document:
- toc: true
- toc_depth: 3
- toc_float: true
- pdf_document: default
-params:
- rerun_model: FALSE
----
-
-```{r setup, include=FALSE}
-knitr::opts_chunk$set(echo = FALSE)
-library(readxl)
-library(reshape2)
-library(HDInterval)
-library(rjags)
-library(xtable)
-library(sas7bdat)
-library(pbapply)
-
-source('./Models/mod1.deriv.ar1.R')
-source('./Models/mod2.dist.lag.alt2.R')
-source('./Models/mod2.indiv.lags.R')
-
-
-```
-
-## Overview
-These are additional analyses of data examining increases in SARS-CoV-2 RNA concentrations in primary sewage sludge in relation to COVID-19 hospitalizations and cases. The original preprint, led by Jordan Peccia, et al can be found here: https://www.medrxiv.org/content/10.1101/2020.05.19.20105999v1
-The original pre-print included a correlation of smoothed versions of the viral RNA data and the epidemiological indicators. In the analyses presented here, we use an error-in-variables time series model to get an estimate of the underlying viral dynamics in sewage and the relationship of this with hospitalizations. This analysis is carried out in the Bayesian framework, allowing us to correctly quantify uncertainty in the estimated associations. These additional analyses were performed by Dan Weinberger (Epidemiology of Microbial Diseases, Yale School of Public Health), with input from Josh Warren (Biostatistics, Yale School of Public Health) and the rest of the original study team.
-
-
-## Explanation of the data and analyses
-
-We have up to 4 measurements of viral RNA in sewage sludge daily (2 targets, tested in duplicate), the number of COVID-19 hospitalization per day in the catchment area, the number of new reported cases on that day, and the proportion of tests that were positive for coronavirus on each day. The epi curves won't nedcesarily align because the test data is based on the when the test is performed, while the case data is based on when cases were reported. There will be a lag between these two measures, and this lag was probably longer in the early part of the epidemic.
-
-In the analyses, we assume that the observed sewage testing data (W) are drawn from an underlying, unobserved trajectory of viral concentration in the sewage, which follows an AR(1) process (X). We then evaluate the association between X and the number of hospitalization/cases/positive tests in Poisson regression models. This count model includes an AR(1) random effect for time. The models are fit using JAGS (see Models/mod2.indiv.lags.R). We test lags of the X variable of 0-7 days in a distributed lags framework. We also present results from each lag tested individually for comparison.
-
-
-```{r, eval=F, include=F}
-## Combine data (only need to do this once)
-
-ds1 <- read_excel('./Data/raw RNA data.xlsx')
-ds1.m <- melt(ds1, id.vars='DATE')
-names(ds1.m) <- c('date', 'test','conc')
-
-ds1 <- read_excel('./Data/data through June 1.xlsx')
-ds1.m <- melt(ds1, id.vars='DATE')
-names(ds1.m) <- c('date', 'conc', , 'test')
-
-
-
-ds2 <- read.sas7bdat('./Data/dw1stpostest.sas7bdat')
-ds2$date <-format(as.Date(ds2$newspecdate, origin="1960-01-01"),"%Y-%m-%d")
-
-tests <- read.sas7bdat('./Data/dw1stcovtest.sas7bdat')
-tests$date <-format(as.Date(tests$newspecdate, origin="1960-01-01"),"%Y-%m-%d")
-
-ds2.nh <- ds2[ds2$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
-ds2.nh$pos.tests <-1
-ds2.nh.agg <- aggregate(ds2.nh[,'pos.tests', drop=F],by=list('date'=ds2.nh$date), FUN=sum)
-ds2.nh.agg$date <- as.Date(ds2.nh.agg$date)
-
-tests.nh <- tests[tests$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
-tests.nh$cov.tests <-1
-tests.nh.agg <- aggregate(tests.nh[,'cov.tests', drop=F],by=list('date'=tests.nh$date), FUN=sum)
-tests.nh.agg$date <- as.Date(tests.nh.agg$date)
-
-
-ds.hosp <- read_excel('./Data/epi.data.xlsx')
-ds.hosp$new.cases <-NULL
-ds.hosp$date <- as.Date(ds.hosp$date)
-ds1.m$date <- as.Date(ds1.m$date)
-
-ds.cases <- read_excel('./Data/updated case data by reporting date.xlsx')
-names(ds.cases) <- c('date','new.cases')
-ds.cases$date <- as.Date(ds.cases$date)
-
-ds3 <- merge(ds2.nh.agg, ds1.m, by='date', all=T)
-ds3 <- merge(ds3, tests.nh.agg, by='date', all=T)
-ds3 <- merge(ds3, ds.hosp, by='date', all=T)
-ds3 <- merge(ds3, ds.cases, by='date', all=T)
-
-ds3$prop.pos <- ds3$pos.tests/ds3$cov.tests
-#plot(ds3$pos.tests)
-#points(ds3$new.pos.tests)
-
-ds3$conc <- as.numeric(ds3$conc)
-ds3$target <- 'n1'
-ds3$target[grep('n1', ds3$test)] <- 'n2'
-
-write.csv(ds3,'./Data/combined.data.csv')
-```
-
-
-
-
-
-```{r}
-## read in combined data
-
-ds3 <- read.csv('./Data/combined.data.csv')
-ds3$date <- as.Date(ds3$date)
-```
-
-
-## Plots of raw data
-
-This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale.
-```{r, fig.width=8, fig.height=13}
-par(mfrow=c(4,2))
-plot(ds3$date, (ds3$hospt_admit), type='l', bty='l', main='Hospitalizations', ylab=('Log(Hospitalizations (N))'))
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-plot(ds3$date, log(ds3$hospt_admit), type='l', bty='l', main='Log Hospitalizations', ylab=('log(Hospitalizations (N))') )
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-
-# plot(ds3$date, ds3$new.cases, type='l', main='New reported cases')
-
-plot(ds3$date, (ds3$prop.pos), main='Proportion of human tests positive', xlab='Date', ylab='Proportion positive', bty='l' , type='l')
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-plot(ds3$date, log(ds3$prop.pos/(1-ds3$prop.pos)), main='Logit(Proportion of human tests positive)', xlab='Date', ylab='Logit Proportion positive', bty='l', type='l')
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-
-
-plot(ds3$date, (ds3$new.cases), main='New reported cases', xlab='Date', ylab='Log (Reported cases)', bty='l', type='l')
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-
-plot(ds3$date, log(ds3$new.cases), main='log(New reported cases)', type='l', xlab='Date', ylab='Log Reported cases', bty='l')
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-plot(ds3$date, (ds3$conc), main='Viral RNA in Sewage', col=as.factor(ds3$target), xlab='Date', ylab=' Concentration', bty='l')
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-
-plot(ds3$date, log(ds3$conc), main='Viral RNA in Sewage', col=as.factor(ds3$target), xlab='Date', ylab='Log Concentration', bty='l')
-polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
-abline(v=as.Date(c('2020-03-17','2020-03-24','2020-03-31','2020-04-07','2020-04-14','2020-04-21','2020-04-28')), lty=2, col='gray')
-
-
-```
-
-
-```{r}
-## Prepare data for JAGS
-
-ds.mod <- ds3[!(is.nan(ds3$conc)),]
-ds.mod <- ds.mod[!(is.na(ds.mod$conc)),]
-
-ds.mod$log.conc <- as.vector(log(ds.mod$conc))
-
-ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','hospt_admit')], id.vars=c('date', 'test'))
-ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
-W1 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
-W1 <- apply(W1, 2, function(x){
- x[is.nan(x)] <-NA
- return(x)
-} )
-
-scale.W1 <- apply(W1,2, scale)
-Y1 <- ds.mod.c[,'hospt_admit_rep1_n1']
-Y1[is.nan(Y1)] <-NA
-
-W1.scale <- apply(W1,2,scale)
-
-log.offset.y1 <- rep(0, length(Y1))
-
-```
-
-
-## Association between sewage concentration and epidemiological indicators (Distributed lags)
-We can model the epidemiological time series as a distributed lag of the sewage data.
-
--We evaluate three epidemiological time series: COVID-19 hospitalizations, proportion of tests that are positive for coronavirus (based on date of test), and number of cases of covid-19 (based on date of report).
-
-The plots of the distributed lag model show 90% credible intervals
-
-```{r,eval=params$rerun_model, include=F}
-mod2 <- mod2.func(W=W1.scale, Y=Y1,log.offset=log.offset.y1)
-saveRDS(mod2,'./results/dl.mod2.rds')
-```
-
-### Distributed lags for the association between viral RNA in sewage and COVID hospitalizations.
-This shows that lags of 1-4 days are associated with hospitalization. +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in hospital admissions.
-
-```{r}
-mod2 <- readRDS('./results/dl.mod2.rds')
-
-yrange <- range(mod2$beta1)
-plot(0:7, mod2$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
-arrows(x0=0:7, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
-abline(h=0)
-```
-
-Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 1-4 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
-
-```{r}
-yrange <- range(mod2$beta1.cum)
-plot(0:7, mod2$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
-arrows(x0=0:7, y0=mod2$beta1.cum$lower, y1=mod2$beta1.cum$upper, length=0)
-abline(h=0)
-```
-
-
-### Distributed lags for the association between viral RNA in sewage and proportion positive.
-
-```{r, eval=T}
-#Note data on new cases are missing
-#ds.mod.m <- melt(ds.mod[!is.na(ds.mod$new.cases),c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
-ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','pos.tests', 'cov.tests')], id.vars=c('date', 'test'))
-ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
-W2 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
-W2 <- apply(W2, 2, function(x){
- x[is.nan(x)] <-NA
- return(x)
-} )
-scale.W2 <- apply(W2, 2, scale)
-Y2 <- ds.mod.c[,'pos.tests_rep1_n1']
-Y2[is.nan(Y2)] <-NA
-log.offset2 <- log(ds.mod.c[,'cov.tests_rep1_n1'])
-log.offset2[is.nan(log.offset2)] <-NA
-log.offset2[is.na(log.offset2)] <-0
-W2.scale <- apply(W2,2, scale)
-```
-
-```{r, eval=params$rerun_model, include=F}
-mod2 <- mod2.func(W=W2.scale, log.offset=log.offset2,Y=Y2)
-saveRDS(mod2, './results/dl.mod2.pct.pos.rds')
-```
-
-This shows that lags of 0-6 days are associated with the proportion positive time series (on the cumulative plot, beta does not increase after 6 days) +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in proportion positive.
-
-```{r, eval=T}
-mod2 <- readRDS('./results/dl.mod2.pct.pos.rds')
-
-yrange <- range(mod2$beta1)
-plot(0:7, mod2$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
-arrows(x0=0:7, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
-abline(h=0)
-```
-
-Cumulative relationship between viral RNA in sewage and proportion positive. +/-90% Credible Intervals.
-
-```{r, eval=T}
-yrange <- range(mod2$beta1.cum)
-plot(0:7, mod2$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
-arrows(x0=0:7, y0=mod2$beta1.cum$lower, y1=mod2$beta1.cum$upper, length=0)
-abline(h=0)
-```
-
-### Distributed lags for the association between viral RNA in sewage and new reported COVID-19 cases
-
-```{r, eval=T}
-
-#Note data on new cases are missing
-ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
-ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
-W3 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
-W3 <- apply(W3, 2, function(x){
- x[is.nan(x)] <-NA
- return(x)
-} )
-Y3 <- ds.mod.c[,'new.cases_rep1_n1']
-Y3[is.nan(Y3)] <-NA
-
-log.offset3 <- rep(1, length(Y3))
-W3.scale <- apply(W3,2, scale)
-
-```
-```{r, eval=params$rerun_model, include=F}
-mod3 <- mod2.func(W=W3.scale, log.offset=log.offset3,Y=Y3)
-mod3 <- saveRDS(mod3,'./results/dl.mod3.cases.rds')
-```
-
-This shows that lags of sewage data at longer time lags (6-7 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
-```{r, eval=T}
-mod3 <- readRDS('./results/dl.mod3.cases.rds')
-
-yrange <- range(mod3$beta1)
-plot(0:7, mod3$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
-arrows(x0=0:7, y0=mod3$beta1$lower, y1=mod3$beta1$upper, length=0)
-abline(h=0)
-```
-
-Cumulative relationship between viral RNA in sewage and new cases. This shows that lags of sewage data at longer time lags (6-7 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
-```{r, eval=T}
-yrange <- range(mod3$beta1.cum)
-plot(0:7, mod3$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
-arrows(x0=0:7, y0=mod3$beta1.cum$lower, y1=mod3$beta1.cum$upper, length=0)
-abline(h=0)
-```
-
-
-## Association between sewage concentration and epidemiological indicators (Individual lags)
--All of the plots of individual lags show 98.75% credible intervals, which represent an alpha of 0.1, adjusted for multiple testing.
-
-### Association of sewage viral concentration at different lags with hospitalizations
-
-Test lags of 0-7 days, one at a time.
-
-```{r, include=F, eval=params$rerun_model}
-mods.hosp <- pblapply(c(0:7), function(zz) mod2.indiv.lag.func( Y=Y1, log.offset=log.offset.y1, W=W1.scale, lag.n=zz))
-saveRDS(mods.hosp,'./results/mods.hosp.rds')
-```
-Rate ratio (+/- 98.75% Credible intervals) showing the relative increase in positive tests associated with a 1-log increase of sewage RNA concentration at various lags (based on date of test). This shows the strongest association with a lag of 1 day, with the effect trailing off with longer or shorter lags. From the plots, the sewage sludge RNA concentration increase preceded the increase in hospital admissions.
-
-```{r}
-mods.hosp <-
- readRDS('./results/mods.hosp.rds')
-#Extract the betas
-beta.hospt <- lapply(mods.hosp,'[[','beta1')
-beta.hospt <- do.call('rbind.data.frame',beta.hospt)
-irr.hospt <- exp(beta.hospt)
-yrange <- range(irr.hospt)
-plot(0:7, irr.hospt[,"post_means"], ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
-arrows(x0=0:7, y0=irr.hospt[,"lower"], y1=irr.hospt[,"upper"], length=0)
-abline(h=1, lty=2, col='gray')
-```
-
-### Association of sewage viral concentration at different lags with new cases (defined by date of test)
-
-```{r, eval=T}
-
-#Note data on new cases are missing
-#ds.mod.m <- melt(ds.mod[!is.na(ds.mod$new.cases),c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
-ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','pos.tests', 'cov.tests')], id.vars=c('date', 'test'))
-ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
-W2 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
-W2 <- apply(W2, 2, function(x){
- x[is.nan(x)] <-NA
- return(x)
-} )
-scale.W2 <- apply(W2, 2, scale)
-Y2 <- ds.mod.c[,'pos.tests_rep1_n1']
-Y2[is.nan(Y2)] <-NA
-
-log.offset2 <- log(ds.mod.c[,'cov.tests_rep1_n1'])
-log.offset2[is.nan(log.offset2)] <-NA
-log.offset2[is.na(log.offset2)] <-0
-W2.scale <- apply(W2,2, scale)
-
-```
-
-```{r, eval=T, include=F, eval=params$rerun_model}
-mods.cases.dx <- pblapply(c(0:7), function(zz) mod2.indiv.lag.func( Y=Y2, log.offset=log.offset2, W=W2.scale, lag.n=zz))
-saveRDS(mods.cases.dx,'./results/mods.cases.dx.rds')
-
-```
-Rate ratio (+/- 98.75% Credible) showing the relative increase in positive tests associated with a 1-log increase of sewage RNA concentration at various lags (based on date of test). This shows the strongest association with a lag of 4 days, with uncertainty in the estimates and in the length of the lag. From the plots, the sewage sludge RNA concentration preceded the increase in the proportion of tests that were positive.
-
-```{r}
-mods.cases.dx <-
- readRDS('./results/mods.cases.dx.rds')
-#Extract the betas
-beta.dx <- lapply(mods.cases.dx,'[[','beta1')
-beta.dx <- do.call('rbind.data.frame',beta.dx)
-irr.dx <- exp(beta.dx)
-
-yrange <- range(irr.dx)
-plot(0:7, irr.dx[,"post_means"], ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
-arrows(x0=0:7, y0=irr.dx[,"lower"], y1=irr.dx[,"upper"], length=0)
-abline(h=1, lty=2, col='gray')
-```
-
-### Association of sewage viral concentration at different lags with new cases (defined by date of report)
-
-
-```{r, eval=T, include=F, eval=params$rerun_model}
-mods.cases.rep <- pblapply(c(0:7), function(zz) mod2.indiv.lag.func( Y=Y3, log.offset=log.offset3, W=W3.scale, lag.n=zz))
-saveRDS(mods.cases.rep,'./results/mods.cases.rep.rds')
-
-```
-
-Rate ratio (+/- 98.75% Credible intervals) showing the relative increase in reported cases associated with a 1-log increase of sewage RNA concentration at various lags. Sewage sludge RNA increases first, then reported cases lags 1 or 6 days later, with considerable uncertainty in the estimates and in the length of the lag. From the plots, the sewage sludge RNA concentration increase preceded the increase in reported cases
-```{r}
-mods.cases.rep <-
- readRDS('./results/mods.cases.rep.rds')
-
-#Extract the betas
-beta.rep <- lapply(mods.cases.rep,'[[','beta1')
-beta.rep <- do.call('rbind.data.frame',beta.rep)
-irr.rep <- exp(beta.rep)
-
-yrange <- range(irr.rep)
-plot(0:7, (irr.rep[,"post_means"]), ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
-arrows(x0=0:7, y0=irr.rep[,"lower"], y1=irr.rep[,"upper"], length=0)
-abline(h=1, lty=2, col='gray')
-```
-
-The association with cases, based on date of report, is less clean than the associations based on other outcomes. For example, the time series of proportion of tests that is positive is implicitly adjusted for variations in testing over time. Publicly available data on reported cases, however, is commonly subject to day-of-week effects and other reporting quirks. In this plot, the number of reported cases is colored based on the day of the week
-
-```{r, eval=T}
-ds3$dow <- weekdays(ds3$date)
-plot(ds3$date,ds3$new.cases, col=as.factor(ds3$dow))
-points(ds3$date,ds3$new.cases, type='l')
-
-```
-
-
-
-
-