-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsewage through May.Rmd
624 lines (444 loc) · 26.8 KB
/
sewage through May.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
---
title: "Coronavirus in New Haven Sewage"
author: "Additional analyses for Peccia et al."
date: "6/1/2020"
output:
html_document:
toc: true
toc_depth: 3
toc_float: true
pdf_document: default
params:
rerun_model: FALSE
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(readxl)
library(reshape2)
library(HDInterval)
library(rjags)
library(xtable)
library(sas7bdat)
library(pbapply)
source('./Models/mod1.deriv.ar1.R')
source('./Models/mod2.dist.lag.alt2.R')
source('./Models/mod2.indiv.lags.R')
```
## Overview
These are additional analyses of data examining increases in SARS-CoV-2 RNA concentrations in primary sewage sludge in relation to COVID-19 hospitalizations and cases. The original preprint, led by Jordan Peccia, et al can be found here: https://www.medrxiv.org/content/10.1101/2020.05.19.20105999v1
The original pre-print included a correlation of smoothed versions of the viral RNA data and the epidemiological indicators. In the analyses presented here, we use an error-in-variables time series model to get an estimate of the underlying viral dynamics in sewage and the relationship of this with hospitalizations. This analysis is carried out in the Bayesian framework, allowing us to correctly quantify uncertainty in the estimated associations. These additional analyses were performed by Dan Weinberger (Epidemiology of Microbial Diseases, Yale School of Public Health), with input from Josh Warren (Biostatistics, Yale School of Public Health) and the rest of the original study team.
## Explanation of the data and analyses
We have up to 4 measurements of viral RNA in sewage sludge daily (2 targets, tested in duplicate). We also have the number of COVID-19 hospitalization per day in the catchment area.
We have five ways of measuring cases detected by laboratory test.
1a,b) Number of positive tests based on date when sample was collected; with or with adjustment for total number of tests performed
2a,b) Number of positive tests based on date when result was reported to DPH; with or with adjustment for total number of tests performed
3) Number of cases reported on the DPH website.
The number of cases reported on the DPH website is used for the main analyses here. This is because this is the relevant comparison when the goal is to identify a lead indicator to gain an understanding of the epidemic.cThe relationship among these different measure changes over the course of the epidemic as testing delays shrank. T
In the analyses, we assume that the observed sewage testing data (W) are drawn from an underlying, unobserved trajectory of viral concentration in the sewage, which follows an AR(1) process (X). We then evaluate the association between X and the number of hospitalization/cases/positive tests in Poisson regression models. This count model includes an AR(1) random effect for time. The models are fit using JAGS (see Models/mod2.indiv.lags.R). We test lags of the X variable of 0-7 days in a distributed lags framework. We also present results from each lag tested individually for comparison.
```{r, eval=F, include=F}
## Combine data (only need to do this once)
# ds1 <- read_excel('./Data/raw RNA data.xlsx')
# ds1.m <- melt(ds1, id.vars='DATE')
# names(ds1.m) <- c('date', 'test','conc')
ds1.m <- read_excel('./Data/data through June 1.xlsx')
names(ds1.m) <- c('date', 'conc', 'new.cases', 'test')
ds1.m$date <- as.Date(ds1.m$date)
ds2 <- read.sas7bdat('./Data/dw1stpostest2020_6_15.sas7bdat')
ds2$date <-format(as.Date(ds2$newspecdate, origin="1960-01-01"),"%Y-%m-%d")
ds2.nh <- ds2[ds2$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
ds2.nh$pos.tests <-1
ds2.nh.agg <- aggregate(ds2.nh[,'pos.tests', drop=F],by=list('date'=ds2.nh$date), FUN=sum)
ds2.nh.agg$date <- as.Date(ds2.nh.agg$date)
tests.nh <- tests[tests$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
tests.nh$cov.tests <-1
tests.nh.agg <- aggregate(tests.nh[,'cov.tests', drop=F],by=list('date'=tests.nh$date), FUN=sum)
tests.nh.agg$date <- as.Date(tests.nh.agg$date)
#Info on date of report and test from DPH
tests.report <-
read.sas7bdat('./Data/dw_1stcovtest_wdates.sas7bdat')
tests.report$DateReported <-
format(as.Date(tests.report$DateReported, origin="1960-01-01"),"%Y-%m-%d")
tests.report$newspecdate <-
format(as.Date(tests.report$newspecdate, origin="1960-01-01"),"%Y-%m-%d")
tests.report$test.time <-
as.numeric(as.Date(tests.report$DateReported) -
as.Date(tests.report$newspecdate))
test.nh <-
tests.report[tests.report$OfficialCity %in% c('NEW HAVEN', 'EAST HAVEN','HAMDEN', 'WOODBRIDGE'),]
test.nh$pos.tests <- 0
test.nh$pos.tests[test.nh$TestResult %in% c('DETECTED', 'POSITIVE')] <- 1
test.nh$tests <- 1
test.nh.agg.report <- aggregate(test.nh[,c('tests','pos.tests'), drop=F],by=list('date_reported'=test.nh$DateReported), FUN=sum)
test.nh.agg.report$date_reported <-
as.Date(test.nh.agg$date_reported)
names(test.nh.agg.report) <- c('date', 'N_tests_reported','N_positive_reported')
test.nh.agg.rec <- aggregate(test.nh[,c('tests','pos.tests'), drop=F],by=list('date_received'=test.nh$newspecdate), FUN=sum)
test.nh.agg.rec$date_received <-
as.Date(test.nh.agg.rec$date_received)
names(test.nh.agg.rec) <- c('date', 'N_tests_received','N_positive_recieved')
test.nh.combined <-
merge(test.nh.agg.rec,test.nh.agg.report, by='date', all=T)
test.nh.combined <- test.nh.combined[,-6]
test.nh.combined <-
test.nh.combined[test.nh.combined$date <= as.Date('2020-06-30'),]
test.nh.combined[,-1] <- apply(test.nh.combined[,-1],2, function(x){
x[is.na(x)] <- 0
return(x)
})
ds.hosp <- read_excel('./Data/hospitalization2020_07_08.xlsx')
ds.hosp <- as.data.frame(ds.hosp)
ds.hosp$date <- as.Date(ds.hosp[,1])
ds.hosp$hospt_admit <- ds.hosp$`# ADMISSIONS`
ds.hosp <- ds.hosp[,c('date','hospt_admit')]
#ds1.m$date <- as.Date(ds1.m$date)
#ds.cases <- read_excel('./Data/updated case data by reporting date.xlsx')
#names(ds.cases) <- c('date','new.cases')
#ds.cases$date <- as.Date(ds.cases$date)
ds3 <- merge(test.nh.combined, ds.hosp, by='date', all=T)
ds3 <- merge(ds3, ds1.m, by='date', all=T)
ds3$prop.pos_report <-
ds3$N_positive_reported/ds3$N_tests_reported
ds3$prop.pos_rec <-
ds3$N_positive_recieved/ds3$N_tests_received
ds3$conc <- as.numeric(ds3$conc)
ds3$target <- 'N1'
ds3$target[grep('N2', ds3$test)] <- 'N2'
write.csv(ds3,'./Data/combined.data.csv')
```
```{r}
## read in combined data
ds3 <- read.csv('./Data/combined.data.csv')
ds3$date <- as.Date(ds3$date)
```
```{r, eval=F}
#Compare "new.case' with N_positive_reported
plot(ds3$new.cases, ds3$N_positive_reported)
abline(a=0, b=1)
plot(ds3$date, ds3$N_positive_reported, col='black', type='l')
points(ds3$date, ds3$new.cases, col='red', type='l')
abline(a=0, b=1)
```
## Plots of raw data
This vertical lines are spaced 7 days apart. The increase in sewage is apparent earlier than the increase in the other indicators. This is particularly apparent on the log scale.
```{r, fig.width=8, fig.height=13}
par(mfrow=c(4,2))
plot(ds3$date, (ds3$conc), main='Viral RNA in Sewage', col=as.factor(ds3$target), xlab='Date', ylab=' Concentration', bty='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
plot(ds3$date, log(ds3$conc), main='Viral RNA in Sewage', col=as.factor(ds3$target), xlab='Date', ylab='Log Concentration', bty='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
#Hospitalization
plot(ds3$date, (ds3$hospt_admit), type='l', bty='l', main='Hospitalizations', ylab=('Hospitalizations (N)'))
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
plot(ds3$date, log(ds3$hospt_admit), type='l', bty='l', main='Log Hospitalizations', ylab=('log(Hospitalizations (N))') )
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
# Cases based on date of report
# plot(ds3$date, ds3$new.cases, type='l', main='New reported cases')
plot(ds3$date, (ds3$N_positive_reported), main='New cases, based on report date', xlab='Date', ylab='Reported cases', bty='l', type='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
plot(ds3$date, log((ds3$N_positive_reported+0.5)), main='log(New cases, based on report date)', type='l', xlab='Date', ylab='Log Reported cases', bty='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
#Pos tests ased on date of sample collection
plot(ds3$date, ds3$N_positive_recieved, main='New cases, based on sample receipt date', xlab='Date', ylab='Cases', bty='l' , type='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
plot(ds3$date, log(ds3$N_positive_recieved+0.5), main='Log(New cases, based on sample receipt date)', xlab='Date', ylab='Log(Reported Cases)', bty='l', type='l')
polygon( as.Date(c( c('2020-03-17','2020-03-24'), c('2020-03-24','2020-03-17'))), c(c(-1000,-1000 ),c(1000,1000)),col = rgb(0, 0, 1, alpha = 0.1), border = NA)
abline(v=seq.Date(from=as.Date('2020-03-17'),length.out=15, by='week') , lty=2, col=rgb(0,0,0,alpha=0.1))
```
Comparing the new cases of COVID-19 based on sample receipt date or date of report, we can see a long lag early in the epidemic between sample collection date and date of report to DPH. This gap seems to narrow as the epidemic progresses, perhaps due to improvements in testing
```{r}
plot(ds3$date, (ds3$N_positive_reported), main='Sample receipt date (red) vs report date (black)', xlab='Date', ylab='(Reported cases)', bty='l', type='l')
points(ds3$date, (ds3$N_positive_recieved), col='red', type='l')
```
Comparing the new cases of COVID-19 based on date of report to DPH (black) and when the data were reported on the DPH website (red).
```{r}
plot(ds3$date, (ds3$N_positive_reported), main='Web report date (red) vs report date to DPH (black)', xlab='Date', ylab='(Reported cases)', bty='l', type='l')
points(ds3$date, (ds3$new.cases), col='red', type='l')
```
```{r}
## Prepare data for JAGS
ds.mod <- ds3[!(is.nan(ds3$conc)),]
ds.mod <- ds.mod[!(is.na(ds.mod$conc)),]
ds.mod$log.conc <- as.vector(log(ds.mod$conc))
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','hospt_admit')], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W1 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
W1 <- apply(W1, 2, function(x){
x[is.nan(x)] <-NA
return(x)
} )
scale.W1 <- apply(W1,2, scale)
Y1 <- ds.mod.c[,'hospt_admit_R1, N1']
Y1[is.nan(Y1)] <-NA
W1.scale <- apply(W1,2,scale)
log.offset.y1 <- rep(0, length(Y1))
```
## Association between sewage concentration and epidemiological indicators (Distributed lags)
We can model the epidemiological time series as a distributed lag of the sewage data.
-We evaluate three epidemiological time series: COVID-19 hospitalizations, number of cases of covid-19 (based on date of sample receipt), and number of cases of covid-19 (based on date of report).
The plots of the distributed lag model show 90% credible intervals
```{r,eval=params$rerun_model, include=F}
mod2 <- mod2.func(W=W1.scale, Y=Y1,log.offset=log.offset.y1)
saveRDS(mod2,'./results/dl.mod2.rds')
```
### Distributed lags for the association between viral RNA in sewage and COVID hospitalizations.
This shows that lags of 1-4 days are associated with hospitalization. +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in hospital admissions.
```{r}
mod2 <- readRDS('./results/dl.mod2.rds')
mod2.hosp.beta <- mod2[c('beta1', 'beta1.cum')]
saveRDS(mod2.hosp.beta,'./results/dl.betas.hosp.rds')
yrange <- range(mod2$beta1)
plot(-1:8, mod2$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 1-4 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
```{r}
yrange <- range(mod2$beta1.cum)
plot(-1:8, mod2$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod2$beta1.cum$lower, y1=mod2$beta1.cum$upper, length=0)
abline(h=0)
```
### Distributed lags for the association between viral RNA in sewage and new reported COVID-19 cases, based on reporting date on the CT DPH website
```{r, eval=T}
#Note data on new cases are missing
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W3 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
W3 <- apply(W3, 2, function(x){
x[is.nan(x)] <-NA
return(x)
} )
Y3 <- ds.mod.c[,'new.cases_R1, N1']
Y3[is.nan(Y3)] <-NA
#Y3[is.na(Y3)] <- 0
log.offset3.null <- rep(0, length(Y3))
W3.scale <- apply(W3,2, scale)
```
```{r, eval=params$rerun_model, include=F}
mod3 <- mod2.func(W=W3.scale, log.offset=log.offset3.null,Y=Y3)
saveRDS(mod3,'./results/dl.mod3.cases.web.rds')
```
This shows that lags of sewage data at longer time lags (6-8 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
```{r, eval=T}
mod3 <- readRDS('./results/dl.mod3.cases.web.rds')
mod2.case.web.beta <- mod3[c('beta1', 'beta1.cum')]
saveRDS(mod2.case.web.beta,'./results/dl.betas.case.web.rds')
yrange <- range(mod3$beta1)
plot(-1:8, mod3$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod3$beta1$lower, y1=mod3$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and hospital admissions. This shows that lags of sewage data of 6-8 days together best correlate with the hospitalization time series (after 4 days, cumulative beta does not increase). +/-90% Credible Intervals.
```{r}
yrange <- range(mod3$beta1.cum)
plot(-1:8, mod3$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod3$beta1.cum$lower, y1=mod3$beta1.cum$upper, length=0)
abline(h=0)
```
### Distributed lags for the association between viral RNA in sewage and new COVID-19 cases, based on date of report to DPH
```{r, eval=T}
#Note data on new cases are missing
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc','N_tests_reported','N_positive_reported')], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W3 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
W3 <- apply(W3, 2, function(x){
x[is.nan(x)] <-NA
return(x)
} )
Y3 <- ds.mod.c[,'N_positive_reported_R1, N1']
Y3[is.nan(Y3)] <-NA
#Y3[is.na(Y3)] <- 0
log.offset3 <- log(ds.mod.c[,'N_tests_reported_R1, N1'])
log.offset3[is.nan(log.offset3)] <-0
log.offset3[is.na(log.offset3)] <-0
log.offset3.null <- rep(0, length(log.offset3))
W3.scale <- apply(W3,2, scale)
```
```{r, eval=params$rerun_model, include=F}
mod3 <- mod2.func(W=W3.scale, log.offset=log.offset3,Y=Y3)
saveRDS(mod3,'./results/dl.mod3.cases.rds')
```
This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals. Note that the lag between reported cases and the sewage data shrinks over time (see time series plots) as testing improves.
```{r, eval=T}
mod3 <- readRDS('./results/dl.mod3.cases.rds')
mod3.betas <- mod3[c('beta1','beta1.cum')]
saveRDS(mod3.betas,'./results/dl.mod3.betas.report.to.dph.denom.rds')
plot(-1:8, mod3$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
yrange <- range(mod3$beta1)
arrows(x0=-1:8, y0=mod3$beta1$lower, y1=mod3$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and new cases. This shows that lags of sewage data at longer time lags (0-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
```{r, eval=T}
yrange <- range(mod3$beta1.cum)
plot(-1:8, mod3$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod3$beta1.cum$lower, y1=mod3$beta1.cum$upper, length=0)
abline(h=0)
```
We can also repeat this analysis without adjusting for testing volume
```{r, eval=params$rerun_model, include=F}
mod3.no.offset <-
mod2.func(W=W3.scale, log.offset=log.offset3.null,Y=Y3)
saveRDS(mod3.no.offset,'./results/dl.mod3.cases.no.offset.rds')
```
This shows largely the same pattern as when adjusting for testing volume. (1-4 day lags) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
```{r, eval=T}
mod3.no.offset <-
readRDS('./results/dl.mod3.cases.no.offset.rds')
mod3.betas.no.offset <- mod3.no.offset[c('beta1','beta1.cum')]
saveRDS(mod3.betas.no.offset,'./results/dl.mod3.betas.report.to.dph.no.offset.rds')
yrange <- range(mod3.no.offset$beta1)
plot(-1:8, mod3.no.offset$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod3.no.offset$beta1$lower, y1=mod3.no.offset$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and new cases (not adjusting for testing volume). This shows that lags of sewage data at longer time lags (1-4 days) best correlate with the time series of reported cases, with considerable uncertainty in the estimates. +/-90% Credible Intervals.
```{r, eval=T}
yrange <- range(mod3.no.offset$beta1.cum)
plot(-1:8, mod3.no.offset$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod3.no.offset$beta1.cum$lower, y1=mod3.no.offset$beta1.cum$upper, length=0)
abline(h=0)
```
### Distributed lags for the association between viral RNA in sewage and number or percent positive, based on date when the sample was collected.
```{r, eval=T}
#Note data on new cases are missing
#ds.mod.m <- melt(ds.mod[!is.na(ds.mod$new.cases),c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc',"N_positive_recieved", "N_tests_received" )], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W2 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
W2 <- apply(W2, 2, function(x){
x[is.nan(x)] <-NA
return(x)
} )
scale.W2 <- apply(W2, 2, scale)
Y2 <- ds.mod.c[,"N_positive_recieved_R1, N1"]
Y2[is.nan(Y2)] <-NA
log.offset2 <- log(ds.mod.c[,"N_tests_received_R1, N1"])
log.offset2[is.nan(log.offset2)] <-NA
log.offset2[is.na(log.offset2)] <-0
#log.offset2 <- rep(0,length(Y2))
log.offset2.null <- rep(0, length(log.offset2))
W2.scale <- apply(W2,2, scale)
```
```{r, eval=params$rerun_model, include=F}
mod2 <- mod2.func(W=W2.scale, Y=Y2, log.offset=log.offset2)
saveRDS(mod2, './results/dl.mod2.pct.pos.rds')
mod2.no.offset <- mod2.func(W=W2.scale, Y=Y2, log.offset=log.offset2.null)
saveRDS(mod2.no.offset, './results/dl.mod2.N.pos.specdate.rds')
```
This shows that lags of 0-2 days are associated with the case time series, based on date of test, (on the cumulative plot, beta does not increase after 2 days) +/-90% Credible Intervals. From the plots, the sewage sludge RNA concentration increase preceded the increase in number of cases, based on sample collection date.
```{r, eval=T}
mod2 <- readRDS('./results/dl.mod2.pct.pos.rds')
mod2.betas <- mod2[c('beta1', 'beta1.cum')]
saveRDS(mod2.betas,'./results/dl.mod2.betas.sample.date.pct.post.rds')
yrange <- range(mod2$beta1)
plot(-1:8, mod2$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod2$beta1$lower, y1=mod2$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and percent positive based on sample collection date. +/-90% Credible Intervals.
```{r, eval=T}
yrange <- range(mod2$beta1.cum)
plot(-1:8, mod2$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod2$beta1.cum$lower, y1=mod2$beta1.cum$upper, length=0)
abline(h=0)
```
and same thing with no offset term
```{r, eval=T}
mod2.no.offset <- readRDS('./results/dl.mod2.N.pos.specdate.rds')
mod2.betas.no.offset <- mod2.no.offset[c('beta1', 'beta1.cum')]
saveRDS(mod2.betas.no.offset,'./results/dl.mod2.betas.sample.date.N.pos.rds')
yrange <- range(mod2.no.offset$beta1)
plot(-1:8, mod2.no.offset$beta1$post_means, ylim=yrange, bty='l', ylab='Beta', xlab='Lag (days)' , pch=16)
arrows(x0=-1:8, y0=mod2.no.offset$beta1$lower, y1=mod2.no.offset$beta1$upper, length=0)
abline(h=0)
```
Cumulative relationship between viral RNA in sewage and number of cases based on sample collection date. +/-90% Credible Intervals.
```{r, eval=T}
yrange <- range(mod2.no.offset$beta1.cum)
plot(-1:8, mod2.no.offset$beta1.cum$post_means, ylim=yrange, bty='l', ylab='Cumulative beta', xlab='Lag (days)', pch=16 )
arrows(x0=-1:8, y0=mod2.no.offset$beta1.cum$lower, y1=mod2.no.offset$beta1.cum$upper, length=0)
abline(h=0)
```
## Association between sewage concentration and epidemiological indicators (Individual lags)
-All of the plots of individual lags show 98.75% credible intervals, which represent an alpha of 0.1, adjusted for multiple testing.
### Association of sewage viral concentration at different lags with hospitalizations
Test lags of 1-7 days, and leads of 1 days one at a time.
```{r, include=F, eval=params$rerun_model}
mods.hosp <- pblapply(c(-1:7), function(zz) mod2.indiv.lag.func( Y=Y1, log.offset=log.offset.y1, W=W1.scale, lag.n=zz))
saveRDS(mods.hosp,'./results/mods.hosp.rds')
```
Rate ratio (+/- 98.75% Credible intervals) showing the relative increase in positive tests associated with a 1-log increase of sewage RNA concentration at various lags (based on date of test). This shows the strongest association with a lag of 1 day, with the effect trailing off with longer or shorter lags. From the plots, the sewage sludge RNA concentration increase preceded the increase in hospital admissions.
```{r}
mods.hosp <-
readRDS('./results/mods.hosp.rds')
#Extract the betas
beta.hospt <- lapply(mods.hosp,'[[','beta1')
beta.hospt <- do.call('rbind.data.frame',beta.hospt)
irr.hospt <- exp(beta.hospt)
yrange <- range(irr.hospt)
plot(-1:7, irr.hospt[,"post_means"], ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
arrows(x0=-1:7, y0=irr.hospt[,"lower"], y1=irr.hospt[,"upper"], length=0)
abline(h=1, lty=2, col=rgb(0,0,0,alpha=0.1))
```
### Association of sewage viral concentration at different lags with new cases (defined by date of test)
```{r, eval=T}
#Note data on new cases are missing
#ds.mod.m <- melt(ds.mod[!is.na(ds.mod$new.cases),c('date','test','log.conc','new.cases')], id.vars=c('date', 'test'))
ds.mod.m <- melt(ds.mod[,c('date','test','log.conc',"N_positive_recieved", "N_tests_received" )], id.vars=c('date', 'test'))
ds.mod.c <- dcast(ds.mod.m, date~variable+test, fun.aggregate = mean)
W2 <- ds.mod.c[, grep('log.conc', names(ds.mod.c))]
W2 <- apply(W2, 2, function(x){
x[is.nan(x)] <-NA
return(x)
} )
scale.W2 <- apply(W2, 2, scale)
Y2 <- ds.mod.c[,"N_positive_recieved_R1, N1"]
Y2[is.nan(Y2)] <-NA
log.offset2 <- log(ds.mod.c[,"N_tests_received_R1, N1"])
log.offset2[is.nan(log.offset2)] <-NA
log.offset2[is.na(log.offset2)] <-0
W2.scale <- apply(W2,2, scale)
```
```{r, eval=T, include=F, eval=params$rerun_model}
mods.cases.dx <- pblapply(c(-1:7), function(zz) mod2.indiv.lag.func( Y=Y2, log.offset=log.offset2, W=W2.scale, lag.n=zz))
saveRDS(mods.cases.dx,'./results/mods.cases.dx.rds')
```
Rate ratio (+/- 98.75% Credible) showing the relative increase in positive tests associated with a 1-log increase of sewage RNA concentration at various lags (based on date of test). This shows the strongest association with a lag of 4 days, with uncertainty in the estimates and in the length of the lag. From the plots, the sewage sludge RNA concentration slightly preceded the increase in the cases (based on sample collection date).
```{r}
mods.cases.dx <-
readRDS('./results/mods.cases.dx.rds')
#Extract the betas
beta.dx <- lapply(mods.cases.dx,'[[','beta1')
beta.dx <- do.call('rbind.data.frame',beta.dx)
irr.dx <- exp(beta.dx)
yrange <- range(irr.dx)
plot(-1:7, irr.dx[,"post_means"], ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
arrows(x0=-1:7, y0=irr.dx[,"lower"], y1=irr.dx[,"upper"], length=0)
abline(h=1, lty=2, col='gray')
```
### Association of sewage viral concentration at different lags with new cases (defined by date of report)
```{r, eval=T, include=F, eval=params$rerun_model}
mods.cases.rep <- pblapply(c(-1:7), function(zz) mod2.indiv.lag.func( Y=Y3, log.offset=log.offset3, W=W3.scale, lag.n=zz))
saveRDS(mods.cases.rep,'./results/mods.cases.rep.rds')
```
Rate ratio (+/- 98.75% Credible intervals) showing the relative increase in reported cases associated with a 1-log increase of sewage RNA concentration at various lags. Sewage sludge RNA increases first, then reported cases lags 1 or 6 days later, with considerable uncertainty in the estimates and in the length of the lag. From the plots, the sewage sludge RNA concentration increase preceded the increase in reported cases
```{r}
mods.cases.rep <-
readRDS('./results/mods.cases.rep.rds')
#Extract the betas
beta.rep <- lapply(mods.cases.rep,'[[','beta1')
beta.rep <- do.call('rbind.data.frame',beta.rep)
irr.rep <- exp(beta.rep)
yrange <- range(irr.rep)
plot(-1:7, (irr.rep[,"post_means"]), ylim=yrange, bty='l', ylab='Rate ratio', xlab='Lag (days)' , pch=16)
arrows(x0=-1:7, y0=irr.rep[,"lower"], y1=irr.rep[,"upper"], length=0)
abline(h=1, lty=2, col='gray')
```