-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcode.Rmd
602 lines (437 loc) · 17.9 KB
/
code.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
% Analysis code
## Introduction
The data were obtained from a series of randomized replicated experiments. There is one single `csv` file that contains data from each experiment. All analyses are presented below and separated for each experiment. I follow the steps (not in that exactly order) ilustrated in the R for Data Science book (http://r4ds.had.co.nz/) (Grolemund and Wickham 2017). These are: data import and tidy, explore (transform, visualize and model) and communicate.
These are the packages used.
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
library(tidyverse)
library(here)
library(viridis)
library(ggthemes)
library(lme4)
library(cowplot)
library(broom)
library(agricolae)
library(lattice)
library(emmeans)
library(car)
library(scales)
```
## Perithecia production
### Import
Let's have a look at the raw data.
```{r message=FALSE, warning=FALSE}
perithecia <- read_csv(here("data", "perithecia.csv")) %>%
filter(genotype != "t")
perithecia
```
The main response variable is ppi (perithecia production index), which was already calculated (it is an index based on the frequency of the scores). However, the original scores are included and consist of the number of grains on each score (0 to 3).
### Transform
The frequecy of each score is shown in separate columns. To plot and further work with the data, we will need to format the data from the wide to the long format, where each frequency of the scores need to be in a single colum. We will use the `gather` function of `dplyr`. For this, we need to inform the name of columns for the score and the frequency.
```{r}
perithecia2 <- perithecia %>%
gather("score", "frequency", 8:11)
perithecia2
```
### Visualize
We now can visualize the frequency of the scores for each species-genotype separated (facet) for each substrate. This is a nice plot to have in the paper!
```{r}
g1 <- perithecia2 %>%
ggplot(aes(reorder(genotype, -ppi), frequency, fill = score)) +
geom_bar(stat = "identity", position = "fill") +
facet_wrap(~substrate) +
theme_few() + # load ggthemes
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(x = "", y = "Proportion of grains", fill = "Score") +
scale_fill_viridis(discrete = TRUE)
g1
```
### Model
Since we have one species with two genotypes, it may be important, besides testing the effect of species or genotype in the model, to create a species_genotype variable. We will then fit a mixed model to test the effect species_genotype on the perithecial production index (ppi). In this way the two genotypes within one species can be compared between them and with the others. We use the `unite` function.
```{r}
# creating the variable
perithecia <- perithecia %>%
unite(species_genotype, species, genotype, sep = "_", remove = F)
```
Now we fit the mixed model using `lmer` function. We treat `Isolate` as a random effects.
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
library(lme4)
lmm_per <- lmer(
ppi ~ species_genotype * substrate + (1 | Isolate),
data = perithecia, REML = FALSE
)
```
Let's check the single and interaction effects using `Anova` function of the `car` package.
```{r message=FALSE, warning=FALSE}
library(car)
Anova(lmm_per)
```
Evaluate the model
```{r}
plot(lmm_per, type = c("p", "smooth"), col = "black")
qqmath(lmm_per, id = 0.05, col = "black")
```
#### Means comparison
Not too bad. Let's now compare the means of treatments and create a dataframe which will be use to further create a plot with the estimated means and confidence interval. We will use the `emmeans` package which is an update for the old `lsmeans` package. The syntax is the same as before.
```{r message=FALSE, warning=FALSE}
library(emmeans)
medias <- emmeans(lmm_per, ~ species_genotype * substrate)
med <- cld(medias, Letters = LETTERS, alpha = .05)
# make output as a dataframe
med <- data.frame(med)
head(med)
```
### Figures
This new `med` dataframe does not contain the two original variables we want to use to plot the data. Let's separate the `species_genotype` variable and create `genotype` and `species`. We use the `separate` function.
```{r}
med2 <- med %>%
separate(species_genotype, c("species", "genotype"), sep = "_", extra = "merge")
# extra argument was used to keep the genotype information alltogether
head(med2)
```
We now can plot the point estimates for the means and respective 95%CI of the `ppi` for each `genotype`. We add `species` legends when using `color` argument so we can have five species_genotype.
```{r message=FALSE, warning=FALSE}
g2 <- med2 %>%
ggplot(aes(reorder(genotype, -emmean), emmean, color = species)) +
geom_point(position = position_dodge(width = 0.3)) +
theme_few() +
theme(legend.position = "right", legend.text = element_text(size = 8, face = "italic"), axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_color_viridis(discrete = TRUE) +
facet_wrap(~substrate) +
geom_errorbar(
aes(ymin = lower.CL, ymax = upper.CL),
width = 0.2, position = position_dodge(width = 0.3)
) +
labs(y = "Perithecia production index (%)", x = "Trichothecene genotype")
g2
```
Finally, we prepare and save the Figure 1 of the manuscript which is a combo figure with the the two previous plots for both the original scores and the normalized index. We use the `plot_grid` function of the `cowplot` package for this.
```{r message=FALSE, warning=FALSE}
grid1 <- plot_grid(g1, g2, labels = c("A", "B"), align = "hv", ncol = 1)
ggsave(here("figs", "figure1.png"), grid1, width = 6, height = 7)
grid1
```
## Mycelial growth
### Import
The main variable is mycelia growth rate, or the ratio of the growth at the fifth day and number of days. Besides the species and genotype factors, there are two levels of temperatures being tested.
```{r message=FALSE, warning=FALSE}
mgr <- read_csv(here("data", "mycelia.csv"))
head(mgr)
```
### Visualize
We can plot means of mycelial growth rate for each species-genotype and temperature as facetting variable. This plot will be included in the publication.
```{r message=FALSE, warning=FALSE}
mgr1 <- mgr %>%
group_by(isolate, genotype, species, temperature) %>%
summarize(mean_mgr = mean(mgr)) %>%
ggplot(aes(factor(genotype), mean_mgr, color = species)) +
geom_jitter(width = 0.1, size = 3) +
facet_wrap(~temperature) +
scale_color_viridis(discrete = TRUE) +
theme_few() +
theme(
legend.position = "none",
legend.text = element_text(size = 7, face = "italic")
) +
labs(
x = "Trichothecene Genotype",
y = "MGR (mm/day)", color = "Species"
) +
ylim(0.2, 1.8)
```
### Model
Let's fit a mixed model for the `mgr` data. We first test the effect of the interaction.
```{r}
mgr <- mgr %>%
unite(species_genotype, species, genotype, sep = "_", remove = F)
lmer_mgr <- lmer(mgr ~ species_genotype * temperature + (1 | species_genotype / isolate), data = mgr, REML = FALSE)
Anova(lmer_mgr)
```
The interaction was significant. We now create different datasets for each temperature and test the effect of `species_genotype` within each temperature.
```{r}
# 15 C
mgr15 <- mgr %>%
filter(temperature == "15")
lmer_mgr15 <- lmer(mgr ~ species_genotype + (1 | species_genotype / isolate), data = mgr15, REML = FALSE)
Anova(lmer_mgr15)
# 30 C
mgr30 <- mgr %>%
filter(temperature == "30")
lmer_mgr30 <- lmer(mgr ~ species_genotype + (1 | species_genotype / isolate), data = mgr30, REML = FALSE)
Anova(lmer_mgr30)
```
We can see that there are no effect of `species_genotype` in the `mgr` within each of the temperatures.
## Sporulation & Germination
In this experiment, the production of macroconidia and further germination of 20 randomly selected spores of the different strains of the species-genotype combination, was evaluated in two runs of the experiment.
### Import
```{r message=FALSE, warning=FALSE}
spor <- read_csv(here("data", "spor_germ.csv"))
# again, we create the species_genotype variable
spor <- spor %>%
unite(species_genotype, species, genotype, sep = "_", remove = F)
head(spor)
```
### Visualize
The code below will produce plots for total macroconidia production and percent germinated spores.
```{r message=FALSE, warning=FALSE}
spor1 <- spor %>%
group_by(species, genotype, isolate) %>%
summarize(mean_spor = mean(spores_ml))
spor2 <- spor1 %>%
ggplot(aes(genotype, mean_spor, color = species)) +
geom_jitter(width = 0.1, size = 3) +
theme_few() +
scale_color_viridis(discrete = TRUE) +
theme(legend.position = c(1, 5), axis.text.x = element_text(angle = 0, hjust = 0.5, size = 7), legend.text = element_text(size = 12, face = "italic"), axis.title = element_text(size = 12)) +
labs(x = "", y = "N. of macroconidia (x10.000)", color = "Species") +
ylim(0, 5) +
guides(colour = guide_legend(nrow = 2))
```
```{r message=FALSE, warning=FALSE}
germ1 <- spor %>%
group_by(species, genotype, isolate) %>%
summarize(mean_germ = mean(germp))
germ <- germ1 %>%
ggplot(aes(genotype, mean_germ, color = species)) +
geom_jitter(width = 0.1, size = 3) +
theme_few() +
scale_color_viridis(discrete = TRUE) +
theme(legend.position = "none", axis.text.x = element_text(angle = 0, hjust = 0.5, size = 7), legend.text = element_text(size = 12, face = "italic"), axis.title = element_text(size = 12)) +
labs(x = "", y = "Germination rate (%)", color = "Species") +
guides(colour = guide_legend(nrow = 2))
```
### Model
Mixed model for testing the effect of genotype.
```{r message=FALSE, warning=FALSE}
lmer_spor <- lmer(spores_ml ~ genotype + (1 | genotype / isolate), data = spor, REML = FALSE)
```
```{r}
plot(lmer_spor, type = c("p", "smooth"), col = "black")
qqmath(lmer_spor, id = 0.05, col = "black")
```
Model fitted well the data. Let's proceed.
```{r}
Anova(lmer_spor)
```
#### Means comparison
```{r}
medias <- emmeans(lmer_spor, ~genotype)
med <- cld(medias, Letters = LETTERS, alpha = .05)
med <- data.frame(med)
med
```
### Figure 2
We will produce a combo figure for the three variables.
```{r message=FALSE, warning=FALSE}
grid1 <- plot_grid(spor2, germ, labels = c("B", "C"), ncol = 2, align = "hv")
grid5 <- plot_grid(mgr1, grid1, labels = c("A"), rel_heights = c(1, 1), ncol = 1, align = "hv")
ggsave(here("figs", "figure2.png"), grid5, width = 6, height = 6.2)
grid5
```
## Pathogenicity
The pathogenicity was assessed based on the progress of the symptoms from the central (inoculated) spikelet. The data was transformed to percent severity of the spike (proportion of diseased spikelets). There is one obsevation of severity for each day after inoculation (dai) and individual spike.
### Import
```{r message=FALSE, warning=FALSE}
pathogen <- read_csv(here("data", "pathogenicity.csv")) %>%
filter(isolate != "test") %>%
select (-c(8:12))
pathogen
```
### Transform
Since we have spikes as replicates, we will calculate the mean and standard deviation of severity for plotting purposes.
```{r}
# preparing cultivar 194
pathogen_194 <- pathogen %>%
filter(cultivar == "BRS 194") %>%
group_by(cultivar, dai, genotype, species) %>%
summarize(
mean_sev = mean(sev),
sd_sev = sd(sev)
)
# preparing Guamirim
pathogen_Gua <- pathogen %>%
filter(cultivar == "BRS Guamirim") %>%
# filter(exp == 2) %>%
group_by(cultivar, dai, genotype, species) %>%
summarize(
mean_sev = mean(sev),
sd_sev = sd(sev)
)
```
### Visualize temporal
```{r message=FALSE, warning=FALSE}
brs194 <- pathogen_194 %>%
group_by(dai, genotype, species) %>%
ggplot(aes(dai, mean_sev, color = species, shape = genotype, group = interaction(species, genotype))) +
geom_point(position = position_dodge(width = 0.9)) +
geom_line() +
geom_errorbar(
aes(min = mean_sev - sd_sev, max = mean_sev + sd_sev),
width = 0.2, alpha = 0.3, position = position_dodge(width = 0.9)
) +
theme_few() +
ylim(0, 55) +
xlim(0, 25) +
theme(legend.position = "right", legend.text = element_text(size = 9, face = "italic")) +
scale_color_viridis(discrete = TRUE) +
labs(
y = "FHB severity (%) ", x = "Days after inoculation",
color = "species"
)
brsgua <- pathogen_Gua %>%
group_by(dai, genotype, species) %>%
ggplot(aes(dai, mean_sev, color = species, shape = genotype, group = interaction(species, genotype))) +
geom_point(position = position_dodge(width = 0.9)) +
geom_line() +
geom_errorbar(
aes(min = mean_sev - sd_sev, max = mean_sev + sd_sev),
width = 0.2, alpha = 0.3, position = position_dodge(width = 0.9)
) +
theme_few() +
theme(legend.position = "none") +
scale_color_viridis(discrete = TRUE) +
ylim(0, 55) +
xlim(0, 25) +
labs(
y = "FHB severity (%) ", x = "Days after inoculation",
color = "species"
)
fig3 <- plot_grid(brsgua, brs194, labels = c("A", "B"), rel_widths = c(0.7, 1), ncol = 2, align = "hv")
ggsave(here("figs", "figure3.png"), fig3, width = 9, height = 4)
fig3
```
### Visualize AUDPC
Let's calculate the area under the curve for each isolate. For this, we will use the `audpc` function of the `agricolae` package. First, we need to group several variables up to `dai` (days after inoculation). To make our life easier, lets use the `do` in conjuction with the `tidy` functions of the `dplyr` and `broom` package, respectively. This will calculate the audpc for each spike.
#### BRS 194
```{r}
audpc_194 <- pathogen %>%
filter(cultivar == "BRS 194") %>%
# filter(exp == 2) %>%
unite(species_genotype, species, genotype, sep = "_", remove = F) %>%
select(
exp, cultivar, dai, isolate, species_genotype, species, genotype, spike,
sev
) %>%
group_by(exp, isolate, species_genotype, species, genotype, spike, dai) %>%
summarize(mean_sev = mean(sev)) %>%
do(tidy(audpc(.$mean_sev, .$dai)))
names(audpc_194)[8] <- "audpc"
```
Now we can plot the means and standard deviation for each isolate.
```{r}
audpc_194 %>%
group_by(isolate, species, genotype) %>%
summarize(
mean_audpc = mean(audpc),
sd_audpc = sd(audpc)
) %>%
ggplot(aes(reorder(isolate, mean_audpc), mean_audpc, color = genotype, shape = species)) +
coord_flip() +
theme_few() +
geom_point(position = position_dodge(width = 0.4)) +
geom_errorbar(aes(min = mean_audpc - sd_audpc, max = mean_audpc + sd_audpc), position = position_dodge(width = 0.4), width = 0.2)
```
#### BRS Guamirim
```{r}
audpc_gua <- pathogen %>%
filter(cultivar == "BRS Guamirim") %>%
# filter(exp == 2) %>%
unite(species_genotype, species, genotype, sep = "_", remove = F) %>%
select(
exp, cultivar, dai, isolate, species_genotype, species, genotype, spike,
sev
) %>%
group_by(exp, isolate, species_genotype, species, genotype, spike, dai) %>%
summarize(mean_sev = mean(sev)) %>%
do(tidy(audpc(.$mean_sev, .$dai)))
names(audpc_gua)[8] <- "audpc"
```
Now we can plot the means and standard deviation for each isolate.
```{r}
audpc_gua %>%
group_by(isolate, species, genotype) %>%
summarize(
mean_audpc = mean(audpc),
sd_audpc = sd(audpc)
) %>%
ggplot(aes(reorder(isolate, mean_audpc), mean_audpc, color = genotype, shape = species)) +
coord_flip() +
theme_few() +
geom_point(position = position_dodge(width = 0.4)) +
geom_errorbar(aes(min = mean_audpc - sd_audpc, max = mean_audpc + sd_audpc), position = position_dodge(width = 0.4), width = 0.2)
```
### Mixed model
We fit a separate model for each cultivar because the experiments were conducted at different times. We fit a multilevel to test the effect of specis_genotype as fixed effects and spikes nested within isolates as random effects. The species_genotype interaction was non-significant (not shown), and then a simpler model is used.
#### BRS Guamirim
```{r}
mix_gua <- lmer(
audpc ~ species_genotype +
(1 | isolate/spike),
data = audpc_gua, REML = FALSE)
mix_gua
```
Evaluate the model
```{r message=FALSE, warning=FALSE}
plot(mix_gua, type = c("p", "smooth"), col = "black")
qqmath(mix_gua, id = 0.05, col = "black")
```
The model seems OK. Let's check the significance of the species_genotype and estimated means.
```{r}
Anova(mix_gua)
means <- emmeans(mix_gua, ~ species_genotype)
cld(means)
```
We can see that the variation among isolates was too large and so no difference could be detected among the five species_genotype in the moderate resistant cultivar.
#### BRS 194
We now apply the same model for the susceptible cultivar.
```{r}
mix_194 <- lmer(audpc ~ species_genotype +
(1 | isolate/spike),
data = audpc_194)
mix_194
```
Evaluate the model.
```{r}
plot(mix_194, type = c("p", "smooth"), col = "black")
qqmath(mix_194, id = 0.05, col = "black")
```
Note: I used both the original and the log-transformed `audpc` and the results were the same. Hence, I kept the original data.
```{r}
Anova(mix_194)
means <- emmeans(mix_194, ~ species_genotype)
cld(means)
```
Again, the effect of species_genotype was not significant due to large variation among isolates.
## Fungicide sensitivity
The response variable is the EC50 estimated in two experiments (true replicates) for each selected isolate from each species_genotype.
### Import
```{r message=FALSE, warning=FALSE}
ec50 <- read_csv(here("data", "ec50.csv"))
ec50
```
### Summarize
Let's obtain the mean and standard deviation of ec50 for each strain.
```{r message=FALSE, warning=FALSE}
ec502 <- ec50 %>%
unite(species_genotype, species, genotype, sep = "_", remove = F) %>%
group_by(isolate, species, genotype, species_genotype) %>%
summarize(
mean_ec50 = mean(ec50),
sd_ec50 = sd(ec50)
)
ec502
```
### Visualize
Let's have a look at the mean EC50 for each isolate.
```{r message=FALSE, warning=FALSE}
ec502 %>%
ggplot(aes(reorder(isolate, mean_ec50), mean_ec50, shape = species, color = genotype)) +
geom_point() +
geom_errorbar(aes(min = mean_ec50 - sd_ec50, max = mean_ec50 + sd_ec50), width = 0.2)+
theme_few() +
coord_flip()+
theme(legend.position = "right", legend.text = element_text(size = 7, face = "italic")) +
ylim(0, 1.8) +
scale_color_viridis(discrete = TRUE) +
labs(y = (expression(paste("EC"[50], " (", mu, "g/ml)"))), x = "FGSC strain", shape = "Species", color = "Genotype" ) +
ggsave(here("figs", "ec50-new.png"), width = 4.5, height = 3)
```