-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBeta_Distribution.Rmd
173 lines (116 loc) · 5.36 KB
/
Beta_Distribution.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
---
title: "problem_set_6"
author: "Lydia Holley"
date: "2024-02-13"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
```
# Question Set A
```{r quesionsA}
# 1
q1<-dbinom(2,5,0.1)
# 2
q2<-dbinom(3,6,0.1)
# 3
q3<-dbinom(3,6,0.4)
```
## 1. What is the binomial probability of getting 2 out of 5 positive samples with a true rate of 0.10?
- 0.0729
## 2. Using the previously known probability (0.10) what are the chances that your next sample will contain B. harpocrates? This would bring your total sample to 3 out of 6.
- 0.014
## 3. Using your sample proportion (2/5) what are the chances that your next sample will contain B. harpocrates? This would bring your total sample to 3 out of 6.
- 0.27648
# Question Set B
```{r binomial dist}
# question 1
# mean<-alpha/(alpha+beta)
# 0.1=alpha/(alpha+20)
# 0.1(alpha+20)=alpha
# 0.1alpha+2=alpha
# 2=0.9alpha
# alpha=2.2222
# question 2
prior<-function(x){(x^1.22222)*(1-x)^19}
sum_f <- integrate(prior,0,1)
curve(((x^1.22222)*((1-x)^19))/unlist(sum_f[1]), from = 0, to=1)
# question 3
binom_likelihood<-function(x){(x^2)*(1-x)^3}
sum_b <- integrate(binom_likelihood,0,1)
curve(((x^2)*((1-x)^3))/unlist(sum_b[1]), from = 0, to=1)
# question 4
like_prior<-function(x){(x^3.22222)*(1-x)^22}
sum_l <- integrate(like_prior,0,1)
curve(((x^3.22222)*(1-x)^22)/unlist(sum_l[1]), from = 0, to=1)
# question 5
sum_ex<-unlist(integrate(like_prior,0,1)[1])
print(sum_ex)
# question 6
# posterior<-like_prior(x)/sum_ex
curve(like_prior(x)/sum_ex,0,1)
# question 7
opt<-optimize(like_prior, lower = 0, upper = 1, maximum = TRUE)
print(opt$maximum)
q7<-dbinom(3,5,opt$maximum)
q7
```
## 1. Create a function containing a binomial distribution (don't use the built-in distribution) such that the mean of the distribution is 0.10 and B=20. The mean of the binomial distribution is a/(a+B). What is the α value of the binomial prior distribution?
- alpha is 2.22222
## 2. Plot the prior distribution. For this case remember that our beta distribution for the prior is x^(a-1)(1-x)^(B-1). This is our PRIOR. Report your prior formula.
- prior = (x\^1.22222)\*(1-x)\^19
## 3. Create a function containing a binomial likelihood distribution (don't use the built-in distribution) that reflects our sampling for Field Site A. Plot this distribution. This is our LIKELIHOOD. Report your likelihood formula.
- likelihood = (x\^2)\*(1-x)\^3
## 4. Create another function that represents our binomial likelihood multiplied by our prior. Plot this distribution
- prior x likelihood = (x\^3.22222)\*(1-x)\^22
## 5. Integrate over our likelihood x prior to get a sum of the area under the curve. Report the area under the curve from 0 to 1
- area = 1.073555e-05
## 6. Create a final function that represents our posterior distribution by dividing the prior x likelihood function by the area under the curve. Plot this function!
- posterior = like_prior(x)/sum_ex
## 7. Use the optimize() function in R to find the theta with the maximum posterior value. This will provide you with a single point-estimate for the proportion of samples with the bacteria. Using this maximum posterior value (probability), what is the probability that in the five samples, you will get 3 containing a sample of B. harpocrates
- maximum posterior value = 0.1277726
- probability that in five samples we will get 3 with the bacteria = 0.01586986
# Question Set C
```{r beta dist}
# question 1
beta<-function(x){dbeta(x,7.22222,25)}
beta_plot<-ggplot()+geom_function(fun=beta,color="chartreuse",linewidth=1)
beta_plot
# question 2
opt<-optimize(beta, lower = 0, upper = 1, maximum = TRUE)
print(opt$maximum)
q7<-dbinom(3,5,opt$maximum)
q7
```
## Plot the posterior distribution but this time using the built in R beta function. To do this we need to remember that for the posterior distribution we use Beta(a + x, B + N - x). This is for observing x successes out of N trials. Use the same a and B values as you did in the previous section
- beta = function(x){dbeta(x,7.22222,25)}
## Use the optimize() function in R to find the theta with the maximum posterior value based on this new sample set. Using this maximum posterior value (probability), what is the probability that in five samples you will get 3 containing a sample of B. harpocrates
- maximum posterior probability = 0.2058793
- probability that in five samples we will get 3 with the bacteria = 0.05503145
# Question Set D
```{r new dist}
# question 1
# mean<-alpha/(alpha+beta)
# 0.1=alpha/(alpha+30)
# 0.1(alpha+30)=alpha
# 0.1alpha+3=alpha
# 3=0.9alpha
# alpha=3.3333
# question 2
beta2<-function(x){dbeta(x,8.3333333,35)}
beta_plot2<-ggplot()+geom_function(fun=beta2,color="chartreuse",linewidth=1)
beta_plot2
# question 3
opt2<-optimize(beta2, lower = 0, upper = 1, maximum = TRUE)
print(opt2$maximum)
```
## You want to maintain a prior distribution with a mean of 0.1, but you want that distribution to be more widely distributed. Therefore, adjust the beta value and then recalculate alpha. Do you use a beta of 10 or 30?
- I use beta 30
- new alpha is 3.333333
## Plot the posterior distribution for your new dataset (5/10 samples) using the Beta function in R.
- beta2 = function(x){dbeta(x,8.333333,35)}
## Use the optimize() function in R to find the θ with the maximum posterior value for theta. Report the theta.
- theta = 0.1774239