Beta_Distribution.Rmd

---
title: "problem_set_6"
author: "Lydia Holley"
date: "2024-02-13"
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
```

# Question Set A

```{r quesionsA}
# 1
q1<-dbinom(2,5,0.1)

# 2
q2<-dbinom(3,6,0.1)

# 3
q3<-dbinom(3,6,0.4)

```

## 1. What is the binomial probability of getting 2 out of 5 positive samples with a true rate of 0.10?

-   0.0729

## 2. Using the previously known probability (0.10) what are the chances that your next sample will contain B. harpocrates? This would bring your total sample to 3 out of 6.

-   0.014

## 3. Using your sample proportion (2/5) what are the chances that your next sample will contain B. harpocrates? This would bring your total sample to 3 out of 6.

-   0.27648

# Question Set B

```{r binomial dist}

# question 1
# mean<-alpha/(alpha+beta)
# 0.1=alpha/(alpha+20)
# 0.1(alpha+20)=alpha
# 0.1alpha+2=alpha
# 2=0.9alpha
# alpha=2.2222

# question 2
prior<-function(x){(x^1.22222)*(1-x)^19}
sum_f <- integrate(prior,0,1)
curve(((x^1.22222)*((1-x)^19))/unlist(sum_f[1]), from = 0, to=1)

# question 3
binom_likelihood<-function(x){(x^2)*(1-x)^3}
sum_b <- integrate(binom_likelihood,0,1)
curve(((x^2)*((1-x)^3))/unlist(sum_b[1]), from = 0, to=1)

# question 4
like_prior<-function(x){(x^3.22222)*(1-x)^22}
sum_l <- integrate(like_prior,0,1)
curve(((x^3.22222)*(1-x)^22)/unlist(sum_l[1]), from = 0, to=1)

# question 5
sum_ex<-unlist(integrate(like_prior,0,1)[1])
print(sum_ex)

# question 6
# posterior<-like_prior(x)/sum_ex
curve(like_prior(x)/sum_ex,0,1)

# question 7
opt<-optimize(like_prior, lower = 0, upper = 1, maximum = TRUE)
print(opt$maximum)

q7<-dbinom(3,5,opt$maximum)
q7
```

## 1. Create a function containing a binomial distribution (don't use the built-in distribution) such that the mean of the distribution is 0.10 and B=20. The mean of the binomial distribution is a/(a+B). What is the α value of the binomial prior distribution?

-   alpha is 2.22222

## 2. Plot the prior distribution. For this case remember that our beta distribution for the prior is x^(a-1)(1-x)^(B-1). This is our PRIOR. Report your prior formula.

-   prior = (x\^1.22222)\*(1-x)\^19

## 3. Create a function containing a binomial likelihood distribution (don't use the built-in distribution) that reflects our sampling for Field Site A. Plot this distribution. This is our LIKELIHOOD. Report your likelihood formula.

-   likelihood = (x\^2)\*(1-x)\^3

## 4. Create another function that represents our binomial likelihood multiplied by our prior. Plot this distribution

-   prior x likelihood = (x\^3.22222)\*(1-x)\^22

## 5. Integrate over our likelihood x prior to get a sum of the area under the curve. Report the area under the curve from 0 to 1

-   area = 1.073555e-05

## 6. Create a final function that represents our posterior distribution by dividing the prior x likelihood function by the area under the curve. Plot this function!

-   posterior = like_prior(x)/sum_ex

## 7. Use the optimize() function in R to find the theta with the maximum posterior value. This will provide you with a single point-estimate for the proportion of samples with the bacteria. Using this maximum posterior value (probability), what is the probability that in the five samples, you will get 3 containing a sample of B. harpocrates

-   maximum posterior value = 0.1277726
-   probability that in five samples we will get 3 with the bacteria = 0.01586986

# Question Set C

```{r beta dist}

# question 1
beta<-function(x){dbeta(x,7.22222,25)}
beta_plot<-ggplot()+geom_function(fun=beta,color="chartreuse",linewidth=1)
beta_plot

# question 2
opt<-optimize(beta, lower = 0, upper = 1, maximum = TRUE)
print(opt$maximum)

q7<-dbinom(3,5,opt$maximum)
q7
```

## Plot the posterior distribution but this time using the built in R beta function. To do this we need to remember that for the posterior distribution we use Beta(a + x, B + N - x). This is for observing x successes out of N trials. Use the same a and B values as you did in the previous section

-   beta = function(x){dbeta(x,7.22222,25)}

## Use the optimize() function in R to find the theta with the maximum posterior value based on this new sample set. Using this maximum posterior value (probability), what is the probability that in five samples you will get 3 containing a sample of B. harpocrates

-   maximum posterior probability = 0.2058793
-   probability that in five samples we will get 3 with the bacteria = 0.05503145

# Question Set D

```{r new dist}

# question 1
# mean<-alpha/(alpha+beta)
# 0.1=alpha/(alpha+30)
# 0.1(alpha+30)=alpha
# 0.1alpha+3=alpha
# 3=0.9alpha
# alpha=3.3333

# question 2
beta2<-function(x){dbeta(x,8.3333333,35)}
beta_plot2<-ggplot()+geom_function(fun=beta2,color="chartreuse",linewidth=1)
beta_plot2

# question 3
opt2<-optimize(beta2, lower = 0, upper = 1, maximum = TRUE)
print(opt2$maximum)

```

## You want to maintain a prior distribution with a mean of 0.1, but you want that distribution to be more widely distributed. Therefore, adjust the beta value and then recalculate alpha. Do you use a beta of 10 or 30?

-   I use beta 30
-   new alpha is 3.333333

## Plot the posterior distribution for your new dataset (5/10 samples) using the Beta function in R.

-   beta2 = function(x){dbeta(x,8.333333,35)}

## Use the optimize() function in R to find the θ with the maximum posterior value for theta. Report the theta.

-   theta = 0.1774239