-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathDistributions.Rmd
249 lines (169 loc) · 6.11 KB
/
Distributions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
---
title: "problem_set_3"
author: "Lydia Holley"
date: "2024-01-24"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Uniform distribution
Create a datatset drawn from a uniform distribution with 10,000 samples taken between the values of 4 and 6
```{r uniform}
# create uniform distribution
unif_dist<-runif(10000,4,6)
```
## Question 1
Visualize that distribution with a histogram
```{r histogram}
# produce histogram
hist(unif_dist)
```
# Binomial Distribution
If two mice who are carriers for albinism (but not albino themselves) mate, their offspring have a 1/4 probability of being albino. Based on this information conduct the following analyses
```{r binomial}
# set probability of a 'success'
prob_alb = 0.25
```
## Question 2
What is the probability that exactly 2 offspring in a litter of 10 are albino?\
- 0.2815676
```{r question2}
# find probability for Q 2
dbinom(2,10,prob_alb)
```
## Question 3
Plot the probability distribution function for albinism in a litter of 10 mice.
```{r prob dist}
# plot probability distribution function
x<-1:10
plot(x,dbinom(x,size=10,prob=.25),type="h")
```
## Question 4
What is the probability that there are 8 or more albino mice in a litter of 10 mice?\
- 0.000415802
```{r question4}
# probability of 8 or more
# find probability of less than 8 and subtract from 1
q4<-1-pbinom(7,10,prob_alb)
print(q4)
```
## Question 5
What is the probability of at least 3 but no more than 5 albino mice in a litter of 10\
- 0.4546795
```{r question5}
# find probability for getting 3, 4, or 5 albino mice
# can either find each individual and add or subtract less than 3 from 5 or less
q5<-pbinom(5,10,prob_alb) - pbinom(2,10,prob_alb)
print(q5)
```
# Multinomial Distribution
## Question 6
If the parental birds shown above have 20 offspring throughout their lifetime, what is the probability of getting equal numbers of Yellow/Normal, Yellow/Clear, White/Normal, and White/Clear offspring?\
- 3.383959e-05
```{r question6}
# equeal numbers would be 5 of each
dmultinom(x=c(5,5,5,5),prob=c(0.5625,0.1875,0.1875,0.0625))
```
## Question 7
The White/Clear birds are the most valuable for your research. How many offspring should you plan for so that there is a \>80% chance of having at least 8 White/Clear baby birds?\
- You should plan for 181 or more offspring
```{r question7}
# more than 80% chance of having 8 or more white/clear
# loop until the probability of getting 8 is more than 80%
# print the outcomes. don't need multinomial since we only care about one of the outcomes
chances <- 0
x = 0
while (chances <= 0.8) {
x = x + 1
chances <- pbinom(8,x,0.0625,lower.tail = FALSE)
}
print(chances)
print(x)
```
# Standard Normal Distribution
```{r standard normal dist}
# set the rules of the test for our investigation
n_questions<-100
error_rate<-0.12
correct_answer<-0.88
```
## Question 8
Take a sample of 31 student scores from the binomial distribution of a 100 question where the rate of success is random and 88%. What is the mean score in that classroom?\
- 89.06452
```{r question8}
# take sample of 31 from 100 questions with p of 0.88 (should be around 88)
classroom<-rbinom(31,100,0.88)
mean_score<-mean(classroom)
print(mean_score)
```
## Question 9
There are 120 classes in the school district and they are all 31 students. Simulate 120 classrooms taking the exam and then plot the mean classroom score in a histogram.\
```{r question9}
# use loop to simulate 120 classes of 31 students
classes<-vector()
numClasses<-120
for (i in 1:numClasses){
class_score<-rbinom(31,100,0.88)
class_mean<-mean(class_score)
classes[i]<-class_mean
}
hist(classes,breaks=25)
```
## Question 10
You note that this distribution looks like a normal distribution! From your sample of 120 mean classroom scores across the district, calculate the mean and standard deviation of the average classroom scores (from the 31 students.)\
- mean of 88.03441 and a standard deviation of 0.6294889
```{r question 10}
# find mean and standard deviation
q10_mean<-mean(classes)
q10_sd<-sd(classes)
print(q10_mean)
print(q10_sd)
```
## Question 11
Use the mean and standard deviation of the mean classroom scores across the district to plot a normal distribution of scores under the assumption that all scores are due only to random error.\
```{r question11}
# plot normal distribution using the mean and standard deviation
x<-seq(75,100,length=100)
plot(x,dnorm(x, mean=q10_mean,sd=q10_sd), type="h")
```
## Question 12
Based on your model, you examine the two lowest scoring classrooms. Classroom A had a mean exam score of 87.5. Classroom B had a mean exam score of 80.4. Which classroom(s) could have obtained this mean score due simply to the error of the exam?\
- Classroom A based on my model.
# Chi-Squared Distribution
## Question 13
Chi-Squared Value at which 95% of values are below this threshold for
- 2 Degrees of Freedom
- 5.991465
- 6 Degrees of Freedom
- 12.59159
- 10 Degrees of Freedom
- 18.30704
- 100 Degrees of Freedom
- 124.3421
```{r question13}
# find chi square values for which 95% of values fall beneath
qchisq(0.95,2)
qchisq(0.95,6)
qchisq(0.95,10)
qchisq(0.95,100)
```
# T-Distribution
## Question 14
Plot the T-distribution for 50 classes (degrees of freedom) where the distribution is centered over the score 88.\
```{r question14}
# plot t distribution centered over 88 with 50 df
curve(dt(x-88,49), from=75, to=100 )
```
## Question 15
Based on your model, you examine the two lowest scoring classrooms in this school. Classroom A had a mean exam score of 87.5. Classroom B had a mean exam score of 80.4. Which classroom(s) could have obtained this mean score due simply to the error of the exam?\
- Classroom A could have obtained this mean score simply due to error of the exam
# F-Distribution
## Question 16
Which degrees of freedom do you use in the numerator to ensure the test is the most conservative?\
- I would use df 30 in the numerator to ensure test is conservative.
```{r question16}
# compare the f values for which 95% of values fall below depending on df
pf(0.95, (12-1), (31-1))
pf(0.95, (31-1), (12-1))
```