# Basic Statistical Inference
Statistical inference involves drawing conclusions about population parameters based on sample data. The two primary goals of inference are:
1. **Making inferences** about the true parameter value ($\beta$) based on our estimator or estimate:
- This involves interpreting the sample-derived estimate to understand the population parameter.
- Examples include estimating population means, variances, or proportions.
2. **Testing whether underlying assumptions hold true**, including:
- Assumptions about the true population parameters (e.g., $\mu$, $\sigma^2$).
- Assumptions about random variables (e.g., independence, normality).
- Assumptions about the model specification (e.g., linearity in regression).
**Note**: Statistical testing does not:
- Confirm with absolute certainty that a hypothesis is true or false.
- Interpret the magnitude of the estimated value in economic, practical, or business contexts without additional analysis.
- **Statistical significance**: Refers to whether an observed effect is unlikely due to chance.
- **Practical significance**: Focuses on the real-world importance of the effect.
**Example**:
- A marketing campaign increases sales by $0.5\%$, which is statistically significant ($p < 0.05$). However, in a small market, this may lack practical significance.
Rather than offering certainty, inference provides a framework for making probabilistic statements about population parameters, given sample data.
------------------------------------------------------------------------
## Hypothesis Testing Framework
Hypothesis testing is one of the fundamental tools in statistics. It provides a formal procedure to test claims or assumptions (hypotheses) about population parameters using sample data. This process is essential in various fields, including business, medicine, and social sciences, as it helps answer questions like "Does a new marketing strategy improve sales?" or "Is there a significant difference in test scores between two teaching methods?"
The goal of hypothesis testing is to make decisions or draw conclusions about a population based on sample data. This is necessary because we rarely have access to the entire population. For example, if a company wants to determine whether a new advertising campaign increases sales, it might analyze data from a sample of stores rather than every store globally.
**Key Steps in Hypothesis Testing**
1. **Formulate Hypotheses**: Define the null and alternative hypotheses.
2. **Choose a Significance Level** ($\alpha$): Determine the acceptable probability of making a Type I error.
3. **Select a Test Statistic**: Identify the appropriate statistical test based on the data and hypotheses.
4. **Define the Rejection Region**: Specify the range of values for which the null hypothesis will be rejected.
5. **Compute the Test Statistic**: Use sample data to calculate the test statistic.
6. **Make a Decision**: Compare the test statistic to the critical value or use the p-value to decide whether to reject or fail to reject the null hypothesis.
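To make these steps concrete, here is a minimal sketch of all six using a one-sample $t$-test on simulated data (the sample, $\mu_0 = 50$, and $\alpha = 0.05$ are illustrative choices, not values from a real study):
```{r}
# Steps 1-2: H0: mu = 50 vs. Ha: mu != 50, at alpha = 0.05
set.seed(123)
sales <- rnorm(30, mean = 52, sd = 5) # hypothetical sample of monthly sales
mu_0  <- 50
alpha <- 0.05

# Steps 3-4: a one-sample t-test; reject if |t| exceeds the critical value
t_crit <- qt(1 - alpha / 2, df = length(sales) - 1)

# Step 5: compute the test statistic
t_stat <- (mean(sales) - mu_0) / (sd(sales) / sqrt(length(sales)))

# Step 6: decide (compare to the critical value, or use the p-value)
c(t_stat = t_stat, t_crit = t_crit, reject = abs(t_stat) > t_crit)
```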
------------------------------------------------------------------------
### Null and Alternative Hypotheses
At the heart of hypothesis testing lies the formulation of two competing hypotheses:
1. **Null Hypothesis (**$H_0$):
- Represents the current state of knowledge, status quo, or no effect.
- It is assumed true unless there is strong evidence against it.
- Examples:
- $H_0: \mu_1 = \mu_2$ (no difference in means between two groups).
- $H_0: \beta = 0$ (a predictor variable has no effect in a regression model).
- Think of $H_0$ as the "default assumption."
2. **Alternative Hypothesis (**$H_a$ or $H_1$):
- Represents a claim that contradicts the null hypothesis.
- It is what you are trying to prove or find evidence for.
- Examples:
- $H_a: \mu_1 \neq \mu_2$ (means of two groups are different).
- $H_a: \beta \neq 0$ (a predictor variable has an effect).
------------------------------------------------------------------------
### Errors in Hypothesis Testing
Hypothesis testing involves decision-making under uncertainty, meaning there is always a risk of making errors. These errors are classified into two types:
1. **Type I Error** ($\alpha$):
- Occurs when the null hypothesis is rejected, even though it is true.
- Example: Concluding that a medication is effective when it actually has no effect.
- The probability of making a Type I error is denoted by $\alpha$, called the **significance level** (commonly set at 0.05 or 5%).
2. **Type II Error** ($\beta$):
- Occurs when the null hypothesis is not rejected, but the alternative hypothesis is true.
- Example: Failing to detect that a medication is effective when it actually works.
- The complement of $\beta$ is called the **power** of the test ($1 - \beta$), representing the probability of correctly rejecting the null hypothesis.
**Analogy: The Legal System**
To make this concept more intuitive, consider the analogy of a courtroom:
- **Null Hypothesis (**$H_0$): The defendant is innocent.
- **Alternative Hypothesis (**$H_a$): The defendant is guilty.
- **Type I Error:** Convicting an innocent person (false positive).
- **Type II Error:** Letting a guilty person go free (false negative).
Balancing $\alpha$ and $\beta$ is critical in hypothesis testing, as reducing one often increases the other. For example, if you make it harder to reject $H_0$ (reducing $\alpha$), you increase the chance of failing to detect a true effect (increasing $\beta$).
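A quick simulation makes the two error rates tangible. This sketch (the sample size and effect size are arbitrary choices) repeatedly runs a $t$-test when $H_0$ is true and when it is false:
```{r}
set.seed(1)
alpha <- 0.05
n     <- 20

# Type I error rate: H0 is true (mu = 0); how often do we falsely reject?
type1 <- mean(replicate(5000, t.test(rnorm(n, mean = 0))$p.value < alpha))

# Type II error rate: H0 is false (true mu = 0.3); how often do we miss it?
type2 <- mean(replicate(5000, t.test(rnorm(n, mean = 0.3))$p.value >= alpha))

c(type1 = type1, type2 = type2, power = 1 - type2)
```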
------------------------------------------------------------------------
### The Role of Distributions in Hypothesis Testing
Distributions play a fundamental role in hypothesis testing because they provide a mathematical model for understanding how a test statistic behaves under the null hypothesis ($H_0$). Without distributions, it would be impossible to determine whether the observed results are due to random chance or provide evidence to reject the null hypothesis.
#### Expected Outcomes
One of the key reasons distributions are so crucial is that they describe the range of values a test statistic is likely to take when $H_0$ is true. This helps us understand what is considered "normal" variation in the data due to random chance. For example:
- Imagine you are conducting a study to test whether a new marketing strategy increases the average monthly sales. Under the null hypothesis, you assume the new strategy has no effect, and the average sales remain unchanged.
- When you collect a sample and calculate the test statistic, you compare it to the expected distribution (e.g., the normal distribution for a $z$-test). This distribution shows the range of test statistic values that are likely to occur purely due to random fluctuations in the data, assuming $H_0$ is true.
By providing this baseline of what is "normal," distributions allow us to identify unusual results that may indicate the null hypothesis is false.
#### Critical Values and Rejection Regions
Distributions also help define critical values and rejection regions in hypothesis testing. Critical values are specific points on the distribution that mark the boundaries of the rejection region. The rejection region is the range of values for the test statistic that lead us to reject $H_0$.
The location of these critical values depends on:
- The **level of significance** ($\alpha$), which is the probability of rejecting $H_0$ when it is true (a Type I error).
- The shape of the test statistic's distribution under $H_0$.
For example:
- In a one-tailed $z$-test with $\alpha = 0.05$, the critical value is approximately $1.645$ for a standard normal distribution. If the calculated test statistic exceeds this value, we reject $H_0$ because such a result would be very unlikely under $H_0$.
Distributions help us visually and mathematically determine these critical points. By examining the distribution, we can see where the rejection region lies and what the probability is of observing a value in that region by random chance alone.
#### P-values
The **p-value**, a central concept in hypothesis testing, is derived directly from the distribution of the test statistic under $H_0$. It is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated, assuming $H_0$ is true, and thus quantifies the strength of evidence against $H_0$.
- **Small p-value** (**\<** $\alpha$): Strong evidence against $H_0$; reject $H_0$.
- **Large p-value** (**\>** $\alpha$): Weak evidence against $H_0$; fail to reject $H_0$.
For example:
- Suppose you calculate a $z$-test statistic of $2.1$ in a one-tailed test. Using the standard normal distribution, the p-value is the area under the curve to the right of $z = 2.1$. This area represents the likelihood of observing a result as extreme as $z = 2.1$ if $H_0$ is true.
- In this case, the p-value is approximately $0.0179$. A small p-value (typically less than $\alpha = 0.05$) suggests that the observed result is unlikely under $H_0$ and provides evidence to reject the null hypothesis.
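Both of the quantities quoted above come straight from the standard normal distribution and can be reproduced in one line each:
```{r}
qnorm(0.95)    # one-tailed critical value at alpha = 0.05: about 1.645
1 - pnorm(2.1) # one-tailed p-value for z = 2.1: about 0.0179
```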
#### Why Does All This Matter?
To summarize, distributions are the backbone of hypothesis testing because they allow us to:
- Define what is expected under $H_0$ by modeling the behavior of the test statistic.
- Identify results that are unlikely to occur by random chance, which leads to the rejection of $H_0$.
- Calculate p-values to quantify the strength of evidence against $H_0$.
Distributions provide the framework for understanding the role of chance in statistical analysis. They are essential for determining expected outcomes, setting thresholds for decision-making (critical values and rejection regions), and calculating p-values. A solid grasp of distributions will greatly enhance your ability to interpret and conduct hypothesis tests, making it easier to draw meaningful conclusions from data.
------------------------------------------------------------------------
### The Test Statistic
The test statistic is a crucial component in hypothesis testing, as it quantifies how far the observed data deviates from what we would expect if the null hypothesis ($H_0$) were true. Essentially, it provides a standardized way to compare the observed outcomes against the expectations set by $H_0$, enabling us to assess whether the observed results are likely due to random chance or indicative of a significant effect.
The general formula for a test statistic is:
$$
\text{Test Statistic} = \frac{\text{Observed Value} - \text{Expected Value under } H_0}{\text{Standard Error}}
$$
Each component of this formula has an important role:
1. **Numerator:**
- The numerator represents the difference between the actual data (observed value) and the hypothetical value (expected value) that is assumed under $H_0$.
- This difference quantifies the extent of the deviation. A larger deviation suggests stronger evidence against $H_0$.
2. **Denominator:**
- The denominator is the **standard error**, which measures the variability or spread of the data. It accounts for factors such as sample size and the inherent randomness of the data.
- By dividing the numerator by the standard error, the test statistic is standardized, allowing comparisons across different studies, sample sizes, and distributions.
The test statistic plays a central role in determining whether to reject $H_0$. Once calculated, it is compared to a known distribution (e.g., standard normal distribution for $z$-tests or $t$-distribution for $t$-tests). This comparison allows us to evaluate the likelihood of observing such a test statistic under $H_0$:
- **If the test statistic is close to 0:** This indicates that the observed data is very close to what is expected under $H_0$. There is little evidence to suggest rejecting $H_0$.
- **If the test statistic is far from 0 (in the tails of the distribution):** This suggests that the observed data deviates significantly from the expectations under $H_0$. Such deviations may provide strong evidence against $H_0$.
#### Why Standardizing Matters
Standardizing the difference between the observed and expected values ensures that the test statistic is not biased by factors such as the scale of measurement or the size of the sample. For instance:
- A raw difference of 5 might be highly significant in one context but negligible in another, depending on the variability (standard error).
- Standardizing ensures that the magnitude of the test statistic reflects both the size of the difference and the reliability of the sample data.
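A tiny numeric illustration of this point (the numbers are made up): the same raw difference of 5 yields very different evidence depending on the standard error.
```{r}
observed <- 55
expected <- 50
(observed - expected) / 10 # SE = 10 -> test statistic 0.5 (weak evidence)
(observed - expected) / 2  # SE = 2  -> test statistic 2.5 (strong evidence)
```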
#### Interpreting the Test Statistic
After calculating the test statistic, it is used to:
1. Compare with a critical value: For example, in a $z$-test with $\alpha = 0.05$, the critical values are $-1.96$ and $1.96$ for a two-tailed test. If the test statistic falls beyond these values, $H_0$ is rejected.
2. Calculate the p-value: The p-value is derived from the distribution and reflects the probability of observing a test statistic as extreme as the one calculated if $H_0$ is true.
------------------------------------------------------------------------
### Critical Values and Rejection Regions
The **critical value** is a point on the distribution that separates the rejection region from the non-rejection region:
- **Rejection Region**: If the test statistic falls in this region, we reject $H_0$.
- **Non-Rejection Region**: If the test statistic falls here, we fail to reject $H_0$.
The rejection region depends on the significance level ($\alpha$). For a two-tailed test with $\alpha = 0.05$, the critical values correspond to the top 2.5% and bottom 2.5% of the distribution.
------------------------------------------------------------------------
### Visualizing Hypothesis Testing
Let's create a visualization to tie these concepts together:
```{r}
# Parameters
alpha <- 0.05 # Significance level
df <- 29 # Degrees of freedom (for t-distribution)
t_critical <-
qt(1 - alpha / 2, df) # Critical value for two-tailed test
# Generate t-distribution values
t_values <- seq(-4, 4, length.out = 1000)
density <- dt(t_values, df)
# Observed test statistic
t_obs <- 2.5 # Example observed test statistic
# Plot the t-distribution
plot(
t_values,
density,
type = "l",
lwd = 2,
col = "blue",
main = "Hypothesis Testing with Distribution",
xlab = "Test Statistic (t-value)",
ylab = "Density",
ylim = c(0, 0.4)
)
# Shade the rejection regions
polygon(c(t_values[t_values <= -t_critical], -t_critical),
c(density[t_values <= -t_critical], 0),
col = "red",
border = NA)
polygon(c(t_values[t_values >= t_critical], t_critical),
c(density[t_values >= t_critical], 0),
col = "red",
border = NA)
# Add observed test statistic
points(
t_obs,
dt(t_obs, df),
col = "green",
pch = 19,
cex = 1.5
)
text(
t_obs,
dt(t_obs, df) + 0.02,
paste("Observed t:", round(t_obs, 2)),
col = "green",
pos = 3
)
# Highlight the critical values
abline(
v = c(-t_critical, t_critical),
col = "black",
lty = 2
)
text(
-t_critical,
0.05,
paste("Critical Value:", round(-t_critical, 2)),
pos = 4,
col = "black"
)
text(
t_critical,
0.05,
paste("Critical Value:", round(t_critical, 2)),
pos = 4,
col = "black"
)
# Calculate p-value
p_value <- 2 * (1 - pt(abs(t_obs), df)) # Two-tailed p-value
text(0,
0.35,
paste("P-value:", round(p_value, 4)),
col = "blue",
pos = 3)
# Annotate regions
text(-3,
0.15,
"Rejection Region",
col = "red",
pos = 3)
text(3, 0.15, "Rejection Region", col = "red", pos = 3)
text(0,
0.05,
"Non-Rejection Region",
col = "blue",
pos = 3)
# Add legend
legend(
"topright",
legend = c("Rejection Region", "Critical Value", "Observed Test Statistic"),
col = c("red", "black", "green"),
lty = c(NA, 2, NA),
pch = c(15, NA, 19),
bty = "n"
)
```
------------------------------------------------------------------------
## Key Concepts and Definitions
### Random Sample
A random sample of size $n$ consists of $n$ independent observations, each drawn from the same underlying population distribution. Independence ensures that no observation influences another, and identical distribution guarantees that all observations are governed by the same probability rules.
### Sample Statistics
#### Sample Mean
The sample mean is a measure of central tendency:
$$
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
$$
- Example: Suppose we measure the heights of 5 individuals (in cm): $170, 165, 180, 175, 172$. The sample mean is:
$$
\bar{X} = \frac{170 + 165 + 180 + 175 + 172}{5} = 172.4 \, \text{cm}.
$$
#### Sample Median
The sample median is the middle value of ordered data:
$$
\tilde{x} = \begin{cases}
\text{Middle observation,} & \text{if } n \text{ is odd}, \\
\text{Average of two middle observations,} & \text{if } n \text{ is even}.
\end{cases}
$$
#### Sample Variance
The sample variance measures data spread:
$$
S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}
$$
#### Sample Standard Deviation
The sample standard deviation is the square root of the variance:
$$
S = \sqrt{S^2}
$$
#### Sample Proportions
Used for categorical data:
$$
\hat{p} = \frac{X}{n} = \frac{\text{Number of successes}}{\text{Sample size}}
$$
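These statistics are one-liners in R; here they are applied to the height example from above:
```{r}
heights <- c(170, 165, 180, 175, 172) # heights (cm) from the example above
mean(heights)       # sample mean: 172.4
median(heights)     # sample median
var(heights)        # sample variance (n - 1 in the denominator)
sd(heights)         # sample standard deviation
mean(heights > 171) # a sample proportion: share of heights above 171 cm
```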
#### Estimators
- **Point Estimator**: A statistic ($\hat{\theta}$) used to estimate a population parameter ($\theta$).
- **Point Estimate**: The numerical value assumed by $\hat{\theta}$ when evaluated for a given sample.
- **Unbiased Estimator**: A point estimator $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$.
Examples of unbiased estimators:
- $\bar{X}$ for $\mu$ (population mean).
- $S^2$ for $\sigma^2$ (population variance).
- $\hat{p}$ for $p$ (population proportion).
- $\widehat{p_1-p_2}$ for $p_1 - p_2$ (population proportion difference).
- $\bar{X}_1 - \bar{X}_2$ for $\mu_1 - \mu_2$ (population mean difference).
**Note**: While $S^2$ is unbiased for $\sigma^2$, $S$ is a biased estimator of $\sigma$.
------------------------------------------------------------------------
### Distribution of the Sample Mean
The sampling distribution of the mean $\bar{X}$ depends on:
1. **Population Distribution**:
- If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
2. **Central Limit Theorem**:
- For large $n$, $\bar{X}$ approximately follows a normal distribution, regardless of the population's shape.
#### Standard Error of the Mean
The standard error quantifies variability in $\bar{X}$:
$$
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
$$
**Example**: Suppose $\sigma = 10$ and $n = 25$. Then: $$
\sigma_{\bar{X}} = \frac{10}{\sqrt{25}} = 2.
$$
The smaller the standard error, the more precise our estimate of the population mean.
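The formula can be checked empirically: simulate many samples, compute each sample mean, and compare the standard deviation of those means to $\sigma/\sqrt{n}$ (a sketch using the numbers from the example above):
```{r}
sigma <- 10
n     <- 25
sigma / sqrt(n) # theoretical standard error: 2

# Empirical check: standard deviation of 10,000 simulated sample means
set.seed(2)
sd(replicate(10000, mean(rnorm(n, mean = 0, sd = sigma)))) # close to 2
```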
------------------------------------------------------------------------
## One-Sample Inference
### For Single Mean
Consider a scenario where
$$
Y_i \sim \text{i.i.d. } N(\mu, \sigma^2),
$$
where i.i.d. stands for "independent and identically distributed." This model can be expressed as:
$$
Y_i = \mu + \epsilon_i,
$$
where:
- $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$,
- $E(Y_i) = \mu$,
- $\text{Var}(Y_i) = \sigma^2$,
- $\bar{y} \sim N(\mu, \sigma^2 / n)$.
When $\sigma^2$ is estimated by $s^2$, the standardized test statistic follows a $t$-distribution:
$$
\frac{\bar{y} - \mu}{s / \sqrt{n}} \sim t_{n-1}.
$$
A $100(1-\alpha)\%$ confidence interval for $\mu$ is obtained as:
$$
1 - \alpha = P\left(-t_{\alpha/2;n-1} \leq \frac{\bar{y} - \mu}{s / \sqrt{n}} \leq t_{\alpha/2;n-1}\right),
$$
or equivalently,
$$
P\left(\bar{y} - t_{\alpha/2;n-1}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{y} + t_{\alpha/2;n-1}\frac{s}{\sqrt{n}}\right).
$$
The confidence interval is expressed as:
$$
\bar{y} \pm t_{\alpha/2;n-1}\frac{s}{\sqrt{n}},
$$
where $s / \sqrt{n}$ is the standard error of $\bar{y}$.
If the experiment were repeated many times, $100(1-\alpha)\%$ of these intervals would contain $\mu$.
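The interval is easy to compute by hand and to verify against `t.test()` (a sketch with simulated data; the true mean of 10 is of course unknown in practice):
```{r}
set.seed(3)
y     <- rnorm(40, mean = 10, sd = 3)
n     <- length(y)
alpha <- 0.05

# Manual t-based confidence interval
mean(y) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * sd(y) / sqrt(n)

t.test(y)$conf.int # the same interval, computed by t.test
```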
+-------------------------------------------------+---------------------------------------------------+-------------------------------------------------+-------------------------------------------------+
| Case | Confidence Interval $100(1-\alpha)\%$ | Sample Size (Confidence $\alpha$, Error $d$) | Hypothesis Test Statistic |
+=================================================+===================================================+=================================================+=================================================+
| $\sigma^2$ known, $X$ normal (or $n \geq 25$) | $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ | $n \approx \frac{z_{\alpha/2}^2 \sigma^2}{d^2}$ | $z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$ |
+-------------------------------------------------+---------------------------------------------------+-------------------------------------------------+-------------------------------------------------+
| $\sigma^2$ unknown, $X$ normal (or $n \geq 25$) | $\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ | $n \approx \frac{z_{\alpha/2}^2 s^2}{d^2}$ | $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$ |
+-------------------------------------------------+---------------------------------------------------+-------------------------------------------------+-------------------------------------------------+
#### Power in Hypothesis Testing
Power ($\pi(\mu)$) of a hypothesis test represents the probability of correctly rejecting the null hypothesis ($H_0$) when it is false (i.e., when alternative hypothesis $H_A$ is true). Formally, it is expressed as:
$$ \begin{aligned} \text{Power} &= \pi(\mu) = 1 - \beta \\ &= P(\text{test rejects } H_0|\mu) \\ &= P(\text{test rejects } H_0| H_A \text{ is true}), \end{aligned} $$
where $\beta$ is the probability of a Type II error (failing to reject $H_0$ when it is false).
To calculate this probability:
1. **Under** $H_0$: The distribution of the test statistic is centered around the null parameter (e.g., $\mu_0$).
2. **Under** $H_A$: The test statistic is distributed differently, shifted according to the true value under $H_A$ (e.g., $\mu_1$).
Hence, to evaluate the power, it is crucial to determine the distribution of the test statistic under the alternative hypothesis, $H_A$.
Below, we derive the power for both one-sided and two-sided z-tests.
------------------------------------------------------------------------
##### One-Sided z-Test
Consider the hypotheses:
$$ H_0: \mu \leq \mu_0 \quad \text{vs.} \quad H_A: \mu > \mu_0 $$
The power for a one-sided z-test is derived as follows:
1. The test rejects $H_0$ if $\bar{y} > \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}}$, where $z_{\alpha}$ is the critical value for the test at the significance level $\alpha$.
2. Under the alternative hypothesis, the distribution of $\bar{y}$ is centered at $\mu$, with standard deviation $\frac{\sigma}{\sqrt{n}}$.
3. The power is then:
$$
\begin{aligned}
\pi(\mu) &= P\left(\bar{y} > \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}} \middle| \mu \right) \\
&= P\left(Z > z_{\alpha} + \frac{\mu_0 - \mu}{\sigma / \sqrt{n}} \middle| \mu \right), \quad \text{where } Z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}} \\
&= 1 - \Phi\left(z_{\alpha} + \frac{(\mu_0 - \mu)\sqrt{n}}{\sigma}\right) \\
&= \Phi\left(-z_{\alpha} + \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}\right).
\end{aligned}
$$
Here, we use the symmetry of the standard normal distribution: $1 - \Phi(x) = \Phi(-x)$.
Suppose we wish to show that the mean response $\mu$ under the treatment is higher than the mean response $\mu_0$ without treatment (i.e., the treatment effect $\delta = \mu - \mu_0$ is large).
Since power is an increasing function of $\mu - \mu_0$, it suffices to find the sample size $n$ that achieves the desired power $1 - \beta$ at $\mu = \mu_0 + \delta$. The power at $\mu = \mu_0 + \delta$ is:
$$
\pi(\mu_0 + \delta) = \Phi\left(-z_{\alpha} + \frac{\delta \sqrt{n}}{\sigma}\right) = 1 - \beta
$$
Given $\Phi(z_{\beta}) = 1 - \beta$, we have:
$$
-z_{\alpha} + \frac{\delta \sqrt{n}}{\sigma} = z_{\beta}
$$
Solving for $n$, we obtain:
$$
n = \left(\frac{(z_{\alpha} + z_{\beta})\sigma}{\delta}\right)^2
$$
Larger sample sizes are required when:
- The sample variability is large ($\sigma$ is large).
- The significance level $\alpha$ is small ($z_{\alpha}$ is large).
- The desired power $1 - \beta$ is large ($z_{\beta}$ is large).
- The magnitude of the effect is small ($\delta$ is small).
In practice, $\delta$ and $\sigma$ are often unknown. To estimate $\sigma$, you can:
1. Use prior studies or pilot studies.
2. Approximate $\sigma$ based on the anticipated range of the observations (excluding outliers). For normally distributed data, dividing the range by 4 provides a reasonable estimate of $\sigma$.
These considerations ensure the test is adequately powered to detect meaningful effects while balancing practical constraints such as sample size.
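These two formulas translate directly into R. The helper functions below are a sketch (the function names and example values are illustrative):
```{r}
# Power of a one-sided z-test at a given true mean
power_z <- function(mu, mu0, sigma, n, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha) + (mu - mu0) * sqrt(n) / sigma)
}

# Sample size for power 1 - beta at effect size delta
n_z <- function(delta, sigma, alpha = 0.05, beta = 0.2) {
  ceiling(((qnorm(1 - alpha) + qnorm(1 - beta)) * sigma / delta)^2)
}

power_z(mu = 105, mu0 = 100, sigma = 15, n = 50) # power at delta = 5
n_z(delta = 5, sigma = 15)                       # n needed for 80% power
```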
##### Two-Sided z-Test
For a two-sided test, the hypotheses are:
$$
H_0: \mu = \mu_0 \quad \text{vs.} \quad H_A: \mu \neq \mu_0
$$
The test rejects $H_0$ if $\bar{y}$ lies outside the interval $\mu_0 \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$. The power of the test is:
$$
\begin{aligned}
\pi(\mu) &= P\left(\bar{y} < \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \middle| \mu \right) + P\left(\bar{y} > \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \middle| \mu \right) \\
&= \Phi\left(-z_{\alpha/2} + \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}\right) + \Phi\left(-z_{\alpha/2} - \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}\right).
\end{aligned}
$$
To ensure a power of $1-\beta$ when the treatment effect $\delta = |\mu - \mu_0|$ is at least a certain value, we solve for $n$. Since the power function for a two-sided test is increasing and symmetric in $|\mu - \mu_0|$, it suffices to find $n$ such that the power equals $1-\beta$ when $\mu = \mu_0 + \delta$. This gives:
$$
n = \left(\frac{(z_{\alpha/2} + z_{\beta}) \sigma}{\delta}\right)^2
$$
Alternatively, the required sample size can be determined using a confidence interval approach. For a two-sided $100(1-\alpha)\%$ confidence interval of the form:
$$
\bar{y} \pm D
$$
where $D = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, solving for $n$ gives:
$$
n = \left(\frac{z_{\alpha/2} \sigma}{D}\right)^2
$$
This value should be rounded up to the nearest integer to ensure the required precision.
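Because $n$ grows with $1/D^2$, halving the desired margin of error roughly quadruples the required sample size (illustrative numbers):
```{r}
sigma <- 15
alpha <- 0.05
D     <- c(4, 2) # desired CI half-widths
ceiling((qnorm(1 - alpha / 2) * sigma / D)^2) # n roughly quadruples
```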
```{r}
# Generate random data and compute a 95% confidence interval
set.seed(1)                     # for reproducibility
data <- rnorm(100)              # 100 draws from the standard normal
t.test(data, conf.level = 0.95) # t-test of H0: mu = 0; output includes the 95% CI
```
For a one-sided hypothesis test, such as testing $H_0: \mu \geq 30$ versus $H_a: \mu < 30$:
```{r}
# Perform one-sided t-test
t.test(data, mu = 30, alternative = "less")
```
##### z-Test Summary
- For one-sided tests:
$$ \pi(\mu) = \Phi\left(-z_{\alpha} + \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}\right) $$
- For two-sided tests:
$$ \pi(\mu) = \Phi\left(-z_{\alpha/2} + \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}\right) + \Phi\left(-z_{\alpha/2} - \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}\right) $$
**Factors Affecting Power**
- **Effect Size (**$\mu - \mu_0$): Larger differences between $\mu$ and $\mu_0$ increase power.
- **Sample Size (**$n$): Larger $n$ reduces the standard error, increasing power.
- **Variance (**$\sigma^2$): Smaller variance increases power.
- **Significance Level (**$\alpha$): Increasing $\alpha$ (making the test more liberal) increases power through $z_{\alpha}$.
##### One-Sample t-test
In hypothesis testing, calculating the power and determining the required sample size for **t-tests** are more complex than for **z-tests**. This complexity arises from the involvement of the **Student's t-distribution** and its generalized form, the **non-central t-distribution**.
The power function for a one-sample t-test can be expressed as:
$$
\pi(\mu) = P\left(\frac{\bar{y} - \mu_0}{s / \sqrt{n}} > t_{n-1; \alpha} \mid \mu \right)
$$
Here:
- $\mu_0$ is the hypothesized population mean under the null hypothesis,
- $\bar{y}$ is the sample mean,
- $s$ is the sample standard deviation,
- $n$ is the sample size,
- $t_{n-1; \alpha}$ is the critical t-value from the Student's t-distribution with $n-1$ degrees of freedom at significance level $\alpha$.
When $\mu > \mu_0$ (i.e., $\mu - \mu_0 = \delta$), the random variable
$$
T = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}
$$
does not follow the Student's t-distribution. Instead, it follows a **non-central t-distribution** with:
- a **non-centrality parameter** $\lambda = \delta \sqrt{n} / \sigma$, where $\sigma$ is the population standard deviation,
- degrees of freedom $n-1$.
**Key Properties of the Power Function**
- The power $\pi(\mu)$ is an increasing function of the non-centrality parameter $\lambda$.
- For $\delta = 0$ (i.e., when the null hypothesis is true), the non-central t-distribution simplifies to the regular Student's t-distribution.
To calculate the power in practice, numerical procedures (see below) or precomputed charts are typically required.
**Approximate Sample Size Adjustment for t-tests**
When planning a study, researchers often start with an approximation based on **z-tests** and then adjust for the specifics of the t-test. Here's the process:
1\. Start with the Sample Size for a z-test
For a two-sided test: $$
n_z = \frac{\left(z_{\alpha/2} + z_\beta\right)^2 \sigma^2}{\delta^2}
$$ where:
- $z_{\alpha/2}$ is the critical value from the standard normal distribution for a two-tailed test,
- $z_\beta$ corresponds to the desired power $1 - \beta$,
- $\delta$ is the effect size $\mu - \mu_0$,
- $\sigma$ is the population standard deviation.
2\. Adjust for the t-distribution
Let $v = n - 1$, where $n$ is the sample size derived from the z-test. For a two-sided t-test, the approximate sample size is:
$$
n^* = \frac{\left(t_{v; \alpha/2} + t_{v; \beta}\right)^2 \sigma^2}{\delta^2}
$$
Here:
- $t_{v; \alpha/2}$ and $t_{v; \beta}$ are the critical values from the Student's t-distribution for the significance level $\alpha$ and desired power, respectively.
- Since $v$ depends on $n^*$, this process may require iterative refinement.
Notes:
1. **Approximations**: The above formulas provide an intuitive starting point but may require adjustments based on exact numerical solutions.
2. **Insights**: Power is an increasing function of:
- the effect size $\delta$,
- the sample size $n$,
- and a decreasing function of the population variability $\sigma$.
```{r}
# Example: Power calculation for a one-sample t-test
library(pwr)
# Parameters
effect_size <- 0.5 # Cohen's d
alpha <- 0.05 # Significance level
power <- 0.8 # Desired power
# Compute sample size
sample_size <-
pwr.t.test(
d = effect_size,
sig.level = alpha,
power = power,
type = "one.sample"
)$n
# Print result
cat("Required sample size for one-sample t-test:",
ceiling(sample_size),
"\n")
# Power calculation for a given sample size
calculated_power <-
pwr.t.test(
n = ceiling(sample_size),
d = effect_size,
sig.level = alpha,
type = "one.sample"
)$power
cat("Achieved power with computed sample size:",
calculated_power,
"\n")
```
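The non-central $t$ calculation described above can also be done directly with `pt()` and its `ncp` argument; `power.t.test()` from base R gives the same answer (the values here are illustrative):
```{r}
n     <- 30
delta <- 0.5
sigma <- 1
alpha <- 0.05

# Non-centrality parameter lambda = delta * sqrt(n) / sigma
lambda <- delta * sqrt(n) / sigma

# Exact power of a one-sided, one-sample t-test
1 - pt(qt(1 - alpha, df = n - 1), df = n - 1, ncp = lambda)

# Cross-check with base R
power.t.test(n = n, delta = delta, sd = sigma, sig.level = alpha,
             type = "one.sample", alternative = "one.sided")$power
```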
### For Difference of Means, Independent Samples
+-------------------------------------------------------+----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Case                                                  | $100(1-\alpha)\%$ Confidence Interval                                                        | Hypothesis Test Statistic                                                                               | Notes                                                                                                                                                      |
+=======================================================+==============================================================================================+=========================================================================================================+============================================================================================================================================================+
| When $\sigma^2$ is known                              | $\bar{X}_1 - \bar{X}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}$ | $z= \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}}$ |                                                                                                                                                            |
+-------------------------------------------------------+----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| When $\sigma^2$ is unknown, Variances Assumed EQUAL   | $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}\sqrt{s^2_p(\frac{1}{n_1}+\frac{1}{n_2})}$            | $t = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{s^2_p(\frac{1}{n_1}+\frac{1}{n_2})}}$           | Pooled Variance: $s_p^2 = \frac{(n_1 -1)s^2_1 + (n_2-1)s^2_2}{n_1 + n_2 -2}$ Degrees of Freedom: $\gamma = n_1 + n_2 -2$                                   |
+-------------------------------------------------------+----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| When $\sigma^2$ is unknown, Variances Assumed UNEQUAL | $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}\sqrt{(\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2})}$         | $t = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{(\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2})}}$        | Degrees of Freedom: $\gamma = \frac{(\frac{s_1^2}{n_1}+\frac{s^2_2}{n_2})^2}{\frac{(\frac{s_1^2}{n_1})^2}{n_1-1}+\frac{(\frac{s_2^2}{n_2})^2}{n_2-1}}$     |
+-------------------------------------------------------+----------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
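In R, both unknown-variance cases in the table above are handled by `t.test()`; the `var.equal` argument switches between the pooled and Welch versions (simulated data for illustration):
```{r}
set.seed(4)
x1 <- rnorm(25, mean = 10, sd = 2)
x2 <- rnorm(30, mean = 11, sd = 2)

t.test(x1, x2, var.equal = TRUE) # pooled-variance t-test, df = n1 + n2 - 2
t.test(x1, x2)                   # Welch t-test (default), approximate df
```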
### For Difference of Means, Paired Samples
+---------------------------+------------------------------------------------+
| Metric | Formula |
+===========================+================================================+
| Confidence Interval | $\bar{D} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$ |
+---------------------------+------------------------------------------------+
| Hypothesis Test Statistic | $t = \frac{\bar{D} - D_0}{s_d / \sqrt{n}}$ |
+---------------------------+------------------------------------------------+
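A paired design reduces to a one-sample analysis of the differences, which is exactly what `t.test(..., paired = TRUE)` does (simulated before/after data for illustration):
```{r}
set.seed(5)
before <- rnorm(15, mean = 100, sd = 10)
after  <- before + rnorm(15, mean = 3, sd = 4) # correlated with 'before'

t.test(after, before, paired = TRUE)
# Equivalent one-sample test on the differences:
t.test(after - before)
```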
### For Difference of Two Proportions
The mean of the difference between two sample proportions is given by:
$$
\hat{p_1} - \hat{p_2}
$$
The variance of the difference in proportions is:
$$
\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}
$$
A $100(1-\alpha)\%$ confidence interval for the difference in proportions is calculated as:
$$
\hat{p_1} - \hat{p_2} \pm z_{\alpha/2} \sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}}
$$
where
- $z_{\alpha/2}$: The critical value from the standard normal distribution.
- $\hat{p_1}$, $\hat{p_2}$: Sample proportions.
- $n_1$, $n_2$: Sample sizes.
**Sample Size for a Desired Confidence Level and Margin of Error**
To achieve a margin of error $d$ for a given confidence level, the required sample size can be estimated as follows:
1. **With Prior Estimates of** $\hat{p_1}$ and $\hat{p_2}$: $$
n \approx \frac{z_{\alpha/2}^2 \left[\hat{p_1}(1-\hat{p_1}) + \hat{p_2}(1-\hat{p_2})\right]}{d^2}
$$
2. **Without Prior Estimates** (assuming maximum variability, $\hat{p} = 0.5$): $$
n \approx \frac{z_{\alpha/2}^2}{2d^2}
$$
**Hypothesis Testing for Difference in Proportions**
The test statistic for hypothesis testing depends on the null hypothesis:
1. **When** $(p_1 - p_2) \neq 0$: $$
z = \frac{(\hat{p_1} - \hat{p_2}) - (p_1 - p_2)_0}{\sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}}}
$$
2. **When** $(p_1 - p_2)_0 = 0$ (testing equality of proportions): $$
z = \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\hat{p}(1-\hat{p}) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$
where $\hat{p}$ is the pooled sample proportion:
$$
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p_1} + n_2\hat{p_2}}{n_1 + n_2}
$$
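In R, `prop.test()` performs the pooled test of equality; its chi-squared statistic is the square of the $z$ statistic above (the counts here are illustrative):
```{r}
# 45 successes out of 200 vs. 30 out of 180 (illustrative counts)
prop.test(x = c(45, 30), n = c(200, 180), correct = FALSE)
```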
------------------------------------------------------------------------
### For Single Proportion
The $100(1-\alpha)\%$ confidence interval for a population proportion $p$ is:
$$
\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
$$
**Sample Size Determination**
- **With Prior Estimate** ($\hat{p}$): $$
n \approx \frac{z_{\alpha/2}^2 \hat{p}(1-\hat{p})}{d^2}
$$
- **Without Prior Estimate**: $$
n \approx \frac{z_{\alpha/2}^2}{4d^2}
$$
The test statistic for $H_0: p = p_0$ is:
$$
z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}
$$
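A minimal sketch of the one-proportion $z$-test (illustrative counts); `prop.test()` reports the equivalent chi-squared statistic, which equals $z^2$:
```{r}
x  <- 60
n  <- 100
p0 <- 0.5
p_hat <- x / n

z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
c(z = z, p_value = 2 * (1 - pnorm(abs(z))))

prop.test(x, n, p = p0, correct = FALSE) # X-squared equals z^2
```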
------------------------------------------------------------------------
### For Single Variance
For a sample variance $s^2$ based on $n$ observations from a normal population, the $100(1-\alpha)\%$ confidence interval for the population variance $\sigma^2$ follows from:
$$
\begin{aligned}
1 - \alpha &= P\left( \chi_{1-\alpha/2;n-1}^2 \le \frac{(n-1)s^2}{\sigma^2} \le \chi_{\alpha/2;n-1}^2 \right)\\
&=P\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2; n-1}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{1-\alpha/2; n-1}}\right)
\end{aligned}
$$
Equivalently, the confidence interval can be written as:
$$
\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)
$$
To find confidence limits for $\sigma$, compute the square root of the interval bounds:
$$
\text{Confidence Interval for } \sigma: \quad \left(\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}}, \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}\right)
$$
**Hypothesis Testing for Variance**
The test statistic for testing a null hypothesis about a population variance ($\sigma^2_0$) is:
$$
\chi^2 = \frac{(n-1)s^2}{\sigma^2_0}
$$
This test statistic follows a chi-squared distribution with $n-1$ degrees of freedom under the null hypothesis.
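Base R has no built-in one-sample variance test, but the interval and the test statistic are short to code (a sketch with simulated data):
```{r}
set.seed(6)
y     <- rnorm(20, sd = 3)
n     <- length(y)
s2    <- var(y)
alpha <- 0.05

# CI for sigma^2: divide (n-1)s^2 by the upper, then the lower, chi-squared point
ci_var <- (n - 1) * s2 / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)
ci_var       # confidence interval for sigma^2
sqrt(ci_var) # confidence interval for sigma

# Test statistic for H0: sigma^2 = 9
(n - 1) * s2 / 9
```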
### Non-parametric Tests
+---------------------------------------------------------------+------------------------------+--------------------------------+
| **Method** | **Purpose** | **Assumptions** |
+===============================================================+==============================+================================+
| [Sign Test] | Test median | None (ordinal data sufficient) |
+---------------------------------------------------------------+------------------------------+--------------------------------+
| [Wilcoxon Signed Rank Test] | Test symmetry around a value | Symmetry of distribution |
+---------------------------------------------------------------+------------------------------+--------------------------------+
| [Wald-Wolfowitz Runs Test] | Test for randomness | Independent observations |
+---------------------------------------------------------------+------------------------------+--------------------------------+
| [Quantile (or Percentile) Test](#quantile-or-percentile-test) | Test specific quantile | None (ordinal data sufficient) |
+---------------------------------------------------------------+------------------------------+--------------------------------+
#### Sign Test
The **Sign Test** is used to test hypotheses about the median of a population, $\mu_{(0.5)}$, without assuming a specific distribution for the data. This test is ideal for small sample sizes or when normality assumptions are not met.
To test the population median, consider the hypotheses:
- Null Hypothesis: $H_0: \mu_{(0.5)} = 0$
- Alternative Hypothesis: $H_a: \mu_{(0.5)} > 0$ (one-sided test)
Steps:
1. **Count Positive and Negative Deviations**:
- Count observations ($y_i$) greater than 0: $s_+$ (number of positive signs).
- Count observations less than 0: $s_-$ (number of negative signs).
- $s_- = n - s_+$.
2. **Decision Rule**:
- Reject $H_0$ if $s_+$ is large (or equivalently, $s_-$ is small).
- To determine how large $s_+$ must be, use the distribution of $S_+$ under $H_0$, which is **Binomial** with $p = 0.5$.
3. **Null Distribution**:\
Under $H_0$, $S_+$ follows: $$
S_+ \sim Binomial(n, p = 0.5)
$$
4. **Critical Value**:\
Reject $H_0$ if: $$
s_+ \ge b_{n,\alpha}
$$ where $b_{n,\alpha}$ is the upper $\alpha$ critical value of the binomial distribution.
5. **p-value Calculation**:\
Compute the p-value for the observed (one-tailed) $s_+$ as: $$
\text{p-value} = P(S \ge s_+) = \sum_{i=s_+}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n
$$
Alternatively: $$
P(S \le s_-) = \sum_{i=0}^{s_-} \binom{n}{i} \left(\frac{1}{2}\right)^n
$$
------------------------------------------------------------------------
**Large-Sample Normal Approximation**
For large $n$, use a normal approximation for the binomial test. Reject $H_0$ if: $$
s_+ \ge \frac{n}{2} + \frac{1}{2} + z_{\alpha} \sqrt{\frac{n}{4}}
$$ where $z_\alpha$ is the critical value for a one-sided test.
For two-sided tests, use the maximum or minimum of $s_+$ and $s_-$:
- Test statistic: $s_{\text{max}} = \max(s_+, s_-)$ or $s_{\text{min}} = \min(s_+, s_-)$
- Reject $H_0$ if $p$-value is less than $\alpha$, where: $$
p\text{-value} = 2 \sum_{i=s_{\text{max}}}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^n = 2 \sum_{i = 0}^{s_{min}} \binom{n}{i} \left( \frac{1}{2} \right)^n
$$
Equivalently, reject $H_0$ if $s_{\text{max}} \ge b_{n,\alpha/2}$.
For large $n$, the normal approximation uses: $$
z = \frac{s_{\text{max}} - \frac{n}{2} - \frac{1}{2}}{\sqrt{\frac{n}{4}}}
$$
Reject $H_0$ at $\alpha$ if $z \ge z_{\alpha/2}$.
Handling zeros in the data is a common issue with the Sign Test:
1. **Random Assignment**: Assign zeros randomly to either $s_+$ or $s_-$ (two researchers analyzing the same data might then reach different conclusions).
2. **Fractional Assignment**: Count each zero as $0.5$ toward both $s_+$ and $s_-$ (but then we could not apply the [Binomial Distribution] afterward).
3. **Ignore Zeros**: Ignore zeros, but note this reduces the sample size and power.
```{r}
# Example Data
data <- c(0.76, 0.82, 0.80, 0.79, 1.06, 0.83, -0.43, -0.34, 3.34, 2.33)
# Count positive signs
s_plus <- sum(data > 0)
# Sample size (this sample contains no zeros, so none need to be excluded)
n <- length(data)
# Perform a one-sided binomial test
binom.test(s_plus, n, p = 0.5, alternative = "greater")
```
#### Wilcoxon Signed Rank Test
The **Wilcoxon Signed Rank Test** is an improvement over the [Sign Test] as it considers both the magnitude and direction of deviations from the null hypothesis value (e.g., 0). However, this test assumes that the data are symmetrically distributed around the median, unlike the Sign Test.
We test the following hypotheses:
$$
H_0: \mu_{(0.5)} = 0 \\
H_a: \mu_{(0.5)} > 0
$$
This example assumes no ties or duplicate observations in the data.
Procedure for the Signed Rank Test
1. **Rank the Absolute Values**: