# Why model
## Data cannot speak for themselves
![](images/clipboard-1882184516.png)
## Parametric estimators of the conditional mean
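When the data are stratified too finely, some exposure levels have too few observations, or none at all. In that case we need to rely on a model.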
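```{r}
# Exposure (A3) and outcome (Y3), 16 observations
A3 <- c(3, 11, 17, 23, 29, 37, 41, 53, 67, 79, 83, 97, 60, 71, 15, 45)
Y3 <- c(21, 54, 33, 101, 85, 65, 157, 120, 111, 200, 140, 220, 230, 217, 11, 190)

# Scatter plot with the fitted straight line
plot(Y3 ~ A3)
fit_linear <- glm(Y3 ~ A3)  # gaussian family by default, i.e. ordinary least squares
abline(fit_linear, col = "blue", lwd = 2)

# Estimated coefficients
fit_linear

# Estimated conditional mean of Y3 at A3 = 90
predict(fit_linear, data.frame(A3 = 90))
```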
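In symbols, the fit above corresponds to a linear model for the conditional mean. The notation ($\theta_0$, $\theta_1$, and hats for estimates) is assumed here for illustration and does not appear in the original notes:

$$
E[Y \mid A] = \theta_0 + \theta_1 A, \qquad \hat{E}[Y \mid A = 90] = \hat{\theta}_0 + 90\,\hat{\theta}_1
$$

The `predict()` call evaluates the second expression using the coefficients estimated by `glm()`.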
## Nonparametric estimators of the conditional mean
![](images/clipboard-1444979883.png)
![](images/clipboard-2459163703.png)
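A model is saturated when the number of conditional means being estimated equals the number of parameters. It is unsaturated when the number of conditional means being estimated exceeds the number of parameters.
The estimates that follow all involve a dichotomous exposure, so the model is saturated and yields a nonparametric estimate of the conditional mean.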
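As a minimal sketch of what saturation means, the chunk below dichotomizes the exposure from the parametric example at an arbitrary cutoff of 50. The cutoff and the names `A_high` and `fit_sat` are assumptions made only for illustration, not part of the original analysis. With two exposure levels and two parameters, the model-based estimates should coincide with the sample means in each group.

```{r}
# Same data as in the parametric example above
A3 <- c(3, 11, 17, 23, 29, 37, 41, 53, 67, 79, 83, 97, 60, 71, 15, 45)
Y3 <- c(21, 54, 33, 101, 85, 65, 157, 120, 111, 200, 140, 220, 230, 217, 11, 190)

# Hypothetical dichotomous exposure: 1 if A3 > 50, else 0 (illustrative cutoff)
A_high <- as.numeric(A3 > 50)

# Two conditional means to estimate, two parameters: a saturated model
fit_sat <- glm(Y3 ~ A_high)

# Model-based estimates of the mean outcome at A_high = 0 and A_high = 1
model_means <- predict(fit_sat, data.frame(A_high = c(0, 1)))

# Plain sample means within each exposure group
sample_means <- tapply(Y3, A_high, mean)

# The two sets of estimates agree up to floating-point error
all.equal(unname(model_means), as.numeric(sample_means))
```

Because the model imposes no restriction beyond the data themselves, fitting it is just another way of computing the group averages.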
## Smoothing
Add a higher-order term (here, a quadratic term in `A3`).
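```{r}
# Add a squared term so the fitted curve can bend
Asq <- A3 * A3
fit_quad <- glm(Y3 ~ A3 + Asq)
fit_quad

# Estimated conditional mean of Y3 at A3 = 90 (90^2 = 8100)
predict(fit_quad, data.frame(A3 = 90, Asq = 8100))
```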
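Written out, the model with the squared term is shown below. As before, the $\theta$ notation is an assumption introduced for illustration:

$$
E[Y \mid A] = \theta_0 + \theta_1 A + \theta_2 A^2, \qquad \hat{E}[Y \mid A = 90] = \hat{\theta}_0 + 90\,\hat{\theta}_1 + 8100\,\hat{\theta}_2
$$

The model is still linear in its parameters, which is why `glm()` can fit it once `Asq` is supplied as an extra covariate.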
## The bias-variance trade-off
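The model without the quadratic term has larger bias but smaller variance.
The model with the quadratic term has smaller bias but larger variance (wider confidence intervals).
Which one to choose depends on the situation at hand. Compared with the linear model, the model with higher-order terms is less likely to be misspecified.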
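One rough way to look at the variance side of this trade-off is to compare the standard errors of the two predictions at `A3 = 90`. The sketch below refits both models on the same data and calls `predict()` with `se.fit = TRUE`. The object names (`fit_linear`, `fit_quad`, `new_point`) are chosen here for illustration.

```{r}
# Same data as in the examples above
A3 <- c(3, 11, 17, 23, 29, 37, 41, 53, 67, 79, 83, 97, 60, 71, 15, 45)
Y3 <- c(21, 54, 33, 101, 85, 65, 157, 120, 111, 200, 140, 220, 230, 217, 11, 190)
Asq <- A3 * A3

# Linear model and model with an added quadratic term
fit_linear <- glm(Y3 ~ A3)
fit_quad <- glm(Y3 ~ A3 + Asq)

# Point at which both models are evaluated
new_point <- data.frame(A3 = 90, Asq = 90^2)

# se.fit = TRUE also returns the standard error of each prediction
pred_linear <- predict(fit_linear, new_point, se.fit = TRUE)
pred_quad <- predict(fit_quad, new_point, se.fit = TRUE)

# Point estimates and standard errors side by side
data.frame(
  model = c("linear", "quadratic"),
  estimate = unname(c(pred_linear$fit, pred_quad$fit)),
  std_error = unname(c(pred_linear$se.fit, pred_quad$se.fit))
)
```

A larger standard error corresponds to a wider confidence interval around the predicted mean. The bias side of the trade-off cannot be read off this output, because it depends on the true shape of the conditional mean, which is unknown.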