-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
181 lines (136 loc) · 5.84 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
---
output: github_document
editor_options:
markdown:
wrap: 80
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
dpi = 300,
fig.width = 10,
fig.height = 7,
dev = "svg"
)
set.seed(1234)
# preload libraries here to avoid messages
library(mifa)
library(psych)
library(ggplot2)
library(tidyr)
ggplot2::theme_set(theme_minimal(base_size = 16))
```
# mifa - multiple imputation for factor analysis
<img src='man/figures/logo.png' align="right" height="139" />
<!-- badges: start -->
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![CRAN status](https://www.r-pkg.org/badges/version/mifa)](https://CRAN.R-project.org/package=mifa)
[![R build status](https://github.com/teebusch/mifa/workflows/R-CMD-check/badge.svg)](https://github.com/teebusch/mifa/actions)
[![Codecov test coverage](https://codecov.io/gh/Teebusch/mifa/branch/master/graph/badge.svg)](https://codecov.io/gh/Teebusch/mifa?branch=master)
<!-- badges: end -->
`mifa` is an R package that implements multiple imputation of covariance
matrices to allow to perform factor analysis on incomplete data. It works as
follows:
1. Impute missing values multiple times using *Multivariate Imputation with
Chained Equations* (MICE) from the [mice](https://amices.org/mice/) package.
2. Combine the covariance matrices of the imputed data sets into a single
covariance matrix using Rubin's rules<sup>^[Rubin D. B. Multiple imputation
for nonresponse in surveys (2004). John Wiley & Sons.]</sup>
3. Use the combined covariance matrix for exploratory factor analysis.
`mifa` also provides two types of confidence intervals for the variance
explained by different numbers of principal components:
Fieller confidence intervals (parametric) for larger samples<sup>^[Fieller, E.
C. (1954). Some problems in interval estimation. Journal of the Royal
Statistical Society. Series B (Methodological): 175-185.]</sup>
and bootstrapped confidence intervals (nonparametric) for smaller
samples.<sup>^[Shao, J. & Sitter, R. R. (1996). Bootstrap for imputed survey
data. Journal of the American Statistical Association 91.435 (1996): 1278-1288. doi: [10.1080/01621459.1996.10476997](https://dx.doi.org/10.1080/01621459.1996.10476997)]</sup>
**For more information about the method, see:**
Nassiri, V., Lovik, A., Molenberghs, G., Verbeke, G. (2018). On using multiple
imputation for exploratory factor analysis of incomplete data. *Behavior
Research Methods* 50, 501–517. doi: [10.3758/s13428-017-1013-4](https://doi.org/10.3758/s13428-017-1013-4)
*Note:* The paper was accompanied by an implementation in R, and this package
emerged from it. The repository appears to have been abandoned by the authors,
but you can still find it [here](https://github.com/vahidnassiri/mifa).
## Installation
Install from CRAN with:
```{r eval=FALSE}
install.packages("mifa")
```
Or install the development version from [Github](https://github.com/teebusch/mifa) with:
``` r
# install.packages("devtools")
devtools::install_github("teebusch/mifa")
```
## Usage
### Example Data
For this example we use the `bfi` data set from the `psych` package. It contains
2,800 subjects' answers to 25 personality self-report items and 3 demographic
variables (sex, education, and age). Each of the 25 personality questions is
meant to tap into one of the "Big 5" personality factors, as indicated by their
names: **O**penness, **C**onscientiousness, **A**greeableness, ,
**E**xtraversion, **N**euroticism. There are missing responses for most items.
Instead of dropping the incomplete cases from the analysis, we will use `mifa`
to impute them, and then perform a factor analysis on the imputed covariance
matrix.
### Imputing the Covariance Matrix
First, we use `mifa()` to impute the covariance matrix and get an idea how many
factors we should use. We use the `cov_vars` argument to tell `mifa` to use
`gender`, `education`, and `age` for the imputations, but exclude them from the
covariance matrix:
```{r run-mifa, messages=FALSE}
library(mifa)
library(psych)
mi <- mifa(
data = bfi,
cov_vars = -c(gender, education, age),
n_pc = 2:8,
ci = "fieller",
print = FALSE
)
mi
```
### Factor Analysis
It looks like the first 5 principal components explain more than half of the
variance in the responses, so we perform a factor analysis with 5 factors, using
the `fa()` function from the `psych` package. We can get the imputed covariance
matrix of our data from `mi$cov_combined`. From there on, it's business as
usual.
```{r message=FALSE}
fit <- fa(mi$cov_combined, n.obs = nrow(bfi), nfactors = 5)
```
The factor diagram shows that the five factors correspond nicely to the 5 types
of questions:
```{r fa-diagram, fig.height = 8}
fa.diagram(fit)
```
We can add the factor scores to the original data, in order to explore group
differences. Because we need complete data to calculate factor scores, we first
impute a single data set with mice:
```{r}
data_imp <- mice::complete(mice::mice(bfi, 1, print = FALSE))
fct_scores <- data.frame(factor.scores(data_imp[, 1:25], fit)$scores)
data_imp <- data.frame(
Gender = factor(data_imp$gender),
Extraversion = fct_scores$MR1,
Neuroticism = fct_scores$MR2,
Conscientious = fct_scores$MR3,
Openness = fct_scores$MR4,
Agreeableness = fct_scores$MR5
)
levels(data_imp$Gender) <- c("Male", "Female")
```
Then we can visualize the group differences:
```{r fa-group-comparison}
library(ggplot2)
library(tidyr)
data_imp2 <- tidyr::pivot_longer(data_imp, -Gender, "factor")
ggplot(data_imp2) +
geom_density(aes(value, linetype = Gender)) +
facet_wrap(~ factor, nrow = 2) +
theme(legend.position = c(.9, .1))
```
## Further Reading