---
title: "Survey analysis for When (ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems"
output:
github_document:
toc: true
---
## Introduction
This document describes the analyses from our paper using R code and associated output. It is generated from [survey-analysis.Rmd](survey-analysis.Rmd).
Please cite:
Matthew Kay, Tara Kola, Jessica Hullman, Sean Munson. _When (ish) is My Bus?
User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems_.
CHI 2016. DOI: [10.1145/2858036.2858558](http://dx.doi.org/10.1145/2858036.2858558).
## Setup
### Required libraries
If you are missing any of the packages below, use `install.packages("packagename")` to install them.
The `import::` syntax requires the `import` package to be installed; it provides a simple way to
import specific functions from a package without polluting your entire namespace (unlike `library()`).
```{r setup, results="hide", message=FALSE}
library(Hmisc)
library(ggplot2)
import::from(scales, extended_breaks, format_format, math_format)
import::from(boot, logit, inv.logit)
import::from(grid, grid.draw)
import::from(RVAideMemoire, spearman.ci)
import::from(magrittr, `%>%`, `%<>%`, `%$%`)
import::from(dplyr,
    transmute, group_by, mutate, filter, select,
    left_join, summarise, one_of, arrange, do, ungroup)
import::from(gamlss, gamlss)
import::from(gamlss.dist, BE)
```
We also use the following packages available from GitHub, which can be installed using `devtools::install_github`:
```{r setup_libraries, message=FALSE, warning=FALSE, cache=FALSE}
library(tidybayes) # compose_data, apply_prototypes, extract_samples, compare_levels
# to install, run devtools::install_github("mjskay/tidybayes")
library(metabayes) # metastan
# to install, run devtools::install_github("mjskay/metabayes")
```
### ggplot2 theme
```{r setup_ggplot_theme}
theme_set(theme_light() + theme(
    panel.grid.major=element_blank(),
    panel.grid.minor=element_blank(),
    axis.line=element_line(color="black"),
    text=element_text(size=14),
    axis.text=element_text(size=rel(15/16)),
    axis.ticks.length=unit(8, "points"),
    line=element_line(size=.75)
))
```
### Memory limit
Finally, some of the manipulation of the Bayesian model posteriors can take a fair amount of memory, so we raise R's memory limit:
```{r setup_memory_limit, warning=FALSE, results="hide"}
#memory.limit is Windows-only; on other platforms it is a stub that
#just emits a warning (suppressed in this chunk)
memory.limit(10000)
```
## Load and clean data
Data loading and cleaning is implemented in [R/load-survey-data.R](R/load-survey-data.R), which loads and defines a few data frames, notably `df` (raw data) and `dfl` (`df` transformed and cleaned up into long format).
```{r clean, results="hide"}
source("R/load-survey-data.R")
```
Since most of our analysis will be done with `dfl`, let's see its structure:
```{r}
str(dfl)
```
In most cases we'll be particularly interested in just a few columns, so let's see those:
```{r}
dfl %>%
    select(participant, viz, known_p, p, restricted_p) %>%
    head(10)
```
`participant` is a factor identifying the participant; `viz` is a factor identifying the visualization (`b20` is dotplot-20, `b100` is dotplot-100, `draws` is the stripeplot, and `fill` is the density plot); `known_p` is the true probability being estimated; `p` is the participant's estimate; and `restricted_p` is the same as `p` except that `0` is mapped to `0.001` and `1` is mapped to `0.999`, because the beta regression model we will use requires outcomes strictly between 0 and 1.
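For example, the boundary adjustment that produces `restricted_p` amounts to something like the following sketch (the actual implementation is in [R/load-survey-data.R](R/load-survey-data.R)):
```{r restricted_p_sketch, eval=FALSE}
#sketch only; see R/load-survey-data.R for the actual implementation
dfl %>%
    mutate(restricted_p = ifelse(p == 0, 0.001, ifelse(p == 1, 0.999, p)))
```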
## Method
To assess the bias and variance in responses more systematically, we
fit a beta regression to participants' estimated probabilities. Beta regression
assumes responses are distributed according to a beta distribution, which is defined on (0, 1) and naturally
accounts for the fact that responses on a bounded interval have non-constant variance
(also known as _heteroskedasticity_): as we approach the boundaries, responses tend to "bunch up".

We use a regression with a submodel for the mean (in logit space) and a submodel for
the dispersion (in log space). This allows us to model the _bias_ of people's
estimates as effects on the mean of their responses, and the _variance_ of their
estimates as effects on the dispersion of their responses. Specifically, we
include _visualization_, _logit(correct probability)_, and their interaction as
fixed effects on the mean response. (We use logit(correct probability) instead of
the untransformed correct probability because the response is also estimated in logit space;
thus an unbiased observer would have an intercept of 0 and a slope of 1 for logit(correct probability).) We include
_visualization_ and _layout_ as fixed effects on the dispersion (in other words,
some visualizations or layouts may be harder to use, resulting in more variable responses).
We also include _participant_ and _participant × visualization_ as random effects
(some people may be worse at this task, or worse at this task on specific visualizations),
and _question_ as a random effect (some questions may be harder).
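As a concrete sketch of this structure: the beta distribution can be written in terms of a mean `mu` in (0, 1) and a precision `phi` > 0, with the mean submodel acting on `logit(mu)` and the dispersion submodel acting on the log of `1/phi` (consistent with the `logdisp_to_sd.5` conversion helper defined later). The helpers below (`dbeta_mu_phi`, `beta_var`) are illustrative names introduced here, not part of the analysis code; the actual model is in [R/beta-regression-for-p~known_p.R](R/beta-regression-for-p~known_p.R).
```{r beta_mean_precision_sketch}
#illustrative only: mean/precision parameterization of the beta distribution,
#with shape parameters a = mu * phi and b = (1 - mu) * phi
dbeta_mu_phi = function(x, mu, phi) dbeta(x, mu * phi, (1 - mu) * phi)
#at fixed precision, the variance mu * (1 - mu) / (1 + phi) shrinks as the
#mean approaches the boundaries of (0, 1): the heteroskedasticity noted above
beta_var = function(mu, phi) mu * (1 - mu) / (1 + phi)
beta_var(c(.1, .5, .9), phi = 10)
```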
We use a Bayesian model, which allows us to build on previous results by specifying prior information
for effects. We derive priors by fitting a similar model to the data from Hullman et al.,
which involved a similar task (estimating cumulative probabilities on three visualizations: a violin
plot, animated hypothetical outcome plots (HOPs), and error bars). We set Gaussian priors for the fixed effects
in our model that capture the sizes of effects seen in the HOPs data within 1-2 standard deviations,
with skeptical means (0 for intercept and 1 for slope in logit-logit space, corresponding to
an unbiased observer). We use the posterior estimate of the variance of the random effect of
participant in that model as the prior for the variance of random effects in our analysis.
(Footnote: similar results were obtained using less informative default priors, suggesting our results are not
highly sensitive to the choice of priors here.)
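To make the "unbiased observer" interpretation concrete, a quick arithmetic check (using `logit`/`inv.logit` imported from `boot` in the setup; `true_p` is just an illustrative vector):
```{r unbiased_observer_check}
#with intercept 0 and slope 1 in logit-logit space,
#logit(E[p]) = 0 + 1 * logit(true_p), so E[p] = true_p
true_p = c(.1, .25, .5, .75)
inv.logit(0 + 1 * logit(true_p))
```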
## Results
### Variance in respondents' probability estimates
As a first step toward understanding performance across conditions, we can look at
the differences between the correct probability and the probability respondents gave
for each question (in logit-logit space). Here is the density of those
differences, broken down by _visualization type_:
```{r variance, fig.width=6, fig.height=6}
dfl %>%
    ggplot(aes(x = logit(restricted_p) - logit(known_p))) +
    geom_vline(xintercept=0, linetype="dashed", color="#d62d0e", size=.75) +
    stat_density() +
    #label breaks with the logit-space value and its inverse-logit equivalent
    scale_x_continuous(breaks=extended_breaks(11),
        labels = function(x) paste(format_format()(x), "\n", format_format()(round(inv.logit(x),2)))) +
    coord_cartesian(xlim=c(-2.5,2.5)) +
    facet_grid(viz~.)
```
We can see that bias (the average difference between the provided and the correct
answer) is fairly low. Visually, we can also see that the variance of the estimates
appears lower for the _dotplot-20_ visualization than for the other visualizations.
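We can also check this numerically with a quick descriptive (non-model-based) summary of the logit-space errors by visualization:
```{r bias_summary}
#mean of logit-space errors approximates bias; sd approximates spread
dfl %>%
    group_by(viz) %>%
    summarise(
        mean_diff = mean(logit(restricted_p) - logit(known_p)),
        sd_diff = sd(logit(restricted_p) - logit(known_p))
    )
```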
### Beta regression model
Because the Bayesian model can take a little while to fit, the code for specifying and fitting the model lives in [R/beta-regression-for-p~known_p.R](R/beta-regression-for-p~known_p.R), which saves the fitted model as the `fit` object in [fits/fit-p~known_p.RData](fits/fit-p~known_p.RData). We load it here:
```{r load_p_p_known_fit}
load("fits/fit-p~known_p.RData")
```
Since Stan doesn't know anything about factors, we use `apply_prototypes` to recover information such as which indices correspond to which factor levels (e.g., visualization types) in the model. This allows the `tidy_samples` function to correctly label coefficients when it extracts them from the model.
```{r}
fit %<>% apply_prototypes(dfl)
```
Using the beta regression model described above, we can estimate the dispersion associated
with each visualization. First, we extract the dispersion coefficients from the model:
```{r get_dispersion_coefs}
dispersion = tidy_samples(fit, d_viz[viz])
```
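The result should be a long-format data frame of posterior samples of `d_viz`, with one row per sample per level of `viz`; a quick look at its structure:
```{r}
str(dispersion)
```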
Then we'll set up some functions for plotting:
```{r log_plot_functions}
#assuming data in natural log space, these give log2 breaks and log2 labels
log2_breaks = function(x) extended_breaks()(x/log(2)) * log(2)
log2_format = function(x) math_format(2^.x)(x/log(2))
#assuming dispersion coefficients in log space, these give the approximate
#standard deviation of a beta distribution with mean .5 having that dispersion:
#with precision phi = exp(-x), the variance at mean .5 is .25/(1 + phi)
logdisp_to_sd.5 = function(x) sqrt(.25/(exp(-x) + 1))
logdisp_to_sd.5_format = function(x) format_format()(round(sqrt(.25/(exp(-x) + 1)), 2))
no_clip = function(p) {
    #draw plot with no clipping (so annotations outside the panel remain visible)
    gt = ggplot_gtable(ggplot_build(p))
    gt$layout$clip[gt$layout$name=="panel"] = "off"
    grid.draw(gt)
}
logticks = function(x, base=2) {
    #hackish logticks since ggplot's are broken with coord_flip:
    #generate log-scale minor tick positions spanning the range of x
    min_ = min(ceiling(x/log(base)))
    max_ = max(floor(x/log(base))) - 1
    data.frame(tick = unlist(lapply(min_:max_, function(x) {
        log10(1:10*10^x)*log(base)
    })))
}
```
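As a quick sanity check of the dispersion-to-sd conversion (illustrative values, not model output):
```{r logdisp_check}
#log dispersion 0 (dispersion 1) corresponds to precision phi = 1,
#so the sd at mean .5 is sqrt(.25/2), about 0.35
logdisp_to_sd.5(0)
#smaller dispersion means higher precision and thus smaller sd
logdisp_to_sd.5(log(c(1, .5, .25)))
```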
And we'll plot the dispersion coefficients:
```{r dispersion_coefs, fig.width=6, fig.height=4}
#as dispersion
p = dispersion %>%
    mutate(viz = reorder(viz, -d_viz)) %>% #nicer order for this plot
    ggeye(aes(x = viz, y = d_viz)) +
    geom_segment(aes(y=tick, yend=tick), data=logticks(dispersion$d_viz),
        x=.3, xend=.4, color="darkgray"
    ) +
    scale_y_continuous(
        breaks = log2_breaks,
        #label breaks with both the log2 value and the corresponding sd at p = .5
        labels = function(x) paste(log2_format(x), "\n", logdisp_to_sd.5_format(x))
    )
no_clip(p)
```
_Dotplot-20_ has the lowest estimated dispersion. However, raw dispersion coefficients are a little difficult to interpret, so instead we'll convert the dispersion into the predicted standard deviation of the response when the mean predicted response (`p`) is .5:
```{r dispersion_sd, fig.width=6, fig.height=4}
#as sd at p = .5
dispersion %>%
    mutate(
        viz = reorder(viz, -d_viz), #nicer order for this plot
        sd_viz = logdisp_to_sd.5(d_viz)
    ) %>%
    ggeye(aes(x = viz, y = sd_viz)) +
    #label each estimate with its 2.5%, 50%, and 97.5% posterior quantiles
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(quantile(x, .025), 3), geom="text", vjust=-0.3, hjust=1, size=4
    ) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(median(x), 3), geom="text", vjust=-0.6, hjust=0.5, size=4
    ) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(quantile(x, .975), 3), geom="text", vjust=-0.3, hjust=0, size=4
    )
```
_Dotplot-20_ is around
2-3 percentage points better than _density_, and about 5-6 percentage points better than
_stripeplot_. We can also estimate these differences on the dispersion scale:
```{r dispersion_coef_comp, fig.width=6, fig.height=5}
p = dispersion %>%
    mutate(viz = relevel(viz, "b20")) %>% #nicer order for this plot
    compare_levels(d_viz, by=viz) %>%
    ggeye(aes(x = viz, y = d_viz)) +
    geom_hline(linetype="dashed", yintercept=0) +
    scale_y_continuous(breaks=log2_breaks, labels=log2_format) +
    geom_segment(aes(y=tick, yend=tick), data=logticks(c(-1,2)), x=.25, xend=.4, color="darkgray")
no_clip(p)
```
As differences in sd at p = .5 (where 0.01 corresponds to one percentage point):
```{r dispersion_sd_diff, fig.width=6, fig.height=5}
dispersion %>%
    mutate(
        sd_viz = logdisp_to_sd.5(d_viz),
        viz = relevel(viz, "b20") #nicer order for this plot
    ) %>%
    compare_levels(sd_viz, by=viz) %>%
    ggeye(aes(x = viz, y = sd_viz)) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(quantile(x, .025), 3), geom="text", vjust=0, hjust=1
    ) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(median(x), 3), geom="text", vjust=-0.3, hjust=0.5, size=4
    ) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(quantile(x, .975), 3), geom="text", vjust=0, hjust=0
    ) +
    geom_hline(linetype="dashed", yintercept=0)
```
Or as ratios of standard deviations at p = .5:
```{r dispersion_sd_ratio, fig.width=6, fig.height=5}
dispersion %>%
    mutate(
        sd_viz = logdisp_to_sd.5(d_viz),
        viz = relevel(viz, "b20") #nicer order for this plot
    ) %>%
    compare_levels(sd_viz, by=viz, fun=`/`) %>%
    ggeye(aes(x = viz, y = sd_viz)) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(quantile(x, .025), 2), geom="text", vjust=0, hjust=1
    ) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(median(x), 2), geom="text", vjust=-0.3, hjust=0.5, size=4
    ) +
    stat_summary(aes(label=..y..), fun.y = function(x)
        round(quantile(x, .975), 2), geom="text", vjust=0, hjust=0
    ) +
    geom_hline(linetype="dashed", yintercept=1)
```
_Dotplot-100_ performs similarly to _density_ in terms of
dispersion, which would be consistent with people mostly estimating area in _dotplot-100_ (when
there are more dots than someone is willing to count).

We can also examine the estimated effect of gender on the dispersion (the coefficient is the male - female difference in log dispersion):
```{r gender_disp, fig.width=6, fig.height=1.5}
d_gender = tidy_samples(fit, d_gender[])
p = d_gender %>%
    ggeye(aes(x = "male - female", y = d_gender)) +
    geom_hline(linetype="dashed", yintercept=0) +
    geom_segment(aes(y=tick, yend=tick), data=logticks(c(-1,1)), x=.35, xend=.5, color="darkgray") +
    scale_y_continuous(breaks=log2_breaks, labels=log2_format, limits=c(-1,1))
no_clip(p)
```
Gender differences are likely not due to differences in statistical experience, as the distribution of statistical experience in each gender is very similar:
```{r}
gender_stats = xtabs(~ statistics_experience + gender, data=df)
#show counts alongside within-gender proportions
gender_stats %>% cbind(round(prop.table(., 2), 2))
chisq.test(gender_stats)
```
The route layout may also have been a little harder for people to use (slightly less precise responses), but not by much (if at all):
```{r layout_disp, fig.width=6, fig.height=1.5}
d_layout = tidy_samples(fit, d_layout[])
p = d_layout %>%
    ggeye(aes(x = "route - bus", y = d_layout)) +
    geom_hline(linetype="dashed", yintercept=0) +
    geom_segment(aes(y=tick, yend=tick), data=logticks(c(-1,1)), x=.35, xend=.5, color="darkgray") +
    scale_y_continuous(breaks=log2_breaks, labels=log2_format, limits=c(-1,1))
no_clip(p)
```
### Confidence
Finally, we estimate the Spearman correlation (with a bootstrapped confidence interval) between respondents' confidence and their absolute error in logit space, within each visualization:
```{r conf_corr}
dfl %>%
    group_by(viz) %>%
    do({
        ct = with(.,
            spearman.ci(abs(logit(restricted_p) - logit(known_p)),
                restricted_confidence, nrep=10000
            )
        )
        data.frame(lower=ct$conf.int[[1]], cor=ct$estimate, upper=ct$conf.int[[2]])
    })
```