---
output: github_document
bibliography: ["readme.bib"]
---
```{r rmd-setup, include = FALSE}
library(prefio)
knitr::opts_chunk$set(fig.path = "man/figures/")
```
# [prefio](https://fleverest.github.io/prefio/)
<!-- badges: start -->
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/prefio)](https://cran.r-project.org/package=prefio)
[![R-CMD-check](https://github.com/fleverest/prefio/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/fleverest/prefio/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/fleverest/prefio/branch/main/graph/badge.svg)](https://app.codecov.io/gh/fleverest/prefio?branch=main)
<!-- badges: end -->
## Overview
Ordinal preference datasets are used by many research communities including, but not limited to, those who work with recommender systems, computational social choice, voting systems and combinatorial optimization.
The **prefio** R package provides a set of functions which enable users to perform a wide range of preference analysis tasks, including preference aggregation, pairwise comparison summaries and convenient IO operations. This makes it easier for researchers and other professionals to perform common data analysis and preprocessing tasks with such datasets.
## Installation
The package may be installed from CRAN via
```{r, eval = FALSE}
install.packages("prefio")
```
The development version can be installed via
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("fleverest/prefio")
```
## Usage
**prefio** provides a convenient interface for processing data from tabular
formats as well as sourcing data from one of the unified
[PrefLib formats](https://www.preflib.org/format/), including a convenient
method for downloading data files directly from PrefLib to your R session.
#### Processing tabular data
Preference data can come in many forms. Commonly, preference data is
represented in *long* format, with each row corresponding to the *rank*
assigned to a single *item* within a particular ordering, e.g.:
```{r echo = FALSE, results = 'asis'}
long <- data.frame(
  ID = rep(1:3, each = 3),
  ItemName = LETTERS[rep(1:3, 3)],
  Rank = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)
knitr::kable(long,
  caption = "Three orderings on items {A, B, C} in long-format."
)
```
This data can be converted from a `data.frame` into a `preferences` object:
```{r}
long <- data.frame(
  ID = rep(1:3, each = 3),
  ItemName = LETTERS[rep(1:3, 3)],
  Rank = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)
prefs <- preferences(long,
  format = "long",
  id = "ID",
  item = "ItemName",
  rank = "Rank"
)
print(prefs)
```
Another way of tabulating orderings is with each unique ordering on a single
row, with each column representing the rank given to a particular item:
```{r echo = FALSE, results = 'asis'}
rankings <- matrix(
  c(
    1, 2, 3,
    3, 2, 1,
    2, 1, 3
  ),
  nrow = 3,
  byrow = TRUE
)
colnames(rankings) <- LETTERS[1:3]
knitr::kable(rankings,
  caption = paste0(
    "Three orderings on items ",
    "{A, B, C} in a \"rankings\" format."
  )
)
```
This data can be converted from a `matrix` into a `preferences` object:
```{r}
rankings <- matrix(
  c(
    1, 2, 3,
    3, 2, 1,
    2, 1, 3
  ),
  nrow = 3,
  byrow = TRUE
)
colnames(rankings) <- LETTERS[1:3]
prefs <- preferences(rankings,
  format = "ranking"
)
print(prefs)
```
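The long and rankings layouts above carry the same information. The following is a minimal base-R sketch of that correspondence, independent of **prefio**; the names `long_demo`, `ids`, `items` and `wide` are illustrative only and are not part of the package.

```{r}
# Illustrative long-format table: one row per (ordering, item) pair.
long_demo <- data.frame(
  ID = rep(1:3, each = 3),
  ItemName = LETTERS[rep(1:3, 3)],
  Rank = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)
ids <- sort(unique(long_demo$ID))
items <- sort(unique(long_demo$ItemName))
# One row per ordering, one column per item; NA marks an unranked item.
wide <- matrix(
  NA_real_,
  nrow = length(ids),
  ncol = length(items),
  dimnames = list(ids, items)
)
# Place each rank in the cell addressed by its (ordering, item) pair.
wide[cbind(match(long_demo$ID, ids), match(long_demo$ItemName, items))] <-
  long_demo$Rank
print(wide)
```

Each row of `wide` is one ordering and each column holds the rank given to that item, which is the shape expected when `format = "ranking"` is used above.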
#### Reading from PrefLib
The [Netflix Prize](https://en.wikipedia.org/wiki/Netflix_Prize) was a
competition devised by Netflix to improve the accuracy of its recommendation
system. To facilitate this, Netflix released its users' movie ratings, which
have since been transformed into preference data and are available from
[PrefLib](https://www.preflib.org/data/ED/00004/) [@Bennett2007]. Each data set
comprises rankings of a set of 3 or 4 movies selected at random. Here we
consider rankings for just one set of movies to illustrate the functionality of
**prefio**.
PrefLib data files such as these can be downloaded on-the-fly by specifying the
argument `from_preflib = TRUE` in the `read_preflib()` function:
```{r}
netflix <- read_preflib("netflix/00004-00000138.soc", from_preflib = TRUE)
head(netflix)
```
Each row corresponds to a unique ordering of the four movies in the dataset.
The number of Netflix users that assigned each ordering is given in the
`frequencies` column. In this case, the most common ordering (with 68 voters
specifying the same preferences) is the following:
```{r}
print(netflix$preferences[1], width = 100)
```
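The frequency of an ordering is simply the number of voters who submitted it. As a rough base-R sketch of that aggregation (the `ballots` matrix is made-up data, not the Netflix dataset, and this is not how **prefio** computes its `frequencies` column internally):

```{r}
# Hypothetical ballots: each row is one voter's ranking of items A, B, C.
ballots <- matrix(
  c(
    1, 2, 3,
    1, 2, 3,
    3, 2, 1,
    1, 2, 3,
    2, 1, 3
  ),
  ncol = 3,
  byrow = TRUE,
  dimnames = list(NULL, LETTERS[1:3])
)
# Write each ranking as an ordering string, most preferred item first.
orderings <- apply(
  ballots, 1,
  function(r) paste(names(sort(r)), collapse = " > ")
)
# Count identical orderings, listing the most common one first.
counts <- sort(table(orderings), decreasing = TRUE)
print(counts)
```

Here `names(counts)[1]` gives the most common ordering, mirroring the role of the first row in the `netflix` data above.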
#### Writing to PrefLib formats
**prefio** provides a convenient interface for writing preferential datasets to
PrefLib formats. To aid the user, the `preferences()` function automatically
calculates metrics of the dataset which are required for producing valid PrefLib
files. For example, we can write our `prefs` from earlier:
```{r}
write_preflib(prefs)
```
Note that this produces four warnings. Each warning corresponds to a field which
is required by the official PrefLib format, but may not be necessary for
internal use-cases. If your goal is to publish some data to PrefLib, these
warnings must be resolved.
## Projects using **prefio**
The [New South Wales Legislative Assembly Election Dataset](https://github.com/fleverest/nswla_preflib)
uses **prefio** to process the public election datasets into PrefLib formats.
The R package [elections.dtree](https://github.com/fleverest/elections.dtree) uses **prefio** for tracking
ballots observed by the Dirichlet-tree model.
## References