This repository has been archived by the owner on Dec 26, 2023. It is now read-only.
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# compstatr <img src="man/figures/logo.png" align="right" />
[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![Travis-CI Build Status](https://travis-ci.com/slu-openGIS/compstatr.svg?branch=master)](https://travis-ci.com/slu-openGIS/compstatr)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/slu-openGIS/compstatr?branch=master&svg=true)](https://ci.appveyor.com/project/chris-prener/compstatr)
[![Coverage status](https://codecov.io/gh/slu-openGIS/compstatr/branch/master/graph/badge.svg)](https://codecov.io/github/slu-openGIS/compstatr?branch=master)
[![CRAN_status_badge](http://www.r-pkg.org/badges/version/compstatr)](https://cran.r-project.org/package=compstatr)
[![cran checks](https://cranchecks.info/badges/worst/compstatr)](https://cran.r-project.org/web/checks/check_results_compstatr.html)
[![DOI](https://zenodo.org/badge/105331568.svg)](https://zenodo.org/badge/latestdoi/105331568)
The goal of `compstatr` is to provide a suite of tools for working with crime data made public by the City of St. Louis' [Metropolitan Police Department](http://www.slmpd.org).
## Motivation
Among cities in the United States, St. Louis has the distinction of having the highest or one of the highest violent crime and homicide rates since 2010. It therefore presents an important site for social scientists, public health researchers, and health care providers as well as policy makers to understand the effects of violent crime on urban communities.
The City's crime data, however, are difficult to work with and present a number of challenges for researchers. These data are inconsistently organized, with all data before 2013 and some months of 2013 itself having eighteen variables. Beginning during 2013, most (but not all) months have twenty variables, many of which are named differently from their pre-2014 counterparts. These inconsistencies, and the fact that working with the data requires managing over 120 spreadsheets that each download with a `.html` file extension, are the motivating force behind `compstatr`.
We therefore provide a set of tools for accessing, preparing, editing, and mapping St. Louis [Metropolitan Police Department](http://www.slmpd.org) (SLMPD) crime data, which are available [on their website](http://www.slmpd.org/Crimereports.shtml) as `.csv` files. The categorization tools that are provided will work with any police department that uses 5 and 6 digit numeric codes to identify specific crimes.
## What's New in v0.2.2.9000?
SLMPD is currently switching to NIBRS, and has not published data since December 2020. We have not been given a timeline for data availability. The current version of the package contains checks in `cs_get_data()` that fail gracefully with an error if you attempt to download data that are listed in the `index` object but are not actually available on the website.
## Installation
The easiest way to get `compstatr` is to install it from CRAN:
``` r
install.packages("compstatr")
```
The development version of `compstatr` can be accessed from GitHub with `remotes`:
```r
# install.packages("remotes")
remotes::install_github("slu-openGIS/compstatr")
```
## Usage
We'll start with loading the `compstatr` package:
```r
> library(compstatr)
```
### Data Access - Read Tables Directly into R
As of version `v0.2.0`, data tables can be scraped and read directly into `R` without manually downloading them first. They are read from the St. Louis Metropolitan Police Department's [website](http://www.slmpd.org/Crimereports.shtml) and imported directly as objects in `R`'s global environment. To identify the last available month:
```r
> cs_last_update()
[1] "May 2019"
```
To enable scraping, an index of the available data needs to be created. Doing this is optional but highly recommended to improve performance:
```r
> # create index
> i <- cs_create_index()
```
This index is used by `cs_get_data()` to find the requested table or tables, post a request via the SLMPD website's form system, and then download your data:
```r
> # download single month
> may17 <- cs_get_data(year = 2017, month = "May", index = i)
>
> # download full year
> yearList17 <- cs_get_data(year = 2017, index = i)
```
Once data are downloaded, they need to be validated and standardized before proceeding with analysis.
### Data Access - Use Tables Downloaded Manually
While scraping is now an option, St. Louis data can still be downloaded month-by-month from [SLMPD](http://www.slmpd.org/Crimereports.shtml). `compstatr` assumes that only one year of crime data (or less) is included in specific folders within your project. These next examples assume you have downloaded all of the data for 2017 and 2018, and saved them respectively in `data/raw/2017` and `data/raw/2018`.
The function `cs_prep_year()` can be used to rename files, which may be downloaded with the wrong file extension (`January2018.csv.html`). Once the files are renamed, you can load them with `cs_load_year()` into what we call year-list objects:
```r
> cs_prep_year(path = "data/raw/2017")
>
> yearList17 <- cs_load_year(path = "data/raw/2017")
```
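For readers curious about what the renaming step amounts to, here is a minimal base R sketch on a temporary folder. It is not the implementation of `cs_prep_year()`; the folder and file names are hypothetical, and it only illustrates stripping a stray `.html` suffix from `.csv.html` files.

```r
# Toy download folder in a temporary directory (hypothetical file names)
raw_dir <- tempfile("raw-2018-")
dir.create(raw_dir)
invisible(file.create(file.path(raw_dir, c("January2018.csv.html", "February2018.csv.html"))))

# Find files carrying the stray .html suffix and strip it
old_paths <- list.files(raw_dir, pattern = "\\.csv\\.html$", full.names = TRUE)
new_paths <- sub("\\.html$", "", old_paths)
renamed <- file.rename(old_paths, new_paths)

# The folder now holds plain .csv files
list.files(raw_dir)
```

In practice `cs_prep_year()` should be preferred, since it is written for the way SLMPD's files are named.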
Once data are downloaded, they need to be validated and standardized before proceeding with analysis.
### Data Preparation
Both the data downloaded manually as well as the tables scraped from SLMPD's website are inconsistently organized. Problems that need to be addressed prior to collapsing a year-list into a single object can be identified with `cs_validate()`:
```r
> cs_validate(yearList17, year = 2017)
[1] FALSE
```
If a `FALSE` value is returned, the `verbose = TRUE` argument provides additional detail:
```r
> cs_validate(yearList17, year = 2017, verbose = TRUE)
# A tibble: 12 x 8
namedMonth codedMonth valMonth codedYear valYear oneMonth varCount valVars
<chr> <chr> <lgl> <int> <lgl> <lgl> <lgl> <lgl>
1 January January TRUE 2017 TRUE TRUE TRUE TRUE
2 February February TRUE 2017 TRUE TRUE TRUE TRUE
3 March March TRUE 2017 TRUE TRUE TRUE TRUE
4 April April TRUE 2017 TRUE TRUE TRUE TRUE
5 May May TRUE 2017 TRUE TRUE FALSE NA
6 June June TRUE 2017 TRUE TRUE TRUE TRUE
7 July July TRUE 2017 TRUE TRUE TRUE TRUE
8 August August TRUE 2017 TRUE TRUE TRUE TRUE
9 September September TRUE 2017 TRUE TRUE TRUE TRUE
10 October October TRUE 2017 TRUE TRUE TRUE TRUE
11 November November TRUE 2017 TRUE TRUE TRUE TRUE
12 December December TRUE 2017 TRUE TRUE TRUE TRUE
```
Here, the month of May has the wrong number of variables (26 instead of the expected 20). We can fix this by using `cs_standardize()` to create the correct number of columns (20) and name them appropriately:
```r
> # standardize
> yearList17 <- cs_standardize(yearList17, month = "May", config = 26)
>
> # confirm data are now valid
> cs_validate(yearList17, year = 2017)
[1] TRUE
```
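The variable-count problem can also be illustrated without the package. The sketch below is a rough base R analogue of the kind of check summarized in the `varCount` column, not the code `cs_validate()` actually runs. It assumes a year-list is a named list of data frames; the `make_month()` helper and the toy data are hypothetical.

```r
# Hypothetical helper: build a toy month with a given number of variables
make_month <- function(n_vars) {
  as.data.frame(matrix(0, nrow = 2, ncol = n_vars))
}

# Toy year-list: a named list of data frames, one per month
toy_year <- list(April = make_month(20), May = make_month(26), June = make_month(20))

# Count the columns in each month and compare against the expected 20
var_counts <- vapply(toy_year, ncol, integer(1))
valid <- var_counts == 20L

var_counts
names(valid)[!valid]  # months that would need standardizing
```

A named logical vector like `valid` makes it easy to see which months to pass to `cs_standardize()` one at a time.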
For 2013 and prior years, there will be only 18 variables. The 2013 data need to be fixed month by month because there are some correct months, but years 2008 through 2012 can be fixed en masse:
```r
> yearList08 <- cs_standardize(yearList08, config = 18, month = "all")
```
Once the data have been standardized, we can collapse them into a single object with `cs_collapse()`:
```r
> reports17 <- cs_collapse(yearList17)
```
This gives us all of the crimes reported in 2017. However, there will be crimes that were reported that year that occurred in prior years, and there may also be crimes reported in 2018 that took place in our year of interest. We can address both issues (assuming we have the next year's data) with `cs_combine()`:
```r
> # load and standardize 2018 data
> cs_prep_year(path = "data/raw/2018")
> yearList18 <- cs_load_year(path = "data/raw/2018")
> cs_validate(yearList18, year = 2018)
[1] TRUE
> reports18 <- cs_collapse(yearList18)
>
> # combine 2017 and 2018 data
> crimes17 <- cs_combine(type = "year", date = 2017, reports17, reports18)
```
We now have a tibble containing all of the known crimes that occurred in 2017 (including those reported in 2018).
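The idea behind combining years can be sketched in base R: stack the reports from both years, then keep only the incidents whose date of occurrence falls in the year of interest. This is an illustration on made-up data, not the internals of `cs_combine()`; the object and column names (`toy_reports17`, `id`, `occurred`) are hypothetical.

```r
# Incidents reported in 2017; the first one occurred in 2016
toy_reports17 <- data.frame(
  id = 1:3,
  occurred = as.Date(c("2016-12-30", "2017-03-14", "2017-11-02"))
)

# Incidents reported in 2018; the first one occurred in 2017
toy_reports18 <- data.frame(
  id = 4:5,
  occurred = as.Date(c("2017-12-31", "2018-01-05"))
)

# Stack both years, then subset on the year of occurrence
toy_all <- rbind(toy_reports17, toy_reports18)
toy_crimes17 <- toy_all[format(toy_all$occurred, "%Y") == "2017", ]

toy_crimes17$id
```

The late-reported incident from the 2018 file is retained, while the 2016 incident that was reported in 2017 is dropped.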
### Data Wrangling and Mapping
Once we have the data prepared, we can easily pull out a specific set of crimes to inspect further. For example, we could identify homicides. In the next few examples, we'll use the `january2018` example data that come with the package. We'll start by using `cs_filter_crime()` to select only homicides as well as `cs_filter_count()` to remove any unfounded incidents:
```r
> # load dependencies
> library(compstatr)
> library(ggplot2)
> library(magrittr)
> library(mapview)
>
> # subset homicides and remove unfounded incidents
> janHomicides <- january2018 %>%
+ cs_filter_count(var = Count) %>%
+ cs_filter_crime(var = Crime, crime = "homicide")
```
Next, we'll check for missing spatial data with `cs_missingXY()`:
```r
> # identify missing spatial data
> janHomicides <- cs_missingXY(janHomicides, varX = XCoord, varY = YCoord, newVar = missing)
>
> # check for any TRUE values
> table(janHomicides$missing)
```
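When a flag like `missing` does contain `TRUE` values, those rows have to be dropped before projecting. The sketch below shows that subsetting step on toy data in base R. The flag is built by hand here under the assumption that a coordinate of `0` or `NA` marks a missing location; that rule and the data values are illustrative only and are not taken from `cs_missingXY()`.

```r
# Toy incidents; column names mirror the example above, values are made up
toy <- data.frame(
  Complaint = c("A", "B", "C"),
  XCoord = c(894000, 0, 901500),
  YCoord = c(1020000, 0, 1015000),
  stringsAsFactors = FALSE
)

# Assumption for illustration: 0 or NA in either coordinate means "missing"
toy$missing <- is.na(toy$XCoord) | is.na(toy$YCoord) |
  toy$XCoord == 0 | toy$YCoord == 0

# Keep rows with usable coordinates; dplyr::filter(toy, !missing) is equivalent
toy_xy <- toy[!toy$missing, ]

table(toy$missing)
```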
We don't have any missing spatial data in this example, but if we did we would need to remove those observations with `dplyr::filter()` (or another subsetting tool). Finally, we can project and map our data:
```r
> # project data
> janHomicides_sf <- cs_projectXY(janHomicides, varX = XCoord, varY = YCoord)
>
> # preview data
> mapview(janHomicides_sf)
```
```{r exampleMap1, echo=FALSE, out.width = '100%'}
knitr::include_graphics("man/figures/homicide_mapview.png")
```
These data can also be mapped using `ggplot2` once they have been projected:
```r
> library(ggplot2)
> ggplot() +
+ geom_sf(data = janHomicides_sf, color = "red", fill = NA, size = .5)
```
```{r exampleMap2, echo=FALSE, out.width = '40%'}
knitr::include_graphics("man/figures/homicide_map.png")
```
## Non St. Louis Data
If you work with data from other police departments, the `cs_crime()`, `cs_crime_cat()`, and `cs_filter_crime()` functions may be useful for identifying, grouping, and subsetting by crime so long as they use a standard set of 5 and 6 digit codes based on the UCR system (e.g. `31111` (robbery with a firearm) or `142320` (malicious destruction of property)).
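To make the coding scheme concrete, the sketch below derives a category from a 5 or 6 digit code by dropping its last four digits, so that `31111` falls in category 3 and `142320` in category 14. The lookup table and the `code_category()` helper are hypothetical and cover only the UCR Part I categories plus the destruction of property example above; they are not the tables used inside `cs_crime_cat()`.

```r
# Illustrative lookup keyed by category number (not the package's internal table)
ucr_lookup <- c(
  "1" = "homicide", "2" = "rape", "3" = "robbery", "4" = "aggravated assault",
  "5" = "burglary", "6" = "larceny", "7" = "motor vehicle theft", "8" = "arson",
  "14" = "destruction of property"
)

# Hypothetical helper: integer-divide by 10000 to get the category number
code_category <- function(code) {
  unname(ucr_lookup[as.character(code %/% 10000)])
}

code_category(c(31111, 142320))
```

Codes whose category is not in the lookup come back as `NA`, which is a convenient signal that a department's codes do not follow this scheme.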
## Acknowledgements
We wish to thank Taylor Braswell for his significant efforts compiling Stata code early in this project. Taylor's code was used as a reference when developing this package, and many of the functions reflect issues that he worked to identify.
## Contributor Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](https://github.com/slu-openGIS/compstatr/blob/master/.github/CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.