-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathF-Roads_Data.Rmd
269 lines (176 loc) · 13.6 KB
/
F-Roads_Data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
---
output:
html_document:
toc: TRUE
toc_float: TRUE
---
# Appendix F: Roads Data {-}
## Overview
Transportation system is a significant source of air pollution, and roads are without doubt the most important component of transportation system. The ability to manipulate roads data is therefore of critical importance. In this tutorial, we will take a look at the major roads in several large Midwest counties, and we will introduce other techniques, such as how to handle data with physical units, along the way. Our objectives are to:
* perform spatial data manipulation using common functions such as `st_intersection` and `st_join`
* learn to work with data of type "units"
* summarize data by groups and output results
## Environment Setup {-}
### Input/Output {-}
Our input includes the shapefile of several large counties in the Midwest, a grid data file, and the shapefile of the major roads in Illinois, Indiana, Wisconsin, and Michigan. The files are available for download on this project's [GitHub repo](https://github.com/GeoDaCenter/OpenAirQ-toolkit/tree/master/data).
Our output is a shapefile that records the lengths of primary raods, secondary roads, and motorways in each 1 km \times 1 km grid.
### Load Libraries {-}
We start by loading the following packages:
* `raster`: to manipulate and analyze raster data
* `tidyverse`: to perform simple statistical analyses
* `sf`: to conduct basic spatial data manipulation
* `velox`: to manipulate raster data in time efficient manner
* `units`: to handle numeric values with physical measurement units
It should be mentioned here that the "velox" package, which is great at fast raster data manipulation, is sadly no longer available. Detailed instructions for installing this package from R archive are included in other parts of this tutorial book and can also be found [here](https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages).
```{r F.library, message = F, warning = F}
library(raster)
library(tidyverse)
library(sf)
library(velox)
library(units)
```
### Load Data {-}
With the packages ready, we now load the data needed for this tutorial - `roads` is the shapefile for the major roads in Illinois, Indiana, Wisconsin, and Michigan, `km.grid` is the grid data, and `counties` is the shapefile for 21 large counties in the Midwest. All of the data can be loaded using the `st_read` function.
```{r F.load.data}
roads <- st_read("./data/4StateMajorRoads")
km.grid <- st_read("./data/Km_Grid")
counties <- st_read("./data/LargeAreaCounties")
```
### Data Preparation {-}
The roads data we use in this tutorial were prepared in advance. For educational purposes, we will show here how to download the roads data from the OpenStreetMap(OSM) project. There are of course a large number of methods to download the data, and each has its own merits. However, if we would like to accomplish the task within R, then the `osmdata` is going to be your first choice.
```{r F.library2, message = F}
library(osmdata)
```
There are numerous types of features contained in the OSM project which can be easily downloaded. If you are not sure what features are available for use, you can check it by calling the function `available_features()`.
```{r F.check.features}
available_features()
```
The feature in which we are interested in is "highway." However, it should be pointed out that within each feature, there might be multiple tags. To see them, we call another function, `available_tags()`.
```{r F.check.tags}
available_tags("highway")
```
For our study and for this tutorial, we will focus on six of the tages, namely "primary," "primary_link," "secondary," "secondary_link," "motorway," "motorway_link." However, there are so many interesting tags out there for you to explore in your spare time. How exciting!
Our next step is to build a query. This can be done using the code below. `opq()` is the main function that will build the query, where you can also specify the geographical region of interest. For illustration purpose, we will use the state of Illinois as an example. Pay attention to the "timeout" argument. It is optional but is very helpful. When this argument is not specified, a timeout error may be raised if the query takes too long. `add_osm_feature()`, does exactly what its name suggests - it allows you to filter the features as well as the tags that you are interested in. The "features" correspond to the "key" argument, and the "tags" correspond to the "value" argument. Lastly, `osmdata_xml` allows you to store the downloaded data in the `xml` format.
```{r F.query, eval = F}
opq(bbox = 'Illinois', timeout = 100) %>%
add_osm_feature(key = 'highway',
value = c('primary', 'primary_link', 'secondary', 'secondary_link', 'motorway', 'motorway_link')) %>%
osmdata_xml(filename = 'data/roads_IL.osm')
```
If we would like to conduct spatial data analysis in R, it is easier to work with simple feature object. We can read the roads data, which are lines object, by using the `st_read()` function.
```{r F.read.roads, eval = F}
r_IL <- sf::st_read('data/roads_IL.osm', layer = 'lines')
```
We will not be using this data file in our tutorial; we will instead use `roads`, which we have loaded earlier.
## Data Manipulation {-}
### Prelimnary Work {-}
First, we create a new field in `km.grid` called `DATA`. This field is defined as a sequence of positive integers. Basically, we are assigning a unqie ID number of each features in `km.grid`. It might be puzzling at this time what function the `DATA` field serves, but it will make sense later when we perform spatial joins.
```{r F.DATA.seq}
km.grid$DATA <- seq(1:length(km.grid$DATA))
```
Our next step is to take the spatial intersection of `roads` and `km.grid`. To accomplish this, we simply call the `st_intersection` function, which does exactly what its name suggests.
```{r F.intersection, warning = F, message = F}
roads.intersection <- st_intersection(roads, km.grid)
```
To get the length of the roads, we use the `st_length` function. The geometry of raods is of the type `lingstring`, whose length is easily computed by the `st_length`function. We then store the output (i.e. the lengths of roads) in the `sf` object `roads.intersection` by creating a new field called `length`.
```{r F.length, warning = F, message = F}
roads.intersection$length <- st_length(roads.intersection)
```
Now, let's pause for a second and inspect the `DATA` field of `roads.intersection`. Recall that we created the `DATA` field as an identifier of the grid.
Other than that the values of `DATA` is right-skewed, the summary statistics do not tell us much useful information. However, it is nevertheless a good habit to inspect your data from time to time so that you are aware of what data you are working with.
```{r F.summary.roads}
summary(roads.intersection$DATA)
```
### Primary Roads {-}
Next, we want to take a look at all roads that are in the class "primary" or "primary_link." More technically, we would like to extract all features of `roads.intersection` that has value "primary" or "primary_link" in the field `fclass`. To achieve this, we take advantage of the `which` function, which returns the positions of all features that satisfy the criterion. Then we simply bond `pri.1` and `pri.2` into a new vector, and this new vector `pri.id` has the positions of all the features that we want.
```{r F.primary.selection}
pri.1 <- which(roads.intersection$fclass == "primary")
pri.2 <- which(roads.intersection$fclass == "primary_link")
pri.id <- c(pri.1, pri.2)
```
To get the roads that are classified as either "primary" or primary_link", we simply use the `pri.id` we defined above to subset `roads.intersection`. It should be noted that there are many ways to extract the features that we want from `roads.intersection`. The method we use here is one of the most straightforward ones, but there are many alternatives, some of which are potentially better than the one shown here. In most cases, deciding which method to use is simply a matter of personal preference.
```{r F.subset.primary}
primary.int <- roads.intersection[pri.id,]
```
It is now a time to do a spatial join, which can be done using the `st_join` function. In plain English, we basically combine the `sf` objects `km.grid` and `primary.int` into a new `sf` object called `primary.merged`.
```{r F.primary.merge, warning = F, message = F}
primary.merged <- st_join(km.grid, primary.int)
```
Now, let's take a look at `primary`. This is just an `sf` object with 14 fields and 68769 features.
```{r F.primary.inspect}
primary.merged
```
The next step is to aggregate the lengths of the roads. Recall that we created the `DATA` field in `km.grid` at the beginning of this tutorial as a unique identifier of the grid. Now, we group the data by `DATA.x` (notice the name of the field changed after the spatial join), and we summarize the data using the `sum` function. We store the result of the this summation into the field `length`.
```{r F.primary.summarize, warning = F, message = F}
primary.roadlengths <- primary.merged %>%
group_by(DATA.x) %>%
summarize(length = sum(length))
```
What we have obtained above is the total length of "primary" and "primary_link" roads in each grid. The new `sf` object, grouped by grids, is named `primary.roadlengths`. Let's take a quick look at what information it contains.
```{r F.primary.result, message = F, warning =F}
primary.roadlengths
```
We see that only certain features has numeric values in the field `length`, which makes perfect sense as roads only cross a few grids. It would be quite strange if we see every feature as a large numeric value in the field `length` - that would suggest that the entire land is covered with roads, which is the case in the real world.
The next thing we want to deal with is the unit. Note that the values in the field `length` is of the type "units" i.e. a numeric value with a physical measurement unit. The original unit of `length` is a little complicated. We will simplify things by just assigning it the unit "meter".
```{r F.units.1}
units(primary.roadlengths$length) <- with(ud_units, m)
```
What we want to do is to replace all missing values in `length` with "0 meter." To achieve that, we first create a variable with value 0 and then assign it the unit "meter." This newly created `x0` is what will replace the missing values.
```{r F.units.2}
x0 <- 0
units(x0) <- with(ud_units, m)
```
We use the `which` function to return the positions of all missing values, and on those exact position, we replace the value with `x0`. Hence, all missing values now read "0 meter."
```{r F.units.3}
primary.roadlengths[which(is.na(primary.roadlengths$length)),2] <- x0
```
As always, it is critical that we inspect our data from time to time. If we print out `primary.roadlengths`, we see that all missing values have indeed by replaced by "0 meter."
```{r F.units.4}
primary.roadlengths
```
### Secondary Roads {-}
From this point on, things start to get a little repetitive. We, in this section, repeat lots of code from the last section. The only difference is that we now want to extract the information - basically the lengths of the roads - of all roads that are classified as "secondary" or "secondary_link." For illustrative purposes, we show all the code below, but we shall not comment much on the code as it is almost identical to the code in the last section.
```{r warning = F, message = F}
sec.1 <- which(roads.intersection$fclass == "secondary")
sec.2 <- which(roads.intersection$fclass == "secondary_link")
sec.id <- c(sec.1, sec.2)
secondary.int <- roads.intersection[sec.id,]
secondary.merged <- st_join(km.grid, secondary.int)
secondary.roadlengths = secondary.merged %>%
group_by(DATA.x) %>%
summarize(length = sum(length))
secondary.roadlengths[which(is.na(secondary.roadlengths$length)),2] <- x0
```
We inspect `secondary.roadlengths`, and it looks fine in terms of both data format and data value.
```{r F.secondary.inspect}
secondary.roadlengths
```
### Motorway {-}
The code in this section is exactly the same as in the last two section. We include the code here for the sake of completeness, but you should feel free to skip this section.
```{r warning = F, message = F}
mot.1 <- which(roads.intersection$fclass == "motorway")
mot.2 <- which(roads.intersection$fclass == "motorway_link")
mot.id <- c(mot.1, mot.2)
motorway.int <- roads.intersection[mot.id,]
motorway.merged <- st_join(km.grid, motorway.int)
motorway.roadlengths = motorway.merged %>%
group_by(DATA.x) %>%
summarize(length = sum(length))
motorway.roadlengths[which(is.na(motorway.roadlengths$length)),2] <- x0
```
Before we proceed, we take a galnce at `motorway.roadlengths`.
```{r F.motorway.inspect}
motorway.roadlengths
```
## Outputting Data {-}
We now enter the last section of this tutorial, where we would like to save all the work we have done so far. This is not difficult - we just create three fields in `km.grid` (`primary`, `secondary`, and `motorway`), where we store the lengths of the roads.
```{r F.store.length}
km.grid$primary <- primary.roadlengths$length
km.grid$secondary <- secondary.roadlengths$length
km.grid$motorway <- motorway.roadlengths$length
```
Then, we output a new shapefile with the modified `km.grid` using the `st_write` function.
```{r F.write, eval = F}
st_write(km.grid, "Road_Grid.shp")
```
This concludes our tutorial. In this exercise, we showed how roads data can be handled. Of course, some of the techniques introduced here can be applied in many other instances. The brief introduction on dealing with data of the type "units" should be particularly helpful if you have not been previously exposed to this data type. You should feel free to explore more on your own and apply what you have learned to your own research.