-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathspatial-patterns.qmd
937 lines (725 loc) · 39.1 KB
/
spatial-patterns.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
# Spatial patterns
## Aims
This session aims to provide an illustration on how to (1) analyse the *spatial* patterns of origin-destination mobility flow data extracted from Meta-Facebook; (2) compute basic area-based indicators of human mobility; (3) handle spatial datasets in R; and, (4) create geospatial visualisations to examine and effectively communicate human mobility patterns. We start by clearing our R environment by running:
```{r}
#clean environment
rm(list=ls())
```
## Dependencies
We ensure to load the libraries below. A core area of this session is learning to work with spatial data in R. R offers an ecosystem of purposely designed packages for manipulation of spatial data and spatial analysis techniques. These ecosystem is known a [r-spatial](https://r-spatial.org). Various packages exist in [CRAN (The Comprehensive R Archive Network)](https://cran.r-project.org), including **sf** [@sf2018; @R-sf], **stars** [@R-stars], **terra**, **s2** [@R-s2], **lwgeom** [@R-lwgeom], **gstat** [@pebesma2004; @R-gstat], **spdep** [@R-spdep], **spatialreg** [@R-spatialreg], **spatstat** [@baddeley2015spatial; @R-spatstat], **tmap** [@tmap2018; @R-tmap], **mapview** [@R-mapview] and more. A key package is this ecosystem is **sf** [@pebesma2023spatial]. R package **sf** provides a table format for simple features, where feature geometries are stored in a list-column.It appears in 2016 and was developed to move spatial data analysis in R closer to standards-based approaches seen in the industry and open source projects, to build upon more modern versions of open source geospatial software stack and allow for integration of R spatial software with the **tidyverse** (Wickham et al. 2019), particularly **ggplot2**, **dplyr**, and **tidyr**.
```{r}
#| warning: false
#| message: false
# data wrangling
library(tidyverse)
# spatial data wrangling
library(sf)
# data visualisation
library(viridis)
# format data visualisations
library(ggthemes)
library(patchwork)
library(showtext)
library(scales)
# create maps
library(leaflet)
library(tmap)
library(mapdeck)
```
## Data
Here we read all the data needed for the analysis. We use two types of data: (1) human mobility derived from Meta-Facebook users; and, administrative boundary data for Chile.
### Meta-Facebook mobility data
We use origin-destination mobility flow data between Provinces in Chile. We use data for April 2020. For a detailed description of the Meta-Facebook mobility data, please see the *Meta-Facebook data introduction*. We start by reading the data. We filter only flows occurring within the boundaries of Chile. The dataset contains daily flow counts between provinces that occurred in April 2020 during three windows of time during the day; that is, between 12am, 8am and 4pm.
We have a look at the data frame. We can see that the data contains 20 columns and 29,491 origin-destination interactions capturing counts of movements between provinces.
```{r}
# read
df20 <- readRDS("./data/fb/movement_adm/2020_04.rds") %>%
dplyr::filter(country == "CL")
glimpse(df20)
```
We can identify the list of origin and destination provinces for which we can observe movement.
```{r}
unique_origins <- unique(df20$start_polygon_name)
unique_destinations <- unique(df20$end_polygon_name)
```
### Meta-Facebook active users population
We will also use information on the number of Meta-Facebook active users population. The population Meta-Facebook active users can vary over time reflecting their varying patterns of usage and internet accessibility.
```{r}
# read and select observations
pop20_df <- readRDS("./data/fb/population_adm/2020_04.rds") %>%
dplyr::filter(country == "CL")
# identify polygons
unique_areas <- unique(pop20_df$polygon_name)
# data overview
glimpse(pop20_df)
```
### Bing tiles
Meta-Facebook data captured during the COVID-19 outbreak were available into two formats, at the level of [Bing tiles](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system) and at the level of administrative areas. While we will be using data at the provincial level, you may be interested in having a peek at the tiles. The Bing Maps Tile System was developed by Microsoft. This system defines a series of grids at different resolution levels over a rectangular projection of the world, comprising 23 different levels of detail [@schwartz2009bing]. Each level is constructed by dividing the previous level into fourths, with the most granular level being Level 1. Meta-Facebook data are typically produced at Bing tile levels 13 through 16, where level 13 results in tiles that are about 4.9 x 4.9 km at the Equator.
::: callout-note
In our work in the project *RECAST*, we use this tile system and analyse the spatial patterns of human mobility in four Latin American countries, including Argentina, Chile, Colombia and Mexico. We expect to be in a position to share our final results with you soon. If you are interested, please do get in touch. For now, we share some of our share using Meta-Facebook data: @rowe2022, and research on the same line: @gonzález-leonardo2022, @gonzález-leonardo2022a, @gonzález-leonardo2022b, @wang2022 and @rowe2023.
:::
We now read the shapefile containing the Bing tiles, simplify their boundaries and confirm it is valid. We display the boundaries for Chile and selected regions in the north, central and south of the country, to provide a better understanding of their size and areas of coverage.
```{r}
# read bing tiles
bing_grid <- read_sf("./data/shp/grid/chile_grid.shp") %>%
st_simplify(preserveTopology =T, # simplify boundaries
dTolerance = 1000) %>% # 1km
sf::st_make_valid() # check the geometry is valid
```
```{r}
#| echo: false
p_country <- ggplot(data = bing_grid) +
geom_sf(color = "gray60",
size = 0.1) +
theme_void() +
ggtitle('Chile')
p_north <- ggplot(data = bing_grid %>% tail(100) ) +
geom_sf(color = "gray60",
size = 0.1) +
theme_void() +
ggtitle('North')
p_central <- ggplot(data = bing_grid[2500:3000,] ) +
geom_sf(color = "gray60",
size = 0.1) +
theme_void() +
ggtitle('Central')
p_south <- ggplot(data = bing_grid %>% head(1500) ) +
geom_sf(color = "gray60",
size = 0.1) +
theme_void() +
ggtitle('South')
p_country | (p_north / p_central / p_south)
rm(p_country, p_north, p_central, p_south, bing_grid)
```
::: callout-note
We created the Bing tiles. They are not available as polygons. They are only provided as rasters and not easily accessible.
:::
### Administrative areas
We now read the boundaries for Chilean provinces. Provinces are the second administrative level in the country. Provinces are amalgamations of municipalities or comunes, and groupings of provinces are known as regions. Chile is organised around 15 regions, *54 provinces* and 346 municipalities - see [here](https://www.subdere.gov.cl/sites/default/files/documentos/articles-73111_recurso_1.pdf).
Let's stop here and understand the spatial data frame or **sf** object we are reading. We can see it has 56 features (i.e. rows) and 5 fields (columns) within a bounding box which defines the area we can visualise on a map. You can see how the map of provinces below seems off to the right. That is because the bounding box has been set to include Chilean islands off of the coast of the country on the Pacific ocean. We will work on adjusting this at a later point in this session.
The line *CRS* or Coordinate Reference Systems identifies the projection system currently attached to the data. This would be the CRS that will be used if we decided to map the data. The component is incredible important if you intend to combine information from two spatial data frames. Ensure they are on the same CRS! A good idea is to used planar projection systems. @lovelace2019geocomputation provide a good discussion on the various projection systems.
```{r}
#| warning: false
shp_pro <- read_sf("./data/shp/adm/province/PROVINCIAS_2020.shp") %>%
st_simplify(preserveTopology =T,
dTolerance = 1000) %>% # 1km
sf::st_make_valid()
shp_pro
```
```{r}
#| echo: false
ggplot(data = shp_pro) +
geom_sf(color = "gray60",
size = 0.1) +
theme_void() +
ggtitle('Provinces')
```
We will also use the regional boundaries for visualisation purposes. For now, we will just read them.
```{r}
#| warning: false
shp_reg <- read_sf("./data/shp/adm/region/REGIONES_2020.shp") %>%
st_simplify(preserveTopology =T,
dTolerance = 1000) %>% # 1km
sf::st_make_valid()
```
## Spatial indicators of human mobility
This section focuses on computing various spatial indicators to analyse the patterns of human mobility. A key message to get across here is that digital footprint data do not capture the entire population living and moving in the country. We can capture signals or systematic trends, but there will be considerable amount of fluctuation of volatility. Such volatiblity may partly reflect the short-term variability in the patterns of human mobility and may also echo noise in the way in which the data are capture or process. Hence, we recommend to use indices and these indices should comprise aggregations of data in meaningful ways, based on geographical or temporal units.
### Origin-based indicators
We start by computing indicators from the perspective of origin areas. We generate a set of five indicators capturing average and cumulative patterns at the monthly level. We compute the following indicators:
- **mean_perchange:** is the average percent change in the count of movements from a given origin in relation to the baseline period over a month. It is computed as the average of the ratio of the number of movements from an area at time period *t*, minus the number of movements from an area during the baseline pre-pandemic period over the number of pmovements from an area during the baseline period.
- **mean_diff_flow:** is the difference in the count of movements from a given origin in relation to the baseline period over a month. It is computed as the difference between the number of movements from an area at time period *t*, minus the number of movements from an area during the baseline pre-pandemic period.
- **sum_diff_flow:** is the sum of the difference in the count of movements from a given origin in relation to the baseline period. It is computed as the sum of the difference between the number of movements from an area at time period *t*, minus the number of movements from an area during the baseline pre-pandemic period.
- **mean_outflow:** is the average count of movements from a given origin at a give time period over a month.
- **sum_outflow:** is the sum of the count of movements from a given origin at a give time period over a month.
```{r}
origin_df <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(start_polygon_name) %>%
dplyr::summarise(
mean_perchange = mean(percent_change, na.rm = T),
mean_diff_flow = mean(n_difference, na.rm = T),
sum_diff_flow = sum(n_difference, na.rm = T),
mean_outflow = mean(n_crisis, na.rm = T),
sum_outflow = sum(n_crisis, na.rm = T)
) %>%
ungroup()
tail(origin_df, 10)
```
We observe the observations at the tail of the data frame. We observe that Santiago recorded an average reduction of over 70% in the count of movements from Santiago in April 2020, in relation to the baseline. Yet, the average size of this reduction in outflows seems to have been relatively small, involving 792 and reporting an average outflow of 612 movements.
::: {.callout-note icon="false" appearance="simple"}
## Question 1
Analyse the origin-based indicators and interpret the patterns displayed for Valparaíso.
:::
### Destination-based indicators
Mobility involves the spatial interaction between two areas. Mobility impacts both areas of origins and destinations [@rowe2022a]. We often seek to understand both perspectives. Here we compute similar indicators to those calculated for the areas of origins. What interesting patterns can you see?
::: {.callout-note icon="false" appearance="simple"}
## Question 2
Analyse the designation-based indicators and interpret the patterns displayed for Santiago.
:::
```{r}
destination_df <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(end_polygon_name) %>%
dplyr::summarise(
mean_perchange = mean(percent_change, na.rm = T),
mean_diff_flow = mean(n_difference, na.rm = T),
sum_diff_flow = sum(n_difference, na.rm = T),
mean_inflow = mean(n_crisis, na.rm = T),
sum_inflow = sum(n_crisis, na.rm = T)
) %>%
ungroup()
tail(destination_df, 10)
```
### Intraflows
Meta-Facebook mobility data also allow us to capture short-distance, local movements occurring within areas. Below we compute indicators to measure the extent of movement within Chilean provinces, adopting similar indices to those described above.
A key pattern is an increase in the number of movements within provinces in April 2020. This is likely to have been in response to COVID-19 restrictions constraining the amount of movement over long distances. If people moved, they mostly moved locally to go shopping, parks or for short walks. Yet, we also observe that some provinces report a decline.
::: {.callout-note icon="false" appearance="simple"}
## Question 3
Why do you think those areas experienced a decline in local mobility in April 2020?
:::
```{r}
origin_df <- df20 %>%
filter(start_polygon_name == end_polygon_name) %>%
group_by(start_polygon_name) %>%
dplyr::summarise(
mean_perchange = mean(percent_change, na.rm = T),
mean_diff_flow = mean(n_difference, na.rm = T),
sum_diff_flow = sum(n_difference, na.rm = T),
mean_intraflow = mean(n_crisis, na.rm = T),
sum_intraflow = sum(n_crisis, na.rm = T)
) %>%
ungroup()
tail(origin_df, 10)
```
### Netflows
An additional key dimension of mobility is its net balance; that is, the extent to which human mobility acts to redistribute population across areas of the country. While some areas report population gains, other experience decline.
During the early stages of COVID-19, photo evidence of empty cities around the world poured over the Internet. Anecdotal evidence suggested the occurrence of an urban exodus, particularly from large cities as people sought more and greener spaces [see some our work on COVID and mobility, e.g. @rowe2022]. At the same time, they sought to avoid high density areas, and they realised the need for bigger housing areas as they become multi-functional spaces for work, study and leisure for entire family units. People were said to have moved to sparsely populated rural areas, suburban spaces and more attractive migration destinations.
Below we compute a measure of the net balance of mobility flows between areas of origin and destination. We use the average number of movements into an areas, minus the number of movements out of that area. We have a look at the head of the data frame and observe that the average net mobility balance is virtually zero. However, at this point, we should recognise that while indicators are useful to measure different dimensions of human mobility, data visualisations are key to identify systematic patterns in the data.
```{r}
# mean outflow by area
outflows_df <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(start_polygon_name) %>%
dplyr::summarise(
mean_outflow = mean(n_crisis, na.rm = T)
) %>%
ungroup()
# mean inflow by area
inflows_df <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(end_polygon_name) %>%
dplyr::summarise(
mean_inflow = mean(n_crisis, na.rm = T)
) %>%
ungroup()
# combine data frames
netflow_df <- cbind(inflows_df, outflows_df)
# mean netflow by area
netflow_df <- netflow_df %>%
mutate(
mean_netflow = mean_inflow - mean_outflow
) %>%
select(start_polygon_name, end_polygon_name, mean_inflow, mean_outflow, mean_netflow)
head(netflow_df)
```
Generally a useful graphical representation of the data is histograms or kernel density histograms. They are helpful to easily have an understanding of the distribution of mobility indicators. The histogram below reveals that while most areas report an average net mobility balance of zero, some areas display losses or gains of over 10 or 20 movements, respectively.
::: callout-note
**ggplot** is a primary tool for data visualisation in R. **ggplot** has a basic structure of three components:
- The data: *ggplot( data =* data frame*)*
- Geometries: *geom_xxx( . )*
- Aesthetic mapping: *aes(x=*variable*, y=*variable*)*
Then you can work on the cosmetics of the figure such as changing the *theme*, *axis*, *labels*, etc.
:::
```{r}
#| warning: false
ggplot(data = netflow_df) + # input the data
geom_density(aes(x = mean_netflow), # specify type of geom and aesthetics
alpha=0.5,
colour="darkblue",
linewidth = 2
) +
theme_tufte() + # choose the theme
theme(axis.text.y = element_text(size = 12), # edit labels format
axis.text.x = element_text(size = 12),
axis.title=element_text(size=14)
) +
labs(y = "Density", # label axis
x = "Average net movement count")
```
## Mapping
Once you get a basic understanding of the data, you may want to have a more insightful representation of mobility indicators and add some context. Geospatial visualisations are great for these purposes, and for this end, cartography is key. A carefully crafted map can be an effective way of communicating complex information. Design issues include poor placement, size and readability of text and careless selection of colors. Have a look the [style guide](https://files.taylorandfrancis.com/TJOM-suppmaterial-quick-guide.pdf) of the Journal of Maps for details, and also at the use of colours for effective communication. We recommend this piece by @crameri2020 and @maceachren1997.
::: {.callout-note icon="false" appearance="simple"}
## Question 4
Why to create geovisualisations?
:::
::: callout-note
For colour palettes, we recommend the following resources:
- the R packages `viridis` and `RColorBrewer`
- the website [color brewer 2.0](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3)
- a publication by @crameri2020
:::
### Handling spatial data
But, before we get to map some data, there are some key elements we need to cover first. We will show you how to handle spatial data in R. As we indicated earlier in this section, **sf** is versatile R package to handle spatial data frames in R. A first key thing you need to know about spatial data is that they always have attached a CRS and ensuring there is a commonly shared CRS across your data is essential for mapping. We will be using a planar system, as opposed to a spherical system. We set our default CRS.
```{r}
# set crs
crs_default = "EPSG:4326"
```
To ensure you get a full understanding of the process of mapping, we will roll back and start from the beginning. We will re-read the data and re-compute the indicators, but this time we will convert our movement dataset into a spatial data frame by using the latitude and longitude of the origins as coordinates. This information will be used to convert our movement data frame into a spatial data frame using the latitude and longitude of the origins as geometry.
```{r}
df20 <- readRDS("./data/fb/movement_adm/2020_04.rds") %>%
mutate(GEOMETRY = NULL) %>%
dplyr::filter(country == "CL") %>%
st_as_sf(coords = c("start_lon", "start_lat"),
crs = crs_default)
glimpse(df20)
```
We will also re-read our province boundary data and ensure it is on the same CRS.
```{r}
#| warning: false
shp_pro <- read_sf("./data/shp/adm/province/PROVINCIAS_2020.shp") %>%
st_simplify(preserveTopology =T,
dTolerance = 1000) %>% # 1km
sf::st_make_valid() %>%
st_transform(crs_default)
```
Once we have read our input data, we will re-compute a few of mobility indicators.
```{r}
# mean outflow by area
outflows_df <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(start_polygon_name) %>%
dplyr::summarise(
mean_outflow = mean(n_crisis, na.rm = T)
) %>%
ungroup()
# mean inflow by area
inflows_df <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(end_polygon_name) %>%
dplyr::summarise(
mean_inflow = mean(n_crisis, na.rm = T)
) %>%
ungroup() %>%
st_drop_geometry()
# combine data frames
netflow_df <- cbind(outflows_df, inflows_df)
# mean netflow by area
netflow_df <- netflow_df %>%
mutate(
mean_netflow = mean_inflow - mean_outflow
) %>%
dplyr::select(start_polygon_name, mean_inflow, mean_outflow, mean_netflow, geometry) %>%
rename(
polygon_name = start_polygon_name
)
head(netflow_df)
```
Now let's start mapping our data. Let's do this by stage so you understand the key building blocks of this process. We first drawn the polygons representing the provinces of Chile.
```{r}
#| warning: false
p <- ggplot() +
geom_sf(data = shp_pro,
color = "gray60",
size = 0.1)
last_plot()
```
Let's add the centroids of the origins based on our movement data.
```{r}
#| warning: false
p <- p +
geom_point(data = netflow_df,
aes(geometry = geometry),
stat = "sf_coordinates"
)
last_plot()
```
Then we can remove the axes or background here:
```{r}
#| warning: false
p + theme_void()
```
The map is placed towards the right. We can centre the map by adjusting the bounding box of the shapefile. We can do this by obtaining the current bounding box and adjusting the x left limit.
```{r}
#| warning: false
bbox_new <- st_bbox(shp_pro) # current bounding box
xrange <- bbox_new$xmax - bbox_new$xmin # range of x values
yrange <- bbox_new$ymax - bbox_new$ymin # range of y values
bbox_new[1] <- bbox_new[1] + (0.6 * xrange) # xmin - left
#bbox_new[3] <- bbox_new[3] + (0.5 * xrange) # xmax - right
#bbox_new[2] <- bbox_new[2] - (0.5 * yrange) # ymin - bottom
#bbox_new[4] <- bbox_new[4] + (0.5 * yrange) # ymax - top
bbox_new <- bbox_new %>% # take the bounding box ...
st_as_sfc() # ... and make it a sf polygon
ggplot() +
geom_sf(data = shp_pro,
color = "gray60",
size = 0.1) +
geom_point(data = netflow_df,
aes(geometry = geometry),
stat = "sf_coordinates",
size = .1
) +
coord_sf(xlim = st_coordinates(bbox_new)[c(1,2),1], # min & max of x values
ylim = st_coordinates(bbox_new)[c(2,3),2]) + # min & max of y values
theme_void()
```
Something to note at this point is that two of the centroids lie outside the provincial polygons. This is difficult to visualise from the current map. We will find out this in a different way. We run a spatial join of the provincial polygons and our movement data frame, and check the provinces which were not matched. Below we can observe the result of this process. The provinces of Magallanes and Valparaíso were unmatched. That is because their centroids lie outside the polygon area. In the case of Valparaiso, islands attached to this province force the centroid to be position on the Pacific. In the case of Magallanes, the unusual shape of the province surrounded by fjords forces the centroid to be on the Pacific as well.
```{r}
# spatial join
mob_indicators <- st_join(shp_pro, netflow_df)
# check if all polygons are matched
netflow_df$check <- netflow_df$polygon_name %in% mob_indicators$polygon_name
# identify unmatched polygons
netflow_df %>% select(polygon_name, check) %>%
filter(check == "FALSE")
```
We therefore adjust these centriods and force them to be within the polygons of provinces.
```{r}
# get coordinates
coordinates <- st_coordinates(netflow_df)
# combine data frames
netflow_df <- cbind(netflow_df, coordinates) %>%
rename(
long = X,
lat = Y
)
# list province to be adjusted
province_name <- c("Valparaíso", "Magallanes")
# adjust the centroid
for (area in 1:2) {
long <- netflow_df %>%
st_drop_geometry() %>%
dplyr::filter(polygon_name == province_name[area]) %>%
select(long) %>%
as.numeric()
lat <- netflow_df %>%
st_drop_geometry() %>%
dplyr::filter(polygon_name == province_name[area]) %>%
select(lat) %>%
as.numeric()
st_geometry(netflow_df[netflow_df$polygon_name == province_name[area], ]) <- st_sfc(st_point(c( long * 0.98, lat * 1 )))
}
```
We re-join the polygon boundary data and our movement data frame and check the results. We can now see that all spatial units were matched.
```{r}
mob_indicators <- st_join(shp_pro, netflow_df,
st_intersects)
netflow_df$check <- netflow_df$polygon_name %in% mob_indicators$polygon_name
netflow_df %>% select(polygon_name, check) %>%
filter(check == "FALSE")
```
So far, we have only wrangled the data so we can start out mapping and spatial analysis.
### Choropleths
We first try choropleths. Choropleths are thematic maps. They are easy to create but also to get wrong. We will look at a set of the principles you can follow to create effective choropleth maps. Here three more questions to consider:
- What is being plotted?
- What is the target audience?
- What degree of interactivity we want to offer?
We will create maps of netflows, inflows and outflows. We will divide the data into categories. Categorising the data often facilites the identification of spatial patterns. There are different forms of organising these categories using quantiles, natural break points or more sophisticated clustering approaches. We will keep this simple and use seven categories. Feel free to explore the impacts on the map as these categories change.
```{r}
# set categories
mob_indicators <- mob_indicators %>%
mutate(
netflow_class = mean_netflow %>% cut(7, dig.lab = 3),
inflow_class = mean_inflow %>% cut(7, dig.lab = 3),
outflow_class = mean_outflow %>% cut(7, dig.lab = 3)
)
# adjust labels for netflows
netflow_labels <- levels(mob_indicators$netflow_class)
netflow_labels <- gsub("\\(|\\]", "", netflow_labels)
levels(mob_indicators$netflow_class) <- netflow_labels
# adjust labels for inflows
inflow_labels <- levels(mob_indicators$inflow_class)
inflow_labels <- gsub("\\(|\\]", "", inflow_labels)
levels(mob_indicators$inflow_class) <- inflow_labels
# adjust labels for netflows
outflow_labels <- levels(mob_indicators$outflow_class)
outflow_labels <- gsub("\\(|\\]", "", outflow_labels)
levels(mob_indicators$outflow_class) <- outflow_labels
# change geometry
shp_reg <- shp_reg %>% st_transform(crs_default)
```
Below we use **ggplot** to create our maps.
```{r}
#| warning: false
# map netflows
netflow_plot <- ggplot(data = mob_indicators, aes(fill = netflow_class)) +
geom_sf(col = "white", size = .1) +
coord_sf() +
scale_fill_brewer(palette = "RdBu", direction = -1) +
scale_color_manual(labels = netflow_labels) +
theme_map() +
theme(plot.title = element_text(size = 22, face = "bold"),
legend.position = "none") +
labs(title = "(a) Netflow",
fill = "Net count of moves") +
theme_void() +
geom_sf(data = shp_reg,
col = "grey70",
size = .5,
fill = "transparent") +
coord_sf(xlim = st_coordinates(bbox_new)[c(1,2),1],
ylim = st_coordinates(bbox_new)[c(2,3),2])
# map inflows
inflow_plot <- ggplot(data = mob_indicators, aes(fill = inflow_class)) +
geom_sf(col = "white", size = .1) +
coord_sf() +
scale_fill_brewer(palette = "PuRd", direction = 1) +
theme_map() +
theme(plot.title = element_text(size = 22, face = "bold"),
legend.position = "none") +
labs(title = "(b) Inflow",
fill = "Count of moves") +
theme_void() +
geom_sf(data = shp_reg,
col = "grey70",
size = .5,
fill = "transparent") +
coord_sf(xlim = st_coordinates(bbox_new)[c(1,2),1],
ylim = st_coordinates(bbox_new)[c(2,3),2])
# map outflows
outflow_plot <- ggplot(data = mob_indicators, aes(fill = outflow_class)) +
geom_sf(col = "white", size = .1) +
coord_sf() +
scale_fill_brewer(palette = "PuBu", direction = 1) +
theme_map() +
theme(plot.title = element_text(size = 22, face = "bold"),
legend.position = "none") +
labs(title = "(b) Outflow",
fill = "Count of moves") +
theme_void() +
geom_sf(data = shp_reg,
col = "grey70",
size = .5,
fill = "transparent") +
coord_sf(xlim = st_coordinates(bbox_new)[c(1,2),1],
ylim = st_coordinates(bbox_new)[c(2,3),2])
# combine plots
netflow_plot + inflow_plot + outflow_plot
```
The maps display the average count of movement across provinces in Chile. In relative terms, they indicate that some of central provinces experienced larger losses compared to northern and southern provinces. Central provinces also display the largest average counts of movement. This is expected as these areas concentrate most of the Chilean population, and larger population centres are expected to generate more movement.
**Flow to Facebook population size**
When using digital footprint data, a key concern is whether the data are biased or representative of the local populations [@rowe2023a]. In the context of mobility, for instance, we would expect that we would observe more movement where we observe a larger number of Meta-Facebook active users. To explore this, we could analyse the ratio of movement over the number of Meta-Facebook active users by area. High spatial variability in this ratio would be evidence of biases in the Meta-Facebook data capturing mobility. We compute the average number of outflow movement over the average Meta-Facebook active user population by province. The results are quite revealing, showing remarkable little variation across provinces. This suggests very little evidence of biases in the Meta-Facebook data to capture mobility. Of course, this evidence only applies for Chile. But this analysis could be easily implemented on data for other countries.
```{r}
#| warning: false
# compute mean population
mean_fb_pop <- pop20_df %>% group_by(polygon_name) %>%
dplyr::summarise(
mean_pop = mean(n_crisis, na.rm = T)
) %>%
ungroup()
# join to mobility indicators data frame
mob_indicators <- left_join(mob_indicators, mean_fb_pop,
by = c("polygon_name" = "polygon_name")) %>%
mutate(
outflow_to_pop = (mean_outflow / mean_pop) # compute outflow to population
)
# map outflow_to_pop
ggplot(data = mob_indicators, aes(fill = outflow_to_pop)) +
geom_sf(col = "white", size = .1) +
coord_sf() +
scale_fill_viridis_c() +
theme_map() +
theme(plot.title = element_text(size = 22, face = "bold"),
legend.position = "none") +
labs(title = "Flows to population size",
fill = "Number of movements per 100") +
theme_void() +
geom_sf(data = shp_reg,
col = "grey70",
size = .1,
fill = "transparent") +
coord_sf(xlim = st_coordinates(bbox_new)[c(1,2),1],
ylim = st_coordinates(bbox_new)[c(2,3),2])
```
**Changes over time**
Maps are also an effective way of analysing spatial patterns over time. Below we show us how you can analyse the average count of net movement for April 2020 and April 2022 using **ggplot**. Different from the maps above, here we use a continous scale to map the data. We can see slight variations in the net balance of movement.
::: {.callout-note icon="false" appearance="simple"}
## Question 5
What interesting patterns do you see?
:::
```{r}
#| warning: false
# read April 2022 data
df22 <- readRDS("./data/fb/movement_adm/2022_04.rds") %>%
mutate(GEOMETRY = NULL) %>%
dplyr::filter(country == "CL") %>%
st_as_sf(coords = c("start_lon", "start_lat"),
crs = crs_default)
# mean outflow by area
outflows_df <- df22 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(start_polygon_name) %>%
dplyr::summarise(
mean_outflow = mean(n_crisis, na.rm = T)
) %>%
ungroup()
# mean inflow by area
inflows_df <- df22 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(end_polygon_name) %>%
dplyr::summarise(
mean_inflow = mean(n_crisis, na.rm = T)
) %>%
ungroup() %>%
st_drop_geometry()
# combine data frames
netflow_df <- cbind(outflows_df, inflows_df)
# mean netflow by area
netflow_df <- netflow_df %>%
mutate(
mean_netflow = mean_inflow - mean_outflow
) %>%
dplyr::select(start_polygon_name, mean_inflow, mean_outflow, mean_netflow, geometry) %>%
rename(
polygon_name = start_polygon_name
)
# extract coordinates
coordinates <- st_coordinates(netflow_df)
# add coordinates
netflow_df <- cbind(netflow_df, coordinates) %>%
rename(
long = X,
lat = Y
)
# list province names
province_name <- c("Valparaíso", "Magallanes")
# loop to replace point geometries
for (area in 1:2) {
long <- netflow_df %>%
st_drop_geometry() %>%
dplyr::filter(polygon_name == province_name[area]) %>%
select(long) %>%
as.numeric()
lat <- netflow_df %>%
st_drop_geometry() %>%
dplyr::filter(polygon_name == province_name[area]) %>%
select(lat) %>%
as.numeric()
st_geometry(netflow_df[netflow_df$polygon_name == province_name[area], ]) <- st_sfc(st_point(c( long * 0.98, lat * 1 )))
}
# combine indicators with province polygons
mob_indicators22 <- st_join(shp_pro, netflow_df,
st_intersects)
# add year to data frame
mob_indicators$year <- "2020"
mob_indicators22$year <- "2022"
# combine data frames for 2020 and 2022
mob_indicators <- mob_indicators %>% select(names(mob_indicators22)) # remove columns to make data frames compatible
mob_indicators_20.22 <- rbind(mob_indicators, mob_indicators22)
```
::: callout-note
We note that *facet_grid(.)* is a useful function to know if you intend to produce figures by layers and these layers are defined by a different variable / column in your data frame. In our example, this is year.
:::
```{r}
#| warning: false
# map netflows
ggplot(data = mob_indicators_20.22, aes(fill = mean_netflow)) +
geom_sf(col = "white", size = .1) +
coord_sf() +
scale_fill_gradient2(
low = muted("blue"),
mid = "white",
high = "red",
midpoint = 0,
space = "Lab",
na.value = "white",
guide = "colourbar",
aesthetics = "fill"
) +
theme_map() +
facet_grid(cols = vars(year)) +
theme(plot.title = element_text(size = 22, face = "bold"),
legend.position = "none") +
labs(title = "Netflow",
fill = "Number of movements") +
theme_void() +
geom_sf(data = shp_reg,
col = "grey70",
size = .5,
fill = "transparent") +
coord_sf(xlim = st_coordinates(bbox_new)[c(1,2),1],
ylim = st_coordinates(bbox_new)[c(2,3),2])
```
### Interactive mapping
An alternative way to analyse the data is using interactive maps. Interactive maps provide greater flexibility to explore the data by adding a base map for greater context. We use **tmap** for this task. **tmap** is a great package for mapping and you should probably add it to your toolbox if you are working on geographic information systems using R.
We map the average net count of movement for April 2022. We first enable interactivity by running `tmap_mode("view")` and then create the map. Using the menu on the left top of the map below, we can zoom in and out to explore the map and also change the base layer. We can also display or hide the colouring layers of our indicator. This allows to see the names of the areas experiencing net gains and losses of moves.
```{r}
#| warning: false
# create a map
tmap_mode("view") # enable interactivity
tm_shape(mob_indicators22) + # input data
tm_fill("mean_netflow", # draw and fill polygons
palette = "RdBu",
title = "Netflows")
```
::: callout-note
We use **ttm()** to switch off the interactive functionality of tmap.
:::
```{r}
# switch to other mode: "view"
ttm()
```
### Flow mapping
An additional key dimension of the analysis of origin-destination flow data is visually understanding the relationship between origins and destinations. This helps understanding the extent of interaction of individual areas with the rest of the country, and also informs how changes in a specific area may impact others - see for example @rowe2020 and @rowe2022b. Flow maps are often used to visualise the complex network of mobility flows between areas. A recently developed tool for this end is [mapbox](https://www.mapbox.com). If you want to use mapbox you need to sign up. You can create an account for free to use some basic products. We can use mapbox through the [mapdeck](https://cran.r-project.org/web/packages/mapdeck/vignettes/mapdeck.html) package in R.
So you have a start-to-end understanding of the process, we will begin from loading the input movement data. We compute the average count by origin-destination pair for April 2020, and prepare a data frame containing origin and destination coordinates.
```{r}
#| warning: false
#| message: false
# read data
df20 <- readRDS("./data/fb/movement_adm/2020_04.rds") %>%
mutate(GEOMETRY = NULL) %>%
dplyr::filter(country == "CL")
# compute mean move by origin-destination pair
flow_df20 <- df20 %>%
filter(start_polygon_name != end_polygon_name) %>%
group_by(start_polygon_name, end_polygon_name) %>%
dplyr::summarise(
mean_flow = mean(n_crisis, na.rm = T)
) %>%
ungroup()
# create a coordinate data frame for origins
origin_coordinate_df <- df20 %>%
dplyr::select( c(start_polygon_name, start_lat, start_lon)) %>%
distinct()
# create a coordinate data frame for destinations
destination_coordinate_df <- df20 %>%
dplyr::select( c(end_polygon_name, end_lat, end_lon)) %>%
distinct()
# join coordinates for origins and destinations
flow_df20 <- left_join(flow_df20, origin_coordinate_df, by = c("start_polygon_name" = "start_polygon_name"))
flow_df20 <- left_join(flow_df20, destination_coordinate_df, by = c("end_polygon_name" = "end_polygon_name"))
```
For our illustration, we will focus on the province of Santiago.
```{r}
df_santiago <- flow_df20 %>% dplyr::filter(start_polygon_name == "Santiago")
head(df_santiago)
```
```{r}
#| echo: false
my_key <- "pk.eyJ1IjoiZmNvcm93ZSIsImEiOiJja2Jyc2Qxd2QyNngwMndwOTVxa3B2bjBpIn0.vA3y1-WhmI_W3NxCKlILzw"
```
To use mapbox, you will need to sign up for an account on the [mapbox](https://www.mapbox.com) website. For this workshop, we have generated a key to share but we strongly suggest to create an account as the key will last for a certain period of time. The map generated is interactive. If you can explore it by clicking on it and moving your mouse. If you can also zoom in and out by double-clicking.
A striking pattern of the map below is the high degree of connectivity of Santiago with the rest of the country. Flows from Santiago not only extend to proximate and adjacent provinces but extend to far provinces in the north and south of the country.
```{r}
#| warning: false
key <- my_key ## put your own token here
flowmap <- mapdeck( token = key, style = mapdeck_style("dark"),
location = c(-3.7, 40.4), zoom = 6, pitch = 45) %>%
add_arc(
data = df_santiago,
layer_id = "arc_layer",
origin = c("start_lon", "start_lat"),
destination = c("end_lon", "end_lat"),
# stroke_from = "start_polygon_name",
# stroke_to = "end_polygon_name",
# stroke_width = "stroke",
palette = "reds",
legend = list( stroke_from = F, stroke_to = F ),
)
# plot the interactive map
flowmap
```
::: {.callout-note icon="false" appearance="simple"}
## Question 6
Create a flow map for your province of interest.
:::