-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path5_spatial_statistics_analysis.Rmd
743 lines (511 loc) · 26.6 KB
/
5_spatial_statistics_analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
---
title: "Geospatial Analysis and Representation for Data Science Project"
output:
html_document:
df_print: paged
---
### Introduction
This notebook aims to perform a spatial statistics analysis concerning ski pass prices in the area (the Dolomiti area in the province of Bolzano), investigating whether there are any relationships between them.
Since we are unable to find a complete resource containing the costs of the ski pass for the Dolomiti area, the data used in this notebook was created in the file `ski_area_cost.ipynb` by consulting the price lists on the websites of the various resorts.
The aim is therefore to perform a spatial statistics analysis to understand whether there is a relationship between the location of resort and their prices. In particular, we consider the representative points of each ski resort as coordinates of the ski area.
### Set up
```{r message=FALSE, warning=FALSE}
if(!require(rgdal)) {
install.packages("rgdal")
library(rgdal)
}
if(!require(spdep)) {
install.packages("spdep")
library(spdep)
}
if(!require(boot)) {
install.packages("boot")
library(boot)
}
```
### Import data
#### Import South Tirol's Dolomiti area
The data is build and saved in the ski_area_cost.ipynb file
```{r}
area_dolomiti_BZ = readOGR(dsn=path.expand("data/area_dolomiti_BZ"), layer="area_dolomiti_BZ")
head(area_dolomiti_BZ@data)
```
area_dolomiti_BZ is the shape representing the Dolomiti municipalities in province of Bolzano, which is the location of the ski areas we will investigating in this section.
We can plot this Dolomiti area:
```{r}
plot(area_dolomiti_BZ)
```
#### Import ski areas data
These data represent the coordinates and some information about the representative points of each ski resort and the prices of their ski passes. We computed the representative points in the dataset preprocessing in Python. The various ski areas originally had a Polygon shape, therefore it would have been necessary to calculate a reference point (centroid or representative point) to perform the spatial statistics analyses. We preferred the calculation of the representative point since, due to their shape, the centroids were prone to fall outside the resort area in some cases.
Obviously, these resorts are composed of several slopes and lifts, in fact the price of the ski pass includes access to the entire area to which it refers.
```{r}
BZ_dolomiti_ski_areas = readOGR(dsn=path.expand("data/ski_slopes_BZ_dissolved"), layer="ski_slopes_BZ_dissolved")
BZ_dolomiti_ski_areas@data
```
We can plot the ski areas representative points:
```{r fig.height=8, fig.width=14}
par(mar=c(0,0,0,0))
plot(area_dolomiti_BZ)
points(BZ_dolomiti_ski_areas, pch=20, col="blue", cex=2)
text(
coordinates(BZ_dolomiti_ski_areas),
labels = BZ_dolomiti_ski_areas$NAME_IT,
cex = 0.9,
col = "blue",
pos = 3
)
```
The columns of the dataset are:
```{r}
names(BZ_dolomiti_ski_areas@data)
```
* OBJECTID refers to the ski area's id;
* NAME_IT refers to the italian name of the ski area;
* NAME_DE refers to the german name of the ski area;
* AREA refers to the area of each complete ski resort original shape;
* prices refers to the price of the ski pass required to access the ski lifts of the area.
Let's look at the columns types:
```{r}
str(BZ_dolomiti_ski_areas@data)
```
#### What is the price range for ski passes?
```{r}
unique(BZ_dolomiti_ski_areas$prices)
summary(BZ_dolomiti_ski_areas$prices)
```
We have 7 unique prices, this means that some ski areas have the same ski pass prices.
#### How many observations we have for each price?
```{r}
barplot(table(BZ_dolomiti_ski_areas$prices))
```
#### Where are the most expensive ski areas?
```{r fig.height=8, fig.width=14}
par(mar=c(0,0,0,0))
subset <- BZ_dolomiti_ski_areas[BZ_dolomiti_ski_areas$prices== max(BZ_dolomiti_ski_areas$prices),]
head(subset@data)
plot(area_dolomiti_BZ)
points(subset, pch=20, col="red", cex=2)
text(
coordinates(subset),
labels = subset$NAME_IT,
cex = 0.9,
col = "red",
pos = 3
)
```
#### Which ski area is the less expensive and where is it?
```{r fig.height=8, fig.width=14}
par(mar=c(0,0,0,0))
subset <- BZ_dolomiti_ski_areas[BZ_dolomiti_ski_areas$prices== min(BZ_dolomiti_ski_areas$prices),]
head(subset@data)
plot(area_dolomiti_BZ)
points(subset, pch=20, col="green", cex=2)
text(
coordinates(subset),
labels = subset$NAME_IT,
cex = 0.9,
col = "green",
pos = 3
)
```
#### Which are the most extensive resorts (in terms of area)?
```{r fig.height=8, fig.width=14}
par(mar=c(0,0,0,0))
sorted <- BZ_dolomiti_ski_areas[order(-BZ_dolomiti_ski_areas$AREA),]
head(sorted@data)
plot(area_dolomiti_BZ)
points(sorted, pch=20, col="red", cex=sorted$AREA/mean(sorted$AREA))
text(
coordinates(sorted),
labels = sorted$NAME_IT,
cex = 0.7,
col = "blue",
pos = 3
)
```
There are clearly much larger ski areas than others.
---
## Defining neighbours
By referring to the distances between the reference points (for us the representative points) we can then define the neighbourhood relations between the spatial units.
Several definitions of proximity are possible: in our case k-nearest neighbours and critical neighbourhood cut-off will be applied; we will not deal with contiguity-based neighbourhood as our units do not share borders.
### K-Nearest neighbours
The k-nearest neighbours criterion assumes that two spatial units are considered neighbours if their distance is equal, or less than equal, to the smallest possible distance that can be found between all observations.
This definition of neighbourhood ensures that each spatial unit has precisely the same number k of neighbours.
Let's start with 1 neighbour (k = 1):
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
par(mar=c(0,0,0,0))
knn1BZ = knn2nb(knearneigh(BZ_dolomiti_ski_areas, k = 1, longlat = T)) #longlat = T takes into account the fact that the Earth is not flat
plot(area_dolomiti_BZ, border = "grey")
plot(knn1BZ, BZ_dolomiti_ski_areas, add = T)
```
Let's now try with 2 neighbour (k = 2):
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
knn2BZ = knn2nb(knearneigh(BZ_dolomiti_ski_areas, k = 2, longlat = T))
plot(area_dolomiti_BZ, border = "grey")
plot(knn2BZ, BZ_dolomiti_ski_areas, add = T)
```
With k = 4:
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
knn4BZ = knn2nb(knearneigh(BZ_dolomiti_ski_areas, k = 4, longlat = T))
plot(area_dolomiti_BZ, border = "grey")
plot(knn4BZ, BZ_dolomiti_ski_areas, add = T)
```
With k = 6:
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
knn6BZ = knn2nb(knearneigh(BZ_dolomiti_ski_areas, k = 6, longlat = T))
plot(area_dolomiti_BZ, border = "grey")
plot(knn4BZ, BZ_dolomiti_ski_areas, add = T)
```
With k = 8:
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
knn8BZ = knn2nb(knearneigh(BZ_dolomiti_ski_areas, k = 8, longlat = T))
plot(area_dolomiti_BZ, border = "grey")
plot(knn8BZ, BZ_dolomiti_ski_areas, add = T)
```
### Critical cut-off neighbourhood
The neighbourhood criterion of the critical cut-off implies that two spatial units are considered neighbours if their distance is equal, or less than equal, to a certain fixed distance representing a critical cut-off. This cut-off distance should not be less than a minimum value that ensures that each spatial unit has at least one neighbour. For this reason, we must first calculate the minimum threshold distance that allows all regions to have at least one neighbour. As we have seen in class, we can do this by using the list of k nearest neighbours with k = 1 and calculating the maximum distance that separates all pairs of spatial units.
```{r message=FALSE, warning=FALSE}
all.linkedT = max(unlist(nbdists(knn1BZ, BZ_dolomiti_ski_areas, longlat = T)))
all.linkedT
```
We found that the minimum threshold distance is 0.07901, which is actually very small, but it is OK since we are working with a fairly small area.
After this calculation we now know that the cut-off distance has to be greater than 0.07901.
We can try different values of the cut-off distance for different neighbourhood definitions:
```{r message=FALSE, warning=FALSE}
dnb079 = dnearneigh(BZ_dolomiti_ski_areas, 0, 0.0791, longlat=T)
dnb079
```
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
plot(area_dolomiti_BZ, border = "grey", xlab = "", ylab = "", xlim = NULL)
title(main = "d nearest neighbours, d = 0.079")
plot(dnb079, BZ_dolomiti_ski_areas, add = TRUE, col = "blue")
```
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
dnb089 = dnearneigh(BZ_dolomiti_ski_areas, 0, 0.0891, longlat=T)
plot(area_dolomiti_BZ, border = "grey", xlab = "", ylab = "", xlim = NULL)
title(main = "d nearest neighbours, d = 0.089")
plot(dnb089, BZ_dolomiti_ski_areas, add = TRUE, col = "green")
dnb089
```
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
dnb099 = dnearneigh(BZ_dolomiti_ski_areas, 0, 0.0991, longlat=T)
plot(area_dolomiti_BZ, border = "grey", xlab = "", ylab = "", xlim = NULL)
title(main = "d nearest neighbours, d = 0.099")
plot(dnb099, BZ_dolomiti_ski_areas, add = TRUE, col = "green")
dnb099
```
We have the same results with d = 0.089 and d = 0.99, thus we can try to increase the values further:
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
dnb109 = dnearneigh(BZ_dolomiti_ski_areas, 0, 0.1091, longlat=T)
plot(area_dolomiti_BZ, border = "grey", xlab = "", ylab = "", xlim = NULL)
title(main = "d nearest neighbours, d = 0.1091")
plot(dnb109, BZ_dolomiti_ski_areas, add = TRUE, col = "yellow")
dnb109
```
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
dnb159 = dnearneigh(BZ_dolomiti_ski_areas, 0, 0.1591, longlat=T)
plot(area_dolomiti_BZ, border = "grey", xlab = "", ylab = "", xlim = NULL)
title(main = "d nearest neighbours, d = 0.1591")
plot(dnb159, BZ_dolomiti_ski_areas, add = TRUE, col = "orange")
dnb159
```
```{r fig.height=8, fig.width=14, message=FALSE, warning=FALSE}
dnb199 = dnearneigh(BZ_dolomiti_ski_areas, 0, 0.1991, longlat=T)
plot(area_dolomiti_BZ, border = "grey", xlab = "", ylab = "", xlim = NULL)
title(main = "d nearest neighbours, d = 0.1991")
plot(dnb199, BZ_dolomiti_ski_areas, add = TRUE, col = "black")
dnb199
```
Obviously with the critical cut-off the links, and therefore the number of neighbours, can only teds to grow rapidly with increasing distance.
## Defining spatial weights
Once the neighbourhood relationships between observations have been defined, the spatial weights matrix can be created. In our case we will calculate a standardised spatial weight matrix for each list of critical and k-nearest neighbours created earlier.
```{r}
# knn
knn1BZ.listw = nb2listw(knn1BZ, style = "W")
knn2BZ.listw = nb2listw(knn2BZ, style = "W")
knn4BZ.listw = nb2listw(knn4BZ, style = "W")
knn6BZ.listw = nb2listw(knn6BZ, style = "W")
knn8BZ.listw = nb2listw(knn8BZ, style = "W")
# critical cut-off
dnb079.listw = nb2listw(dnb079,style="W")
dnb089.listw = nb2listw(dnb089,style="W")
dnb099.listw = nb2listw(dnb099,style="W")
dnb109.listw = nb2listw(dnb109,style="W")
dnb159.listw = nb2listw(dnb159,style="W")
dnb199.listw = nb2listw(dnb199,style="W")
# Example of output
str(dnb079.listw)
```
## The Moran’s I test of spatial autocorrelation
We now perform Moran's I test of spatial autocorrelation with reference to the variable corresponding to the price of ski passes (BZ_dolomiti_ski_areas$prices). The inspection of these variable may in fact suggest the presence of some form of spatial dependence.
Let's first plot the quantile distribution of the prices variable:
```{r fig.height=8, fig.width=14}
brks = round(quantile(BZ_dolomiti_ski_areas$prices), digits = 3)
colours = grey((length(brks):2)/length(brks))
plot(area_dolomiti_BZ, col = "#E29B50")
points(BZ_dolomiti_ski_areas, col = "black", pch = 19, cex=3.3) # add a layer to obtain a contour effect
points(BZ_dolomiti_ski_areas, col = colours[findInterval(BZ_dolomiti_ski_areas$prices, brks, all.inside = TRUE)], pch = 19, cex=3)
title(main = "Ski resrorts' skipass prices")
```
Finally we perform the global Moran's I test to the prices variable with different specifications of the spatial weights matrix:
```{r}
moran.test(BZ_dolomiti_ski_areas$prices, knn1BZ.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, knn2BZ.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, knn4BZ.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, knn6BZ.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, knn8BZ.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb079.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb089.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb099.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb109.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb159.listw, randomisation = FALSE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb199.listw, randomisation = FALSE)
```
The test was performed under the assumption of normality since randomisation parameter is set on False. Let's try under the assumption of randomization:
```{r}
moran.test(BZ_dolomiti_ski_areas$prices, knn1BZ.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, knn2BZ.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, knn4BZ.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, knn6BZ.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, knn8BZ.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb079.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb089.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb099.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb109.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb159.listw, randomisation = TRUE)
moran.test(BZ_dolomiti_ski_areas$prices, dnb199.listw, randomisation = TRUE)
```
Lastly, we can apply the test with different specifications of the spatial weight matrix under permutation bootstrap:
```{r}
moran.mc(BZ_dolomiti_ski_areas$prices, knn1BZ.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, knn2BZ.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, knn4BZ.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, knn6BZ.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, knn8BZ.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, dnb079.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, dnb089.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, dnb099.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, dnb109.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, dnb159.listw, nsim = 999)
moran.mc(BZ_dolomiti_ski_areas$prices, dnb199.listw, nsim = 999)
```
The values are only slightly different according to the 3 approaches used (especially between normality and randomization assumptions). When more neighbours are considered (both in the knn and in the critical cut-off) there seems to be a very small positive global spatial autocorrelation. In the most significant case of the knn, with 6 and 8 neighbours and in the case of the highest distances with the critical cut-off, it seems to reach at most a positive correlation of 0.12. This is actually very low and makes us think that there is no effective global correlation.
## The Moran’s I test of spatial autocorrelation in OLS residuals
As shown in class, Moran's I test can also be used as a diagnostic tool to detect the presence of spatial autocorrelation in the residuals of a linear regression model.
Let's try to apply these to our data considering AREA column:
```{r}
LinearSolow = lm(prices ~ AREA, BZ_dolomiti_ski_areas)
summary(LinearSolow)
```
The results of linear regression are slightly significant, thus we can plot the studentized residuals, which can give a hint about the presence of spatial dependence in the residuals:
```{r fig.height=8, fig.width=14}
studres = rstudent(LinearSolow)
resdistr = quantile(studres)
colours = grey((length(resdistr):2)/length(resdistr))
plot(area_dolomiti_BZ, col = "#E29B50")
points(BZ_dolomiti_ski_areas, col = colours[findInterval(studres, resdistr, all.inside = T)], cex = 2.5, pch = 18)
```
We can now apply the test to the studentized residuals of the `LinearSolow` model, for different specifications of the spatial weights matrix:
```{r}
lm.morantest(LinearSolow, knn1BZ.listw, resfun = rstudent)
lm.morantest(LinearSolow, knn2BZ.listw, resfun = rstudent)
lm.morantest(LinearSolow, knn4BZ.listw, resfun = rstudent)
lm.morantest(LinearSolow, knn6BZ.listw, resfun = rstudent)
lm.morantest(LinearSolow, knn8BZ.listw, resfun = rstudent)
lm.morantest(LinearSolow, dnb079.listw, resfun = rstudent)
lm.morantest(LinearSolow, dnb089.listw, resfun = rstudent)
lm.morantest(LinearSolow, dnb099.listw, resfun = rstudent)
lm.morantest(LinearSolow, dnb109.listw, resfun = rstudent)
lm.morantest(LinearSolow, dnb159.listw, resfun = rstudent)
lm.morantest(LinearSolow, dnb199.listw, resfun = rstudent)
```
Again, as in the previous approaches, we obtain the lowest p-values in the specifications that consider more neighbours. However, the p-values are quite high in this case.
Let's apply the test under permutation bootstrap as final step:
```{r}
LinearSolow.lmx = lm(
prices ~ AREA,
data = BZ_dolomiti_ski_areas,
x = TRUE
)
MoraneI.boot = function(var, i, ...) {
var = var[i]
lmres = lm(var ~ LinearSolow.lmx$x - 1)
return(moran(x = residuals(lmres), ...)$I)
}
boot1 = boot(
residuals(LinearSolow.lmx),
statistic = MoraneI.boot,
R = 999,
sim = "permutation",
listw = knn8BZ.listw,
n = length(knn8BZ.listw$neighbours),
S0 = Szero(knn8BZ.listw)
)
ti = (boot1$t0 - mean(boot1$t))/sqrt(var(boot1$t))
plot(boot1)
```
---
## Descriptive spatial statistics for areal data (local spatial spillover)
Moran's I-statistic is a global measure and therefore does not allow to identify local patterns of spatial autocorrelation.
As we saw in class in many circumstances, however, it may also be of interest to assess the presence of local spatial clusters and to check the specific contribution of any particular region to the global pattern of spatial dependence.
We will investigate the local spatial autocorrelation by the means of the Moran scatterplot and the local Moran's I.
### Moran scatterplot
In the Moran scatterplot we can divide the plane into four quadrants that connote the four possible kinds of spatial association between each region and the other neighbouring regions:
* The regions which fall into the High-High and Low-Low quadrants are characterized by positive spatial autocorrelation and are surrounded by regions with similar values;
* *High-Low* and *Low-High* identify local patterns of negative spatial autocorrelation since they collect the regions with a high (respectively low) value of x and, in opposition, a low (respectively high) value of wx.
NB: x axis represents the variable of interest and y axis represents the corresponding spatially lagged values (wx).
Let's plot the Moran scatterplot specifying our variable of interest (prices):
```{r}
mplot_knn1BZ = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=knn1BZ.listw,
main="Moran scatterplot - knn1BZ",
return_df=F
)
mplot_knn2BZ = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=knn2BZ.listw,
main="Moran scatterplot - knn2BZ",
return_df=F
)
mplot_knn4BZ = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=knn4BZ.listw,
main="Moran scatterplot - knn4BZ",
return_df=F
)
mplot_knn6BZ = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=knn6BZ.listw,
main="Moran scatterplot - knn6BZ",
return_df=F
)
mplot_knn8BZ = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=knn8BZ.listw,
main="Moran scatterplot - knn8BZ",
return_df=F
)
mplot_dnb079 = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=dnb079.listw,
main="Moran scatterplot - dnb079",
return_df=F
)
mplot_dnb089 = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=dnb089.listw,
main="Moran scatterplot - dnb089",
return_df=F
)
mplot_dnb099 = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=dnb099.listw,
main="Moran scatterplot - dnb099",
return_df=F
)
mplot_dnb109 = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=dnb109.listw,
main="Moran scatterplot - dnb109",
return_df=F
)
mplot_dnb159 = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=dnb159.listw,
main="Moran scatterplot - dnb159",
return_df=F
)
mplot_dnb199 = moran.plot(
BZ_dolomiti_ski_areas$prices,
listw=dnb199.listw,
main="Moran scatterplot - dnb199",
return_df=F
)
```
The points (ski areas) that are relatively more influential than others in determining the observed value of global Moran's I are represented by marked points. These points are identified as having the greatest influence on the regression line based on standard criteria such as Cook's distance and linkages.
For this reason it may be useful to map regions of significant influence encoded by their quadrant in Moran's scatterplot. First, we need to identify the influential regions (which was already visible form the plots):
```{r}
hotspot = as.numeric(row.names(as.data.frame(summary(mplot_dnb159))))
hotspot = as.numeric(row.names(as.data.frame(summary(mplot_knn6BZ))))
```
We can also obtain the spatially lagged values (wx) of the variable of interest for the influential regions:
```{r}
BZ_dolomiti_ski_areas$wx = lag.listw(knn6BZ.listw, BZ_dolomiti_ski_areas$prices)
```
Moreover we can assign each influential region to the proper Moran scatterplot quadrant:
```{r}
BZ_dolomiti_ski_areas$quadrant = rep("None", length(BZ_dolomiti_ski_areas$prices))
for(i in 1:length(hotspot)) {
if (BZ_dolomiti_ski_areas$prices[hotspot[i]] > mean(BZ_dolomiti_ski_areas$prices) & BZ_dolomiti_ski_areas$wx[hotspot[i]] > mean(BZ_dolomiti_ski_areas$wx))
BZ_dolomiti_ski_areas$quadrant[hotspot[i]] = "HH"
if (BZ_dolomiti_ski_areas$prices[hotspot[i]] > mean(BZ_dolomiti_ski_areas$prices) & BZ_dolomiti_ski_areas$wx[hotspot[i]] < mean(BZ_dolomiti_ski_areas$wx))
BZ_dolomiti_ski_areas$quadrant[hotspot[i]] = "HL"
if (BZ_dolomiti_ski_areas$prices[hotspot[i]] < mean(BZ_dolomiti_ski_areas$prices) & BZ_dolomiti_ski_areas$wx[hotspot[i]] < mean(BZ_dolomiti_ski_areas$wx))
BZ_dolomiti_ski_areas$quadrant[hotspot[i]] = "LL"
if (BZ_dolomiti_ski_areas$prices[hotspot[i]] < mean(BZ_dolomiti_ski_areas$prices) & BZ_dolomiti_ski_areas$wx[hotspot[i]] > mean(BZ_dolomiti_ski_areas$wx))
BZ_dolomiti_ski_areas$quadrant[hotspot[i]] = "LH"
}
table(BZ_dolomiti_ski_areas$quadrant)
```
And we can plot the results, where is our seignificant point on the map:
```{r fig.height=8, fig.width=14}
# which allows us to plot the map of the regions with influence by typing
BZ_dolomiti_ski_areas$colours[BZ_dolomiti_ski_areas$quadrant == "None"] = "#EBEBEB"
BZ_dolomiti_ski_areas$colours[BZ_dolomiti_ski_areas$quadrant == "HH"] = "#745296"
BZ_dolomiti_ski_areas$colours[BZ_dolomiti_ski_areas$quadrant == "LL"] = "#8B9EB7" # 1 low low
BZ_dolomiti_ski_areas$colours[BZ_dolomiti_ski_areas$quadrant == "LH"] = "#E85F5C"
BZ_dolomiti_ski_areas$colours[BZ_dolomiti_ski_areas$quadrant == "HL"] = "#FFF07C"
plot(area_dolomiti_BZ, col = "#E29B50")
points(BZ_dolomiti_ski_areas, col = "black", pch=15, cex=2.25)
points(BZ_dolomiti_ski_areas, col = BZ_dolomiti_ski_areas$colours, pch=15, cex=2)
legend(
x = -10,
y = 73,
legend = c("None", "Low-Low", "High-Low", "Low-High", "High-High"),
fill = c("#EBEBEB", "#8B9EB7", "#FFF07C", "#E85F5C", "#745296"),
bty = "n",
cex = 1
)
title(main = "Points with influence")
```
## The Local Moran's I
The Moran scatterplot represents a useful and intuitive visual representation of local patterns of spatial association, but it cannot provide statistical significance of the results. For this reason, to assess the significance of the revealed pattern, we can rely on the so-called local Moran I index:
```{r}
lmI = localmoran(BZ_dolomiti_ski_areas$prices, dnb159.listw)
head(lmI)
lmI = localmoran(BZ_dolomiti_ski_areas$prices, knn6BZ.listw)
head(lmI)
```
We can also plot the distribution of local Moran's I-index values:
```{r fig.height=8, fig.width=14}
brks = sort(as.numeric(lmI[,1]))
colours = grey((0:length(lmI[,1]))/length(lmI[,1]))
plot(area_dolomiti_BZ, col = "#E29B50")
points(BZ_dolomiti_ski_areas, col = "black", cex = 2.75, pch = 17)
points(BZ_dolomiti_ski_areas, col = colours[findInterval(lmI[,1], brks, all.inside = TRUE)], cex = 2.5, pch = 17)
title(main = "Local Moran's I values")
```
With this representation we can identify which ski areas have a positive **local** spatial correlation, in white (or light grey), and which have a negative one, in dark.
However, it should be borne in mind that this index does not provide us with information on the significance of results.
As previously done with the global index, Moran's local I-statistic can also be tested for deviations using the null hypothesis of no local spatial autocorrelation and thus can provide the statistical significance of the local spatial patterns detected by Moran's scatterplot. Thus, we can map the corresponding p-values:
```{r fig.height=8, fig.width=14}
pval = as.numeric(lmI[,5])
BZ_dolomiti_ski_areas$colpval[pval>0.05] = "white"
BZ_dolomiti_ski_areas$colpval[pval<=0.05 & pval>0.01] = "#A5D5F3"
BZ_dolomiti_ski_areas$colpval[pval<=0.01 & pval>0.001] = "#6FBCEB"
BZ_dolomiti_ski_areas$colpval[pval<=0.001 & pval>0.0001] = "#3AA3E4"
BZ_dolomiti_ski_areas$colpval[pval<=0.0001] = "#1878B4"
plot(area_dolomiti_BZ, col = "#E29B50")
points(BZ_dolomiti_ski_areas, col="black", pch=18, cex=3.5)
points(BZ_dolomiti_ski_areas, col=BZ_dolomiti_ski_areas$colpval, pch=18, cex=3)
legend(x=-10, y=73, legend=c("Not significant",
"p-value = 0.05", "p-value = 0.01", "p-value = 0.001",
"p-value = 0.0001"), fill=c("white", gray(0.9), gray(0.7),
gray(0.4), "black"), bty="n", cex=0.8)
title(main="Local Moran's I significance map")
```
Only a few of the ski areas have a significant spatial correlation. These are at the bottom left of the map and are: Carezza, Nova Ponente, Panorama, Pietralba e Passo Olcini (which also correspond to point 17, the one with the most influence)
## Conclusion
We did not obtain significant results with most of the chosen neighbouroods, if we consider the global spatial autocorrelation. The p-value was significant only when we considered many neighbours for each ski area; however, the Moran I statistic value was just above 0, indicating an extremely slight positive correlation.
On the other hand, considering the local index we find (again, using quite high number of neighbours or a quite high distance) that there are some ski areas that have significant positive or negative spatial autocorrelation.