forked from geocompx/geocompr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
07-read-write-plot.Rmd
685 lines (561 loc) · 39.5 KB
/
07-read-write-plot.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
# Geographic data I/O {#read-write}
## Prerequisites {-}
This chapter requires the following packages:
```{r 07-read-write-plot-1, message=FALSE}
library(sf)
library(terra)
library(dplyr)
library(spData)
```
## Introduction
<!--toDo:RL-->
<!--revise and update the following section-->
This chapter is about reading and writing geographic data.
Geographic data *import* is essential for geocomputation\index{geocomputation}: real-world applications are impossible without data.
For others to benefit from the results of your work, data *output* is also vital.
Taken together, we refer to these processes as I/O, short for input/output.
Geographic data I/O is almost always part of a wider process.
It depends on knowing which datasets are *available*, where they can be *found* and how to *retrieve* them.
These topics are covered in Section \@ref(retrieving-data), which describes various *geoportals*, which collectively contain many terabytes of data, and how to use them.
To further ease data access, a number of packages for downloading geographic data have been developed.
These are described in Section \@ref(geographic-data-packages).
There are many geographic file formats, each of which has pros and cons.
These are described in Section \@ref(file-formats).
The process of actually reading and writing such file formats efficiently is not covered until Sections \@ref(data-input) and \@ref(data-output), respectively.
The final Section \@ref(visual-outputs) demonstrates methods for saving visual outputs (maps), in preparation for Chapter \@ref(adv-map) on visualization.
## Retrieving open data {#retrieving-data}
<!--toDo:RL-->
<!--revise and update the following section-->
\index{open data}
A vast and ever-increasing amount of geographic data is available on the internet, much of which is free to access and use (with appropriate credit given to its providers).
In some ways there is now *too much* data, in the sense that there are often multiple places to access the same dataset.
Some datasets are of poor quality.
In this context, it is vital to know where to look, so the first section covers some of the most important sources.
Various 'geoportals' (web services providing geospatial datasets such as [Data.gov](https://catalog.data.gov/dataset?metadata_type=geospatial)) are a good place to start, providing a wide range of data but often only for specific locations (as illustrated in the updated [Wikipedia page](https://en.wikipedia.org/wiki/Geoportal) on the topic).
\index{geoportals}
Some global geoportals overcome this issue.
The [GEOSS portal](http://www.geoportal.org/) and the [Copernicus Open Access Hub](https://scihub.copernicus.eu/), for example, contain many raster datasets with global coverage.
A wealth of vector datasets can be accessed from the National Aeronautics and Space Administration agency (NASA), [SEDAC](http://sedac.ciesin.columbia.edu/) portal and the European Union's [INSPIRE geoportal](http://inspire-geoportal.ec.europa.eu/), with global and regional coverage.
Most geoportals provide a graphical interface allowing datasets to be queried based on characteristics such as spatial and temporal extent, the United States Geological Services' [EarthExplorer](https://earthexplorer.usgs.gov/) being a prime example.
*Exploring* datasets interactively on a browser is an effective way of understanding available layers.
*Downloading* data is best done with code, however, from reproducibility and efficiency perspectives.
Downloads can be initiated from the command line using a variety of techniques, primarily via URLs and APIs\index{API} (see the [Sentinel API](https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription) for example).
Files hosted on static URLs can be downloaded with `download.file()`, as illustrated in the code chunk below which accesses US National Parks data from: [catalog.data.gov/dataset/national-parks](https://catalog.data.gov/dataset/national-parks):
```{r 07-read-write-plot-2, eval=FALSE}
download.file(url = "http://nrdata.nps.gov/programs/lands/nps_boundary.zip",
destfile = "nps_boundary.zip")
unzip(zipfile = "nps_boundary.zip")
usa_parks = st_read(dsn = "nps_boundary.shp")
```
## Geographic data packages
<!--toDo:RL-->
<!--toDo:JN-->
<!--revise and update the following section-->
\index{data packages}
Many R packages have been developed for accessing geographic data, some of which are presented in Table \@ref(tab:datapackages).
These provide interfaces to one or more spatial libraries or geoportals and aim to make data access even quicker from the command line.
```{r datapackages, echo=FALSE, warning=FALSE}
datapackages = tibble::tribble(
~`Package`, ~Description,
"getlandsat", "Provides access to Landsat 8 data.",
"osmdata", "Download and import small OpenStreetMap datasets.",
"osmextract", "Download and import large OpenStreetMap datasets.",
"raster", "getData() imports administrative, elevation, WorldClim data.",
"rnaturalearth", "Access to Natural Earth vector and raster data.",
"rnoaa", "Imports National Oceanic and Atmospheric Administration (NOAA) climate data.",
"rWBclimate", "Access World Bank climate data."
)
knitr::kable(datapackages,
caption = "Selected R packages for geographic data retrieval.",
caption.short = "Selected R packages for geographic data retrieval.",
booktabs = TRUE) %>%
kableExtra::kable_styling(latex_options="scale_down")
```
It should be emphasised that Table \@ref(tab:datapackages) represents only a small number of available geographic data packages.
Other notable packages include **GSODR**, which provides Global Summary Daily Weather Data in R (see the package's [README](https://github.com/ropensci/GSODR) for an overview of weather data sources);
**tidycensus** and **tigris**, which provide socio-demographic vector data for the USA; and **hddtools**, which provides access to a range of hydrological datasets.
Each data package has its own syntax for accessing data.
This diversity is demonstrated in the subsequent code chunks, which show how to get data using three packages from Table \@ref(tab:datapackages).
Country borders are often useful and these can be accessed with the `ne_countries()` function from the **rnaturalearth** package as follows:
```{r 07-read-write-plot-3}
library(rnaturalearth)
usa = ne_countries(country = "United States of America") # United States borders
class(usa)
# alternative way of accessing the data, with raster::getData()
# getData("GADM", country = "USA", level = 0)
```
By default **rnaturalearth** returns objects of class `Spatial`.
The result can be converted into an `sf` objects with `st_as_sf()` as follows:
```{r 07-read-write-plot-4}
usa_sf = st_as_sf(usa)
```
A second example downloads a series of rasters containing global monthly precipitation sums with spatial resolution of ten minutes using the **geodata** package.
The result is a multilayer object of class `SpatRaster`.
```{r 07-read-write-plot-5}
library(geodata)
worldclim_prec = worldclim_global("prec", res = 10, path = tempdir())
class(worldclim_prec)
```
A third example uses the **osmdata** package [@R-osmdata] to find parks from the OpenStreetMap (OSM) database\index{OpenStreetMap}.
As illustrated in the code-chunk below, queries begin with the function `opq()` (short for OpenStreetMap query), the first argument of which is bounding box, or text string representing a bounding box (the city of Leeds in this case).
The result is passed to a function for selecting which OSM elements we're interested in (parks in this case), represented by *key-value pairs*. Next, they are passed to the function `osmdata_sf()` which does the work of downloading the data and converting it into a list of `sf` objects (see `vignette('osmdata')` for further details):
```{r 07-read-write-plot-6, eval=FALSE}
library(osmdata)
parks = opq(bbox = "leeds uk") %>%
add_osm_feature(key = "leisure", value = "park") %>%
osmdata_sf()
```
A limitation with the **osmdata** package is that it is *rate limited*, meaning that it cannot download large OSM datasets (e.g. all the OSM data for a large city).
To overcome this limitation, the **osmextract** package was developed, which can be used to download and import binary `.pbf` files containing compressed versions of the OSM database for pre-defined regions.
<!--todo: add proper citation-->
OpenStreetMap is a vast global database of crowd-sourced data, is growing daily, and has a wider ecosystem of tools enabling easy access to the data, from the [Overpass turbo](https://overpass-turbo.eu/) web service for rapid development and testing of OSM queries to [osm2pgsql](https://osm2pgsql.org/) for importing the data into a PostGIS database.
Although the quality of datasets derived from OSM varies, the data source and wider OSM ecosystems have many advantages: they provide datasets that are available globally, free of charge, and constantly improving thanks to an army of volunteers.
Using OSM encourages 'citizen science' and contributions back to the digital commons (you can start editing data representing a part of the world you know well at [www.openstreetmap.org](https://www.openstreetmap.org)).
Further examples of OSM data in action are provided in Chapters \@ref(gis), \@ref(transport) and \@ref(location).
Sometimes, packages come with inbuilt datasets.
These can be accessed in four ways: by attaching the package (if the package uses 'lazy loading' as **spData** does), with `data(dataset)`, by referring to the dataset with `pkg::dataset` or with `system.file()` to access raw data files.
The following code chunk illustrates the latter two options using the `world` dataset (already loaded by attaching its parent package with `library(spData)`):^[
For more information on data import with R packages, see Sections 5.5 and 5.6 of @gillespie_efficient_2016.
]
```{r 07-read-write-plot-7, eval=FALSE}
world2 = spData::world
world3 = st_read(system.file("shapes/world.gpkg", package = "spData"))
```
## Geographic web services
<!--toDo:RL-->
<!--revise and update the following section-->
\index{geographic web services}
In an effort to standardize web APIs for accessing spatial data, the Open Geospatial Consortium (OGC) has created a number of specifications for web services (collectively known as OWS, which is short for OGC Web Services).
These specifications include the Web Feature Service (WFS)\index{geographic web services!WFS}, Web Map Service (WMS)\index{geographic web services!WMS}, Web Map Tile Service (WMTS)\index{geographic web services!WMTS}, the Web Coverage Service (WCS)\index{geographic web services!WCS} and even a Web Processing Service (WPS)\index{geographic web services!WPS}.
Map servers such as PostGIS have adopted these protocols, leading to standardization of queries.
Like other web APIs, OWS APIs use a 'base URL', an 'endpoint' and 'URL query arguments' following a `?` to request data (see the [`best-practices-api-packages`](https://httr.r-lib.org/articles/api-packages.html) vignette in the **httr** package).
There are many requests that can be made to a OWS service.
One of the most fundamental is `getCapabilities`, demonstrated with **httr** below.
The code chunk demonstrates how API\index{API} queries can be constructed and dispatched, in this case to discover the capabilities of a service run by the Food and Agriculture Organization of the United Nations (FAO):
```{r 07-read-write-plot-8}
base_url = "http://www.fao.org"
endpoint = "/figis/geoserver/wfs"
q = list(request = "GetCapabilities")
res = httr::GET(url = httr::modify_url(base_url, path = endpoint), query = q)
res$url
```
The above code chunk demonstrates how API\index{API} requests can be constructed programmatically with the `GET()` function, which takes a base URL and a list of query parameters which can easily be extended.
The result of the request is saved in `res`, an object of class `response` defined in the **httr** package, which is a list containing information of the request, including the URL.
As can be seen by executing `browseURL(res$url)`, the results can also be read directly in a browser.
One way of extracting the contents of the request is as follows:
```{r 07-read-write-plot-9, eval=FALSE}
txt = httr::content(res, "text")
xml = xml2::read_xml(txt)
```
```{r 07-read-write-plot-10, eval=FALSE}
xml
#> {xml_document} ...
#> [1] <ows:ServiceIdentification>\n <ows:Title>GeoServer WFS...
#> [2] <ows:ServiceProvider>\n <ows:ProviderName>Food and Agr...
#> ...
```
Data can be downloaded from WFS services with the `GetFeature` request and a specific `typeName` (as illustrated in the code chunk below).
```{r 07-read-write-plot-11, echo=FALSE, eval=FALSE}
library(XML)
library(RCurl)
library(httr)
base_url = "http://www.fao.org/figis/geoserver/wfs"
q = list(request = "GetCapabilities")
res = httr::GET(url = base_url, query = q)
doc = xmlParse(res)
root = xmlRoot(doc)
names(root)
names(root[["FeatureTypeList"]])
root[["FeatureTypeList"]][["FeatureType"]][["Name"]]
tmp = xmlSApply(root[["FeatureTypeList"]], function(x) xmlValue(x[["Name"]]))
```
Available names differ depending on the accessed web feature service.
One can extract them programmatically using web technologies [@nolan_xml_2014] or scrolling manually through the contents of the `GetCapabilities` output in a browser.
```{r 07-read-write-plot-12, eval=FALSE}
qf = list(request = "GetFeature", typeName = "area:FAO_AREAS")
file = tempfile(fileext = ".gml")
httr::GET(url = base_url, path = endpoint, query = qf, httr::write_disk(file))
fao_areas = sf::read_sf(file)
```
Note the use of `write_disk()` to ensure that the results are written to disk rather than loaded into memory, allowing them to be imported with **sf**.
This example shows how to gain low-level access to web services using **httr**, which can be useful for understanding how web services work.
For many everyday tasks, however, a higher-level interface may be more appropriate, and a number of R packages, and tutorials, have been developed precisely for this purpose.
Packages **ows4R**, **rwfs** and **sos4R** have been developed for working with OWS services in general, WFS and the sensor observation service (SOS) respectively.
As of October 2018, only **ows4R** is on CRAN.
The package's basic functionality is demonstrated below, in commands that get all `FAO_AREAS` as we did in the previous code chunk:^[
To filter features on the server before downloading them, the argument `cql_filter` can be used. Adding `cql_filter = URLencode("F_CODE= '27'")` to the command, for example, would instruct the server to only return the feature with values in the `F_CODE` column equal to 27.
]
```{r 07-read-write-plot-13, eval=FALSE}
library(ows4R)
wfs = WFSClient$new("http://www.fao.org/figis/geoserver/wfs",
serviceVersion = "1.0.0", logger = "INFO")
fao_areas = wfs$getFeatures("area:FAO_AREAS")
```
```{r 07-read-write-plot-14, echo=FALSE, eval=FALSE}
# not shown as too verbose an example already
area_27 = wfs$getFeatures("area:FAO_AREAS",
cql_filter = URLencode("F_CODE= '27'"))
```
There is much more to learn about web services and much potential for development of R-OWS interfaces, an active area of development.
For further information on the topic, we recommend examples from European Centre for Medium-Range Weather Forecasts (ECMWF) services at [github.com/OpenDataHack](https://github.com/OpenDataHack/data_service_catalogue) and reading-up on OCG Web Services at [opengeospatial.org](http://www.opengeospatial.org/standards).
```{r 07-read-write-plot-15, eval=FALSE, echo=FALSE}
# checking out WFS using German datasets
library(ows4R)
library(sf)
base_url = "http://www.lfu.bayern.de/gdi/wfs/naturschutz/schutzgebiete?"
wfs = WFSClient$new(base_url, "1.0.0", logger = "INFO")
# wfs$getUrl()
# wfs$getClassName()
caps = wfs$getCapabilities()
caps
tmp = caps$findFeatureTypeByName("")
# find out about the available feature types
sapply(tmp, function(x) x$getName())
# ok, let's download the national parcs of Bavaria
ft = caps$findFeatureTypeByName("lfu:nationalpark")
ft$getDescription() # some problem here, I guess due to German spelling (umlaut, etc.). BTW the same happens when using the data from the Netherlands
ft$getBoundingBox() # no bounding box
ft$getDefaultCRS() # default CRS
nps = ft$getFeatures()
# this does not work properly, however, it downloads the data to the temporary
# directory, hence, we can load them into R ourselves
file = grep("gml", dir(tempdir()), value = TRUE)
file = file.path(tempdir(), file)
# assuming there is only one file
layer = read_sf(file)
plot(layer$geometry)
```
```{r 07-read-write-plot-16, eval=FALSE, echo=FALSE}
library(ows4R)
library(sf)
# data gathered from https://catalog.data.gov/dataset?res_format=WFS
# downloading US national parks
base_url = paste0("http://gstore.unm.edu/apps/rgis/datasets/",
"7bbe8af5-029b-4adf-b06c-134f0dd57226/services/ogc/wfs?")
# downloading public US airports
base_url = paste0("http://gstore.unm.edu/apps/rgis/datasets/",
"7387537d-dff6-48d1-9004-4f089f48dea1/services/ogc/wfs?")
# establish the connection
wfs = WFSClient$new(base_url, "1.0.0", logger = "INFO")
# wfs$getUrl()
# wfs$getClassName()
caps = wfs$getCapabilities()
caps
# find out about the available feature types
tmp = caps$findFeatureTypeByName("")
tmp$getName()
# ok, let's download all US national parcs
ft = caps$findFeatureTypeByName("nps_boundary")
# ft = caps$findFeatureTypeByName("tra2shp") # airports
ft$getDescription()
ft$getBoundingBox()
ft$getDefaultCRS()
nps = ft$getFeatures()
# this returns an sf object
plot(nps$msGeometry)
```
## File formats
\index{file formats}
Geographic datasets are usually stored as files or in spatial databases.
File formats can either store vector or raster data, while spatial databases such as [PostGIS](https://trac.osgeo.org/postgis/wiki/WKTRaster) can store both (see also Section \@ref(postgis)).
Today the variety of file formats may seem bewildering but there has been much consolidation and standardization since the beginnings of GIS software in the 1960s when the first widely distributed program ([SYMAP](https://news.harvard.edu/gazette/story/2011/10/the-invention-of-gis/)) for spatial analysis was created at Harvard University [@coppock_history_1991].
\index{GDAL}
GDAL (which should be pronounced "goo-dal", with the double "o" making a reference to object-orientation), the Geospatial Data Abstraction Library, has resolved many issues associated with incompatibility between geographic file formats since its release in 2000.
GDAL provides a unified and high-performance interface for reading and writing of many raster and vector data formats.^[As we mentioned in Chapter \@ref(geometric-operations), GDAL also contains a set of utility functions allowing for raster mosaicing, resampling, cropping, and reprojecting, etc.]
Many open and proprietary GIS programs, including GRASS, ArcGIS\index{ArcGIS} and QGIS\index{QGIS}, use GDAL\index{GDAL} behind their GUIs\index{graphical user interface} for doing the legwork of ingesting and spitting out geographic data in appropriate formats.
GDAL\index{GDAL} provides access to more than 200 vector and raster data formats.
Table \@ref(tab:formats) presents some basic information about selected and often used spatial file formats.
```{r formats, echo=FALSE}
file_formats = tibble::tribble(~Name, ~Extension, ~Info, ~Type, ~Model,
"ESRI Shapefile", ".shp (the main file)", "Popular format consisting of at least three files. No support for: files > 2GB; mixed types; names > 10 chars; cols > 255.", "Vector", "Partially open",
"GeoJSON", ".geojson", "Extends the JSON exchange format by including a subset of the simple feature representation.", "Vector", "Open",
"KML", ".kml", "XML-based format for spatial visualization, developed for use with Google Earth. Zipped KML file forms the KMZ format.", "Vector", "Open",
"GPX", ".gpx", "XML schema created for exchange of GPS data.", "Vector", "Open",
"FlatGeobuf", ".fgb", "Single file format allowing for quick reading and writing of vector data. Has streaming capabilities.", "Vector", "Open",
"GeoTIFF", ".tif/.tiff", "Popular raster format. A TIFF file containing additional spatial metadata.", "Raster", "Open",
"Arc ASCII", ".asc", "Text format where the first six lines represent the raster header, followed by the raster cell values arranged in rows and columns.", "Raster", "Open",
"R-raster", ".gri, .grd", "Native raster format of the R-package raster.", "Raster", "Open",
"SQLite/SpatiaLite", ".sqlite", "Standalone relational database, SpatiaLite is the spatial extension of SQLite.", "Vector and raster", "Open",
"ESRI FileGDB", ".gdb", "Spatial and nonspatial objects created by ArcGIS. Allows: multiple feature classes; topology. Limited support from GDAL.", "Vector and raster", "Proprietary",
"GeoPackage", ".gpkg", "Lightweight database container based on SQLite allowing an easy and platform-independent exchange of geodata", "Vector and raster", "Open"
)
knitr::kable(file_formats,
caption = "Selected spatial file formats.",
caption.short = "Selected spatial file formats.",
booktabs = TRUE) %>%
kableExtra::column_spec(2, width = "7em") %>%
kableExtra::column_spec(3, width = "14em") %>%
kableExtra::column_spec(5, width = "7em")
```
\index{Shapefile}
\index{GeoPackage}
An important development ensuring the standardization and open-sourcing of file formats was the founding of the Open Geospatial Consortium ([OGC](http://www.opengeospatial.org/)) in 1994.
Beyond defining the simple features data model (see Section \@ref(intro-sf)), the OGC also coordinates the development of open standards, for example as used in file formats such as KML\index{KML} and GeoPackage\index{GeoPackage}.
Open file formats of the kind endorsed by the OGC have several advantages over proprietary formats: the standards are published, ensure transparency and open up the possibility for users to further develop and adjust the file formats to their specific needs.
ESRI Shapefile\index{Shapefile} is the most popular vector data exchange format; however, it is not an open format (though its specification is open).
It was developed in the early 1990s and has a number of limitations.
First of all, it is a multi-file format, which consists of at least three files.
It only supports 255 columns, column names are restricted to ten characters and the file size limit is 2 GB.
Furthermore, ESRI Shapefile\index{Shapefile} does not support all possible geometry types, for example, it is unable to distinguish between a polygon and a multipolygon.^[To learn more about ESRI Shapefile limitations and possible alternative file formats, visit http://switchfromshapefile.org/.]
Despite these limitations, a viable alternative had been missing for a long time.
In the meantime, [GeoPackage](https://www.geopackage.org/)\index{GeoPackage} emerged, and seems to be a more than suitable replacement candidate for ESRI Shapefile.
Geopackage is a format for exchanging geospatial information and an OGC standard.
The GeoPackage standard describes the rules on how to store geospatial information in a tiny SQLite container.
Hence, GeoPackage is a lightweight spatial database container, which allows the storage of vector and raster data but also of non-spatial data and extensions.
Aside from GeoPackage, there are other geospatial data exchange formats worth checking out (Table \@ref(tab:formats)).
\index{GeoTIFF}
The GeoTIFF format seems to be the most prominent raster data format.
It allows spatial information, such as CRS, to be embedded within a TIFF file.
Similar to ESRI Shapefile, this format was firstly developed in the 1990s, but as an open format.
Additionally, GeoTIFF is still being expanded and improved.
One of the most significant recent addition to the GeoTIFF format is its variant called COG (*Cloud Optimized GeoTIFF*).
Raster objects saved as COGs can be hosted on HTTP servers, so other people can read only parts of the file without downloading the whole file.
## Data input (I) {#data-input}
Executing commands such as `sf::st_read()` (the main function we use for loading vector data) or `terra::rast()` (the main function used for loading raster data) silently sets off a chain of events that reads data from files.
Moreover, there are many R packages containing a wide range of geographic data or providing simple access to different data sources.
All of them load the data into R or, more precisely, assign objects to your workspace, stored in RAM accessible from the [`.GlobalEnv`](http://adv-r.had.co.nz/Environments.html) of the R session.
### Vector data {#iovec}
<!--toDo:RL-->
<!--st_read vs read_sf-->
\index{vector!data input}
Spatial vector data comes in a wide variety of file formats.
Most popular representations such as `.geojson` and `.gpkg` files can be imported directly into R with the **sf** function `st_read()` (or the 'tidy' equivalent `read_sf()`), which uses [GDAL's vector drivers](https://gdal.org/drivers/vector/index.html)\index{GDAL} behind the scenes.
`st_drivers()` returns a data frame containing `name` and `long_name` in the first two columns, and features of each driver available to GDAL (and therefore **sf**), including ability to write data and store raster data in the subsequent columns, as illustrated for key file formats in Table \@ref(tab:drivers).
The following commands show the first three drivers reported the computer's GDAL installation (results can vary depending on the GDAL version installed) and a summary of the their features.
Note that the majority of drivers can write data (51 out of 87) while only 16 formats can efficiently represent raster data in addition to vector data (see `?st_drivers()` for details):
```{r 07-read-write-plot-17, eval=FALSE}
sf_drivers = st_drivers()
head(sf_drivers, n = 3)
summary(sf_drivers[-c(1:2)])
```
```{r drivers, echo=FALSE}
sf_drivers = st_drivers() %>%
dplyr::filter(name %in% c("ESRI Shapefile", "GeoJSON", "KML", "GPX", "GPKG", "FlatGeobuf")) %>%
tibble::as_tibble() # remove unhelpful row names
knitr::kable(head(sf_drivers, n = 6),
caption = paste("Popular drivers/formats for reading/writing",
"vector data."),
caption.short = "Sample of available vector drivers.",
booktabs = TRUE) %>%
kableExtra::column_spec(2, width = "7em")
```
<!-- One of the major advantages of **sf** is that it is fast. -->
<!-- reference to the vignette -->
The first argument of `st_read()` is `dsn`, which should be a text string or an object containing a single text string.
The content of a text string could vary between different drivers.
In most cases, as with the ESRI Shapefile\index{Shapefile} (`.shp`) or the `GeoPackage`\index{GeoPackage} format (`.gpkg`), the `dsn` would be a file name.
`st_read()` guesses the driver based on the file extension, as illustrated for a `.gpkg` file below:
```{r 07-read-write-plot-18, eval=FALSE}
vector_filepath = system.file("shapes/world.gpkg", package = "spData")
world = st_read(vector_filepath)
#> Reading layer `world' from data source `.../world.gpkg' using driver `GPKG'
#> Simple feature collection with 177 features and 10 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
```
```{r 07-read-write-plot-19, echo=FALSE}
vector_filepath = system.file("shapes/world.gpkg", package = "spData")
world = st_read(vector_filepath, quiet = TRUE)
```
For some drivers, `dsn` could be provided as a folder name, access credentials for a database, or a GeoJSON string representation (see the examples of the `st_read()` help page for more details).
Some vector driver formats can store multiple data layers.
By default, `st_read()` automatically reads the first layer of the file specified in `dsn`; however, using the `layer` argument you can specify any other layer.
Naturally, some options are specific to certain drivers.^[
A list of supported vector formats and options can be found at http://gdal.org/ogr_formats.html.
]
For example, think of coordinates stored in a spreadsheet format (`.csv`).
To read in such files as spatial objects, we naturally have to specify the names of the columns (`X` and `Y` in our example below) representing the coordinates.
We can do this with the help of the `options` parameter.
To find out about possible options, please refer to the 'Open Options' section of the corresponding GDAL\index{GDAL} driver description.
For the comma-separated value (csv) format, visit http://www.gdal.org/drv_csv.html.
```{r 07-read-write-plot-20, results='hide'}
cycle_hire_txt = system.file("misc/cycle_hire_xy.csv", package = "spData")
cycle_hire_xy = st_read(cycle_hire_txt, options = c("X_POSSIBLE_NAMES=X",
"Y_POSSIBLE_NAMES=Y"))
```
Instead of columns describing xy-coordinates, a single column can also contain the geometry information.
Well-known text (WKT)\index{well-known text}, well-known binary (WKB)\index{well-known binary}, and the GeoJSON formats are examples of this.
For instance, the `world_wkt.csv` file has a column named `WKT` representing polygons of the world's countries.
We will again use the `options` parameter to indicate this.
Here, we will use `read_sf()`, the tidyverse-flavoured version of `st_read()`: strings are parsed as characters instead of factors and the resulting data frame is a [tibble](https://r4ds.had.co.nz/tibbles.html). The driver name is also not printed to the console.
```{r 07-read-write-plot-21, results='hide'}
world_txt = system.file("misc/world_wkt.csv", package = "spData")
world_wkt = read_sf(world_txt, options = "GEOM_POSSIBLE_NAMES=WKT")
# the same as
world_wkt2 = st_read(world_txt, options = "GEOM_POSSIBLE_NAMES=WKT",
quiet = TRUE, stringsAsFactors = FALSE, as_tibble = TRUE)
```
```{r, echo=FALSE, eval=FALSE}
identical(world_wkt, world_wkt2)
```
```{block2 07-read-write-plot-22, type='rmdnote'}
Not all of the supported vector file formats store information about their coordinate reference system.
In these situations, it is possible to add the missing information using the `st_set_crs()` function.
Please refer also to Section \@ref(crs-intro) for more information.
```
As a final example, we will show how `st_read()` also reads KML files.
A KML file stores geographic information in XML format - a data format for the creation of web pages and the transfer of data in an application-independent way [@nolan_xml_2014].
Here, we access a KML file from the web.
This file contains more than one layer.
`st_layers()` lists all available layers.
We choose the first layer `Placemarks` and say so with the help of the `layer` parameter in `read_sf()`.
```{r 07-read-write-plot-23}
u = "https://developers.google.com/kml/documentation/KML_Samples.kml"
download.file(u, "KML_Samples.kml")
st_layers("KML_Samples.kml")
kml = read_sf("KML_Samples.kml", layer = "Placemarks")
```
All the examples presented in this section so far have used the **sf** package for geographic data import.
It is fast and flexible but it may be worth looking at other packages for specific file formats.
An example is the **geojsonsf** package.
A [benchmark](https://github.com/ATFutures/geobench) suggests it is around 10 times faster than the **sf** package for reading `.geojson`.
```{r, echo=FALSE, results='hide'}
file.remove("KML_Samples.kml")
```
### Raster data
\index{raster!data input}
Similar to vector data, raster data comes in many file formats with some of them supporting multilayer files.
**terra**'s `rast()` command reads in a single layer when a file with just one layer is provided.
```{r 07-read-write-plot-24, message=FALSE}
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
single_layer = rast(raster_filepath)
```
It also works in case you want to read a multilayer file.
```{r 07-read-write-plot-25}
multilayer_filepath = system.file("raster/landsat.tif", package = "spDataLarge")
multilayer_rast = rast(multilayer_filepath)
```
<!-- COG example?? -->
<!-- ### Databases -->
<!-- postgis input example -->
## Data output (O) {#data-output}
Writing geographic data allows you to convert from one format to another and to save newly created objects.
Depending on the data type (vector or raster), object class (e.g., `sf` or `SpatRaster`), and type and amount of stored information (e.g., object size, range of values), it is important to know how to store spatial files in the most efficient way.
The next two sections will demonstrate how to do this.
### Vector data
<!--toDo:RL-->
<!--st_write vs write_sf-->
\index{vector!data output}
```{r 07-read-write-plot-27, echo=FALSE, results='hide'}
world_files = list.files(pattern = "world\\.")
file.remove(world_files)
```
The counterpart of `st_read()` is `st_write()`.
It allows you to write **sf** objects to a wide range of geographic vector file formats, including the most common such as `.geojson`, `.shp` and `.gpkg`.
Based on the file name, `st_write()` decides automatically which driver to use.
The speed of the writing process depends also on the driver.
```{r 07-read-write-plot-28}
st_write(obj = world, dsn = "world.gpkg")
```
**Note**: if you try to write to the same data source again, the function will fail:
```{r 07-read-write-plot-29, error=TRUE}
st_write(obj = world, dsn = "world.gpkg")
```
The error message tells us why the function failed:
`world.gpkg` already contains a layer called `world`.
To fix this problem, use the `append` argument.
When this argument is set to `TRUE`, the old layer is kept and the `world` object is added as a second new layer:
```{r 07-read-write-plot-31, results='hide'}
st_write(obj = world, dsn = "world.gpkg", append = TRUE)
```
Alternatively with `append = FALSE`, existing layers are deleted before the function attempts to write our object to a file:
```{r 07-read-write-plot-30, results='hide'}
st_write(obj = world, dsn = "world.gpkg", append = FALSE)
```
You can achieve the same with `write_sf()` since it is equivalent to (technically an *alias* for) `st_write()`, except that its defaults for `append` is `FALSE` and `quiet` is `TRUE`.
```{r 07-read-write-plot-32}
write_sf(obj = world, dsn = "world.gpkg")
```
The `layer_options` argument could be also used for many different purposes.
One of them is to write spatial data to a text file.
This can be done by specifying `GEOMETRY` inside of `layer_options`.
It could be either `AS_XY` for simple point datasets (it creates two new columns for coordinates) or `AS_WKT` for more complex spatial data (one new column is created which contains the well-known text representation of spatial objects).
```{r 07-read-write-plot-33, eval=FALSE}
st_write(cycle_hire_xy, "cycle_hire_xy.csv", layer_options = "GEOMETRY=AS_XY")
st_write(world_wkt, "world_wkt.csv", layer_options = "GEOMETRY=AS_WKT")
```
```{r, echo=FALSE, results='hide'}
file.remove("world.gpkg")
```
### Raster data
\index{raster!data output}
The `writeRaster()` function saves `SpatRaster` objects to files on disk.
The function expects input regarding output data type and file format, but also accepts GDAL options specific to a selected file format (see `?writeRaster` for more details).
\index{raster!data types}
The **terra** package offers nine data types when saving a raster: LOG1S, INT1S, INT1U, INT2S, INT2U, INT4S, INT4U, FLT4S, and FLT8S.^[
Using INT4U is not recommended as R does not support 32-bit unsigned integers.
]
The data type determines the bit representation of the raster object written to disk (Table \@ref(tab:datatypes)).
Which data type to use depends on the range of the values of your raster object.
The more values a data type can represent, the larger the file will get on disk.
Commonly, one would use LOG1S for bitmap (binary) rasters.
Unsigned integers (INT1U, INT2U, INT4U) are suitable for categorical data, while float numbers (FLT4S and FLT8S) usually represent continuous data.
`writeRaster()` uses FLT4S as the default.
While this works in most cases, the size of the output file will be unnecessarily large if you save binary or categorical data.
Therefore, we would recommend to use the data type that needs the least storage space, but is still able to represent all values (check the range of values with the `summary()` function).
```{r datatypes, echo=FALSE}
dT = tibble::tribble(
~`Data type`, ~`Minimum value`, ~`Maximum value`,
"LOG1S", "FALSE (0)", "TRUE (1)",
"INT1S", "-127", "127",
"INT1U", "0", "255",
"INT2S", "-32,767", "32,767",
"INT2U", "0", "65,534",
"INT4S", "-2,147,483,647", "2,147,483,647",
"INT4U", "0", "4,294,967,296",
"FLT4S", "-3.4e+38", "3.4e+38",
"FLT8S", "-1.7e+308", "1.7e+308"
)
knitr::kable(dT, caption = "Data types supported by the raster package.",
caption.short = "Data types supported by the raster package.",
booktabs = TRUE)
```
By default, the output file format is derived from the filename.
Naming a file `*.tif` will create a GeoTIFF file, as demonstrated below:
```{r 07-read-write-plot-34, eval=FALSE}
writeRaster(single_layer, filename = "my_raster.tif", datatype = "INT2U")
```
Some raster file formats have additional options, that can be set by providing [GDAL parameters](http://www.gdal.org/formats_list.html) to the `options` argument of `writeRaster()`.
GeoTIFF files are written in **terra**, by default, with the LZW compression `gdal = c("COMPRESS=LZW")`.
To change or disable the compression, we need to modify this argument.
Additionally, we can save our raster object as COG (*Cloud Optimized GeoTIFF*, Section \@ref(file-formats)) with the `"of=COG"` option.
```{r 07-read-write-plot-35, eval=FALSE}
writeRaster(x = single_layer,
filename = "my_raster.tif",
datatype = "INT2U",
gdal = c("COMPRESS=NONE", "of=COG"),
overwrite = TRUE)
```
## Visual outputs
\index{map making!outputs}
R supports many different static and interactive graphics formats.
The most general method to save a static plot is to open a graphic device, create a plot, and close it, for example:
```{r 07-read-write-plot-36, eval=FALSE}
png(filename = "lifeExp.png", width = 500, height = 350)
plot(world["lifeExp"])
dev.off()
```
Other available graphic devices include `pdf()`, `bmp()`, `jpeg()`, and `tiff()`.
You can specify several properties of the output plot, including width, height and resolution.
Additionally, several graphic packages provide their own functions to save a graphical output.
For example, the **tmap** package has the `tmap_save()` function.
You can save a `tmap` object to different graphic formats or an HTML file by specifying the object name and a file path to a new file.
```{r 07-read-write-plot-37, eval=FALSE}
library(tmap)
tmap_obj = tm_shape(world) + tm_polygons(col = "lifeExp")
tmap_save(tmap_obj, filename = "lifeExp_tmap.png")
```
On the other hand, you can save interactive maps created in the `mapview` package as an HTML file or image using the `mapshot()` function:
```{r 07-read-write-plot-38, eval=FALSE}
library(mapview)
mapview_obj = mapview(world, zcol = "lifeExp", legend = TRUE)
mapshot(mapview_obj, file = "my_interactive_map.html")
```
## Exercises
```{r, echo=FALSE, results='asis'}
res = knitr::knit_child('_07-ex.Rmd', quiet = TRUE, options = list(include = FALSE, eval = FALSE))
cat(res, sep = '\n')
```