---
output: github_document
---
<!-- badges: start -->
[![R-CMD-check](https://github.com/mapme-initiative/mapme.biodiversity/actions/workflows/R-CMD-check.yaml/badge.svg?branch=main)](https://github.com/mapme-initiative/mapme.biodiversity/actions)
[![Coverage Status](https://img.shields.io/codecov/c/github/mapme-initiative/mapme.biodiversity/master.svg)](https://app.codecov.io/github/mapme-initiative/mapme.biodiversity?branch=main)
[![CRAN status](https://badges.cranchecks.info/worst/mapme.biodiversity.svg)](https://cran.r-project.org/web/checks/check_results_mapme.biodiversity.html)
[![CRAN version](https://www.r-pkg.org/badges/version/mapme.biodiversity)](https://CRAN.R-project.org/package=mapme.biodiversity)
[![License](https://img.shields.io/badge/License-GPL (>=3)-brightgreen.svg?style=flat)](https://choosealicense.com/licenses/gpl-3.0/)
<!-- badges: end -->
# mapme.biodiversity <img src="man/figures/logo.png" align="right" height="110"/>
## About
Biodiversity areas, especially primary forests, provide multiple
ecosystem services for the local population and the planet as a whole.
The rapid expansion of human land use into natural ecosystems and the
impacts of the global climate crisis put natural ecosystems and the
global biodiversity under threat.
The mapme.biodiversity package helps to analyse a number of
biodiversity-related indicators and biodiversity threats based on freely
available geodata sources such as the Global Forest Watch. It supports
computationally efficient routines and heavy parallel computing in
cloud infrastructures such as AWS or Microsoft Azure using the statistical
programming language R. The package allows for the analysis of global
biodiversity portfolios with thousands or millions of areas of interest
(AOIs), which is normally only possible on dedicated platforms such as the
Google Earth Engine. For example, it makes it possible to analyse the World
Database of Protected Areas (WDPA) for a number of relevant indicators. The
primary use case of this package is to support scientific analysis and
data science for individuals and organizations who seek to preserve the
planet's biodiversity. Its development is funded by the German
Development Bank KfW.
## Installation
The package and its dependencies can be installed from CRAN via:
```{r install-cran, eval = FALSE}
install.packages("mapme.biodiversity", dependencies = TRUE)
```
To install the development version, use the following command:
```{r install-devel, eval = FALSE}
remotes::install_github("https://github.com/mapme-initiative/mapme.biodiversity", dependencies = TRUE)
```
## Available resources and indicators
Below is a list of the resources currently supported by `mapme.biodiversity`.
```{r available_resources, echo = FALSE}
knitr::kable(mapme.biodiversity::available_resources()[, c("name", "description", "licence")])
```
Next is a list of the supported indicators.
```{r available_indicators, echo = FALSE}
knitr::kable(mapme.biodiversity::available_indicators()[, c("name", "description")])
```
## Usage example
`{mapme.biodiversity}` works by constructing a portfolio from an sf
object. Specific raster and vector resources matching the spatio-temporal
extent of the portfolio are made available locally. Once all required
resources are available, indicators can be calculated individually for
each asset in the portfolio.
```{r resources}
library(mapme.biodiversity)
library(sf)
```
Once you have decided on an indicator you are interested in, you can
start by making the required resource available for your portfolio.
Using `mapme_options()` you can set an output directory, control the maximum
size of polygons before they are chunked into smaller parts, and control the
verbosity of the package.
A portfolio is represented by an sf-object. It is required for the object to
only contain geometries of type `POLYGON` and `MULTIPOLYGON` as assets.
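A minimal sketch of how this requirement could be checked before requesting any resources. The `square` and `portfolio` objects are hypothetical and built by hand for illustration only; the check itself relies on `sf::st_geometry_type()`.

```{r check-geometries, eval = FALSE}
library(sf)

# A hypothetical single-asset portfolio, built by hand for illustration
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
portfolio <- st_sf(id = 1, geometry = st_sfc(square, crs = 4326))

# Collect the geometry type of every asset in the portfolio
geom_types <- as.character(st_geometry_type(portfolio))

# TRUE only if all assets are of type POLYGON or MULTIPOLYGON
is_valid_portfolio <- all(geom_types %in% c("POLYGON", "MULTIPOLYGON"))
stopifnot(is_valid_portfolio)
```

Running such a check first should make it easier to spot points or lines that need to be removed or converted before they are passed on.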
We can request the download of a resource for the spatial extent of our portfolio
by using the `get_resources()` function. We simply supply our portfolio and one
or more resource functions. Once the resources have been made available, we can
request the calculation of an indicator by using the `calc_indicators()` function.
This function also expects the portfolio and one or more indicator functions as
input. Once the indicator has been calculated for all assets in a portfolio, the
data is returned as a nested list column in the original portfolio object. The
output of each indicator is standardized to a common format, consisting of a
tibble with columns `datetime`, `variable`, `unit`, and `value`. We can
transform the data to long format by using `portfolio_long()`.
```{r calculation}
mapme_options(
  outdir = system.file("res", package = "mapme.biodiversity"),
  chunk_size = 1e6, # in ha
  verbose = FALSE
)
aoi <- system.file("extdata", "sierra_de_neiba_478140_2.gpkg", package = "mapme.biodiversity") %>%
  sf::read_sf() %>%
  get_resources(
    get_gfw_treecover(version = "GFC-2023-v1.11"),
    get_gfw_lossyear(version = "GFC-2023-v1.11"),
    get_gfw_emissions()
  ) %>%
  calc_indicators(calc_treecover_area_and_emissions(years = 2016:2017, min_size = 1, min_cover = 30)) %>%
  portfolio_long()
aoi
```
## Using cloud storages
`{mapme.biodiversity}` leverages GDAL's capabilities for data I/O. For users of
this package, that means that integrating a cloud storage is as easy as setting
up a configuration file and changing the `outdir` argument in `mapme_options()`.
While you could also decide to use environment variables, we recommend setting
up a GDAL config file. You can find GDAL's documentation on this topic [here](https://gdal.org/en/latest/user/configoptions.html#gdal-configuration-file).
Suppose that we want to use an AWS S3 bucket that we control to
write resource data to. Let's assume this bucket is already set up and we wish
to refer to it in our R code as `mapme-data`. The GDAL configuration file
should look something like this:
```{ini}
[credentials]
[.mapme-data]
path=/vsis3/mapme-data
AWS_SECRET_ACCESS_KEY=<your-access-key>
AWS_ACCESS_KEY_ID=<your-access-id>
```
The connection will be handled based on GDAL's virtual file system. You can
find documentation on specific options for your cloud provider [here](https://gdal.org/en/latest/user/virtual_file_systems.html#network-based-file-systems).
Ideally, you would also set the following in the `.Renviron` file in your user's
home directory to ensure that GDAL is aware of this configuration when an R
session is started:
```{ini}
GDAL_CONFIG_FILE = "<path-to-your-config-file>"
```
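As a quick sanity check, the following sketch uses only base R to confirm that the variable is visible in the current session and points to an existing file. The `config_file` name is chosen here for illustration.

```{r check-gdal-config, eval = FALSE}
# Value of GDAL_CONFIG_FILE as seen by the current R session (NA if unset)
config_file <- Sys.getenv("GDAL_CONFIG_FILE", unset = NA)

if (is.na(config_file)) {
  message("GDAL_CONFIG_FILE is not set; restart R after editing .Renviron.")
} else if (!file.exists(config_file)) {
  message("GDAL_CONFIG_FILE points to a missing file: ", config_file)
} else {
  message("GDAL_CONFIG_FILE is set to: ", config_file)
}
```

Note that changes to `.Renviron` only take effect in R sessions started after the file was edited.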
Then, in your scripts set the `outdir` option to the value specified with
the `path` variable in the configuration file:
```{r s3-outdir, eval = FALSE}
mapme_options(outdir = "/vsis3/mapme-data")
```
## A note on parallel computing
`{mapme.biodiversity}` follows the parallel computing paradigm of the
[`{future}`](https://cran.r-project.org/package=future) package. That means
that you as a user are in control of whether and how you set up
parallel processing. Since `{mapme.biodiversity} v0.9`, we apply pre-chunking
to all assets in the portfolio. That means that assets are split up into
components of roughly the size of `chunk_size`. These components can then be
iterated over in parallel to speed up processing. Indicator values will
be aggregated automatically.
```{r parallel-1, eval = FALSE}
library(future)
plan(cluster, workers = 6)
```
As another example, with the code below one would apply parallel processing of 2
assets, with each having 4 workers available to process chunks, thus requiring
a total of 8 available cores on the host machine. Be sure not to request
more workers than there are cores available on your machine.
```{r parallel, eval = FALSE}
library(future)
library(progressr)

plan(cluster, workers = 2)

with_progress({
  aoi <- calc_indicators(
    aoi,
    calc_treecover_area_and_emissions(
      min_size = 1,
      min_cover = 30
    )
  )
})

plan(sequential) # close child processes
```
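To avoid requesting more workers than the machine offers, one option is to cap the worker count with `future::availableCores()`. This is only a sketch: the value of `n_requested` is an arbitrary example, and the indicator calculation itself is left out.

```{r check-cores, eval = FALSE}
library(future)

# Number of cores the current R process is allowed to use
n_available <- availableCores()

# Hypothetical target; cap it at what the machine actually offers
n_requested <- 6
n_workers <- min(n_requested, n_available)

plan(cluster, workers = n_workers)
# ... call calc_indicators() here ...
plan(sequential) # close child processes
```

On shared servers or in containers, `availableCores()` may report fewer cores than the physical machine has, which is usually the safer number to plan with.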
Head over to the [online
documentation](https://mapme-initiative.github.io/mapme.biodiversity/index.html)
to find more detailed information about the package.