Extra R-language tools to supplement the biooracler R package. This package provides R tools for creating and managing a local data repository.
From GitHub, using the remotes package (available from CRAN)
# install.packages("remotes")
remotes::install_github("BigelowLab/biooracle")
You can store the path to your chosen data directory. It will persist between R sessions so you don’t have to do it each time.
suppressPackageStartupMessages({
  library(biooracle)
  library(dplyr)
})
set_biooracle_root("~/Library/CloudStorage/Dropbox/data/biooracle")
We’ll be creating a new local dataset for the Northwest Atlantic (nwa), which has the bounding box:
bb = c(xmin = -77, xmax = -42.5, ymin = 36.5, ymax = 56.7)
nwa_path = biooracle_path("nwa") |> make_path()
biooracle_path() |> dir(full.names = TRUE)
## [1] "/Users/ben/Library/CloudStorage/Dropbox/data/biooracle/nwa"
## [2] "/Users/ben/Library/CloudStorage/Dropbox/data/biooracle/temp"
We’ll set that aside for now and fetch some data for that region. Note that the data are downloaded as a NetCDF file into a temporary directory. Here we specify the bounding box as a named vector of its limits (xmin, xmax, ymin, ymax), but we can also provide any object from which the sf package can determine a bounding box, such as a polygon, raster or collection of points.
dataset_id = "thetao_ssp119_2020_2100_depthmin"
newfile = fetch_biooracle(dataset_id,
                          bb = c(xmin = -77, xmax = -42.5, ymin = 36.5, ymax = 56.7))
NOTE that you can make subselections of variables and times to download. See ?fetch_biooracle for the details.
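The bounding box used in this document is a named vector in xmin, xmax, ymin, ymax order. As a minimal sketch in base R, assuming you start from a table of point coordinates, the same kind of vector can be built with range(). The pts data frame and the points_to_bb() helper are hypothetical names made up for this illustration; they are not part of biooracle.

```r
# Hypothetical point locations (longitude, latitude), for illustration only.
pts <- data.frame(
  lon = c(-77.0, -60.2, -42.5),
  lat = c(36.5, 56.7, 44.1)
)

# Hypothetical helper: build a named bounding box vector in the
# xmin, xmax, ymin, ymax order used in this document.
points_to_bb <- function(x, y) {
  rx <- range(x, na.rm = TRUE)
  ry <- range(y, na.rm = TRUE)
  c(xmin = rx[1], xmax = rx[2], ymin = ry[1], ymax = ry[2])
}

bb <- points_to_bb(pts$lon, pts$lat)
print(bb)
```

Note that sf::st_bbox() reports its values in a different order (xmin, ymin, xmax, ymax), so it is safer to refer to the elements by name than by position.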
Now we can read the file.
x = stars::read_stars(newfile, quiet = TRUE)
x
## stars object with 3 dimensions and 7 attributes
## attribute(s), summary of first 1e+05 cells:
## Min. 1st Qu. Median Mean 3rd Qu.
## thetao_ltmax [°C] -0.36475005 1.96057870 2.3428427 2.9704422 3.3483083
## thetao_ltmin [°C] -1.90398343 1.57293375 1.8961157 1.7156003 2.3728029
## thetao_max [°C] 0.21924963 2.17144280 2.7790994 3.7453738 4.5603769
## thetao_mean [°C] -0.72299476 1.80542342 2.0918568 2.2443568 2.7558258
## thetao_min [°C] -2.00000000 0.91433330 1.6623332 1.1105045 2.0303362
## thetao_range [°C] 0.22811718 0.47419300 0.7938143 2.6947997 4.0197336
## thetao_sd [°C] 0.03341453 0.09708313 0.1335515 0.2456717 0.4161988
## Max. NA's
## thetao_ltmax [°C] 17.8440917 53161
## thetao_ltmin [°C] 6.0058540 53161
## thetao_max [°C] 21.4847577 53161
## thetao_mean [°C] 6.8363208 53161
## thetao_min [°C] 4.7977722 53161
## thetao_range [°C] 23.8074380 53161
## thetao_sd [°C] 0.7385261 53161
## dimension(s):
## from to offset delta refsys values x/y
## x 1 691 -77 0.05 NA NULL [x]
## y 1 405 56.75 -0.05 NA NULL [y]
## time 1 8 NA NA POSIXct 2020-01-01,...,2090-01-01
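The dimension table above can be turned back into geographic coordinates with a little arithmetic. The sketch below uses only base R and the numbers printed above, and it assumes the usual raster convention in which offset marks the outer edge of the first cell and delta is the cell size. The variable names are chosen here for illustration.

```r
# Values copied from the dimension table printed above.
nx <- 691; x_offset <- -77;   x_delta <- 0.05
ny <- 405; y_offset <- 56.75; y_delta <- -0.05

# Outer edges of the grid: start at the offset and add one delta per cell.
x_edges <- c(x_offset, x_offset + nx * x_delta)
y_edges <- c(y_offset, y_offset + ny * y_delta)

# Cell-center coordinates sit half a cell in from the first edge.
x_centers <- x_offset + (seq_len(nx) - 0.5) * x_delta
y_centers <- y_offset + (seq_len(ny) - 0.5) * y_delta

# Extent of the grid in the xmin, xmax, ymin, ymax order used for bb.
grid_extent <- c(xmin = min(x_edges), xmax = max(x_edges),
                 ymin = min(y_edges), ymax = max(y_edges))
print(grid_extent)
```

Under that assumption the grid spans roughly -77 to -42.45 in x and 36.5 to 56.75 in y, slightly larger than the requested box because the extent is built from whole 0.05 degree cells.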
We often save the data in a directory structure along with a simple table
that catalogs the contents of the directory. The archive_biooracle()
function will split up the fetched data and save it in a logical
directory structure. We provide the data path, in this case for the
Northwest Atlantic (nwa).
archive_biooracle(newfile, path = nwa_path)
## # A tibble: 56 × 5
## scenario year z param trt
## <chr> <chr> <chr> <chr> <chr>
## 1 ssp119 2020 depthmin thetao ltmax
## 2 ssp119 2020 depthmin thetao ltmin
## 3 ssp119 2020 depthmin thetao max
## 4 ssp119 2020 depthmin thetao mean
## 5 ssp119 2020 depthmin thetao min
## 6 ssp119 2020 depthmin thetao range
## 7 ssp119 2020 depthmin thetao sd
## 8 ssp119 2030 depthmin thetao ltmax
## 9 ssp119 2030 depthmin thetao ltmin
## 10 ssp119 2030 depthmin thetao max
## # ℹ 46 more rows
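The 56 rows in the catalog correspond to 7 statistics for each of 8 decadal time steps. The sketch below rebuilds a table with the same columns in base R to make that layout explicit. It is a hypothetical reconstruction for illustration only: the way dataset_id is split into fields is an assumption based on the values shown above, and it is not how archive_biooracle() is implemented.

```r
# Hypothetical reconstruction of the catalog layout, for illustration only.
dataset_id <- "thetao_ssp119_2020_2100_depthmin"
parts <- strsplit(dataset_id, "_", fixed = TRUE)[[1]]

# Assumed field order: parameter, scenario, start year, end year, depth.
names(parts) <- c("param", "scenario", "start", "end", "z")

# Time steps and statistics as they appear in the printed stars object.
years <- seq(2020, 2090, by = 10)
trts <- c("ltmax", "ltmin", "max", "mean", "min", "range", "sd")

# expand.grid() varies its first argument fastest, so trt cycles within year.
grid <- expand.grid(trt = trts, year = as.character(years),
                    stringsAsFactors = FALSE)

catalog <- data.frame(
  scenario = parts[["scenario"]],
  year = grid$year,
  z = parts[["z"]],
  param = parts[["param"]],
  trt = grid$trt,
  stringsAsFactors = FALSE
)

nrow(catalog)
head(catalog, 10)
```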
Alternatively, it is possible to fetch and archive in one step, and this is likely the most convenient usage.
newfile = fetch_biooracle(dataset_id,
                          bb = c(xmin = -77, xmax = -42.5, ymin = 36.5, ymax = 56.7),
                          archive = TRUE,
                          data_dir = nwa_path)
Once you have established a database of files, you can read the database catalog.
db = read_database(nwa_path) |>
  print()
## # A tibble: 56 × 5
## scenario year z param trt
## <chr> <chr> <chr> <chr> <chr>
## 1 ssp119 2020 depthmin thetao ltmax
## 2 ssp119 2020 depthmin thetao ltmin
## 3 ssp119 2020 depthmin thetao max
## 4 ssp119 2020 depthmin thetao mean
## 5 ssp119 2020 depthmin thetao min
## 6 ssp119 2020 depthmin thetao range
## 7 ssp119 2020 depthmin thetao sd
## 8 ssp119 2030 depthmin thetao ltmax
## 9 ssp119 2030 depthmin thetao ltmin
## 10 ssp119 2030 depthmin thetao max
## # ℹ 46 more rows
You can use a portion of the database to read in a stars object. Keep
in mind that if you are reading multiple variables over multiple decades,
then each variable must have the same number of time steps.
x = db |>
  dplyr::mutate(year = as.numeric(year)) |>
  dplyr::filter(year >= 2070) |>
  read_biooracle(path = nwa_path) |>
  print()
## stars object with 3 dimensions and 7 attributes
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## thetao_ltmax -0.64297539 1.89211679 2.1213403 4.1633015 4.1325927 31.248983
## thetao_ltmin -2.00000000 1.80401134 1.8825923 2.4666165 2.7054288 12.447174
## thetao_max 0.46041700 1.94233787 2.4119213 5.0367358 5.3864827 33.908306
## thetao_mean -1.06481194 1.86144698 1.9767619 3.2299667 3.4484773 18.037924
## thetao_min -2.00000000 1.41676116 1.8234118 1.7105341 1.9514574 9.975613
## thetao_range 0.04843726 0.12690720 0.6352606 3.3517626 5.0061388 33.088707
## thetao_sd 0.02830037 0.06138093 0.2785121 0.4090969 0.6726204 3.425986
## NA's
## thetao_ltmax 287604
## thetao_ltmin 287604
## thetao_max 287604
## thetao_mean 287604
## thetao_min 287604
## thetao_range 287604
## thetao_sd 287604
## dimension(s):
## from to offset delta x/y
## x 1 691 -77 0.05 [x]
## y 1 405 56.75 -0.05 [y]
## time 1 3 2070 10
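Because each variable must have the same number of time steps, it can help to check a filtered catalog before reading it. The following is a minimal sketch in base R using a mock catalog; the has_equal_steps() helper is a hypothetical name introduced here and is not a biooracle function.

```r
# Mock catalog (hypothetical): 7 statistics for each of 8 decades.
db <- expand.grid(
  trt = c("ltmax", "ltmin", "max", "mean", "min", "range", "sd"),
  year = seq(2020, 2090, by = 10),
  stringsAsFactors = FALSE
)
db$param <- "thetao"

# Hypothetical helper: TRUE when every param/trt pair has the same
# number of rows, that is, the same number of time steps.
has_equal_steps <- function(db) {
  n <- table(paste(db$param, db$trt, sep = "_"))
  length(unique(as.integer(n))) == 1L
}

# A complete selection: three decades for each of the seven variables.
sub <- db[db$year >= 2070, ]
ok_complete <- has_equal_steps(sub)

# Drop one row so that one variable is short a decade.
broken <- sub[-1, ]
ok_broken <- has_equal_steps(broken)

print(c(complete = ok_complete, broken = ok_broken))
```

The complete selection passes the check, while the selection with a missing row does not.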
And of course you can plot.
plot(x['thetao_mean'])
## downsample set to 1