biooracle

Extra R-language tools to supplement the biooracler R package. This package serves up R scripts to create a local data repository.

Requirements for package

From CRAN

Installation

# install.packages("remotes")
remotes::install_github("BigelowLab/biooracle")

Set up a data directory

You can store the path to your chosen data directory. It will persist between R sessions so you don’t have to do it each time.

suppressPackageStartupMessages({
  library(biooracle)
  library(dplyr)
})
set_biooracle_root("~/Library/CloudStorage/Dropbox/data/biooracle")

We’ll be creating a new local dataset for the Northwest Atlantic (nwa), which has the bounding box bb = c(xmin = -77, xmax = -42.5, ymin = 36.5, ymax = 56.7).

nwa_path = biooracle_path("nwa") |> make_path()
biooracle_path() |> dir(full.names = TRUE)

## [1] "/Users/ben/Library/CloudStorage/Dropbox/data/biooracle/nwa" 
## [2] "/Users/ben/Library/CloudStorage/Dropbox/data/biooracle/temp"

Fetch some data to a temporary directory

We’ll set that aside for now and fetch some data for the region. Note that the data are downloaded as a NetCDF file into a temporary directory. Here we specify the bounding box with a named vector of its limits, but we can also provide any object from which the sf package can determine a bounding box, such as a polygon, raster, or collection of points.

dataset_id = "thetao_ssp119_2020_2100_depthmin"
newfile = fetch_biooracle(dataset_id, 
                          bb = c(xmin = -77, xmax = -42.5, ymin = 36.5, ymax = 56.7))

NOTE that you can make subselections of variables and times to download. See ?fetch_biooracle for the details.
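
If you only have raw coordinates rather than an sf object, a bounding box in the same named-vector form can be assembled with base R. This is a minimal sketch: the sample points and the helper bbox_from_points() are hypothetical and are not part of biooracle.

```r
# Hypothetical sample points (longitude, latitude); not part of the package.
pts <- data.frame(
  lon = c(-70.2, -66.5, -59.8, -52.1),
  lat = c(41.3, 44.9, 47.6, 50.2)
)

# Build a named bounding-box vector in the same order used above:
# xmin, xmax, ymin, ymax.
bbox_from_points <- function(x, y) {
  c(xmin = min(x), xmax = max(x), ymin = min(y), ymax = max(y))
}

bb_pts <- bbox_from_points(pts$lon, pts$lat)
print(bb_pts)
```

The resulting vector could then be passed as the bb argument in the same way as the hand-written vector above.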

Now we can read the file.

x = stars::read_stars(newfile, quiet = TRUE)
x

## stars object with 3 dimensions and 7 attributes
## attribute(s), summary of first 1e+05 cells:
##                          Min.    1st Qu.    Median      Mean   3rd Qu.
## thetao_ltmax [°C] -0.36475005 1.96057870 2.3428427 2.9704422 3.3483083
## thetao_ltmin [°C] -1.90398343 1.57293375 1.8961157 1.7156003 2.3728029
## thetao_max [°C]    0.21924963 2.17144280 2.7790994 3.7453738 4.5603769
## thetao_mean [°C]  -0.72299476 1.80542342 2.0918568 2.2443568 2.7558258
## thetao_min [°C]   -2.00000000 0.91433330 1.6623332 1.1105045 2.0303362
## thetao_range [°C]  0.22811718 0.47419300 0.7938143 2.6947997 4.0197336
## thetao_sd [°C]     0.03341453 0.09708313 0.1335515 0.2456717 0.4161988
##                         Max.  NA's
## thetao_ltmax [°C] 17.8440917 53161
## thetao_ltmin [°C]  6.0058540 53161
## thetao_max [°C]   21.4847577 53161
## thetao_mean [°C]   6.8363208 53161
## thetao_min [°C]    4.7977722 53161
## thetao_range [°C] 23.8074380 53161
## thetao_sd [°C]     0.7385261 53161
## dimension(s):
##      from  to offset delta  refsys                    values x/y
## x       1 691    -77  0.05      NA                      NULL [x]
## y       1 405  56.75 -0.05      NA                      NULL [y]
## time    1   8     NA    NA POSIXct 2020-01-01,...,2090-01-01
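
The dimension table can be turned back into a spatial extent with a little arithmetic. The sketch below assumes, as is usual for regular stars grids, that offset marks the outer edge of the first cell and delta is the signed cell size. The numbers are copied from the printout above.

```r
# Values copied from the dimension table printed above.
nx <- 691; x_offset <- -77;   x_delta <-  0.05
ny <- 405; y_offset <- 56.75; y_delta <- -0.05

# Assumption: offset is the outer edge of the first cell, so the far
# edge is offset + n * delta.
x_edges <- c(x_offset, x_offset + nx * x_delta)
y_edges <- c(y_offset, y_offset + ny * y_delta)

grid_bb <- c(xmin = min(x_edges), xmax = max(x_edges),
             ymin = min(y_edges), ymax = max(y_edges))
print(round(grid_bb, 2))
```

Under that assumption the grid spans roughly -77 to -42.45 in x and 36.5 to 56.75 in y, slightly wider than the requested box, which would be consistent with the request being snapped to whole 0.05° cells.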

Archiving in a local database

We often save the data in a directory structure along with a simple table that catalogs the contents of the directory. The archive_biooracle() function will split up the fetched data and save it in a logical directory structure. We provide the data path, in this case for the Northwest Atlantic (nwa).

archive_biooracle(newfile, path = nwa_path)

## # A tibble: 56 × 5
##    scenario year  z        param  trt  
##    <chr>    <chr> <chr>    <chr>  <chr>
##  1 ssp119   2020  depthmin thetao ltmax
##  2 ssp119   2020  depthmin thetao ltmin
##  3 ssp119   2020  depthmin thetao max  
##  4 ssp119   2020  depthmin thetao mean 
##  5 ssp119   2020  depthmin thetao min  
##  6 ssp119   2020  depthmin thetao range
##  7 ssp119   2020  depthmin thetao sd   
##  8 ssp119   2030  depthmin thetao ltmax
##  9 ssp119   2030  depthmin thetao ltmin
## 10 ssp119   2030  depthmin thetao max  
## # ℹ 46 more rows
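
The 56 rows follow from 8 decadal time steps times 7 summary statistics. The base-R sketch below rebuilds a table of the same shape by hand to make that bookkeeping explicit. It is an illustration only: the way the dataset id is split into fields is inferred from the catalog columns and is not taken from the package source.

```r
# Split the dataset id used above into its parts. The field order is
# inferred from the catalog columns, not taken from the package source.
dataset_id <- "thetao_ssp119_2020_2100_depthmin"
parts <- strsplit(dataset_id, "_", fixed = TRUE)[[1]]
names(parts) <- c("param", "scenario", "start", "end", "z")

# Decadal time steps seen in the fetched file, and the seven summaries.
years <- seq(2020, 2090, by = 10)
trt <- c("ltmax", "ltmin", "max", "mean", "min", "range", "sd")

# One row per year and summary combination (trt varies fastest).
catalog <- expand.grid(trt = trt, year = as.character(years),
                       stringsAsFactors = FALSE)
catalog$scenario <- parts[["scenario"]]
catalog$z <- parts[["z"]]
catalog$param <- parts[["param"]]
catalog <- catalog[, c("scenario", "year", "z", "param", "trt")]

print(nrow(catalog))
print(head(catalog, 3))
```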

Alternatively, it is possible to fetch and archive in one step, and this is likely the most convenient usage.

newfile = fetch_biooracle(dataset_id, 
                          bb = c(xmin = -77, xmax = -42.5, ymin = 36.5, ymax = 56.7),
                          archive = TRUE,
                          data_dir = nwa_path)

Read the database catalog

Once you have established a database of files, you can read the database catalog.

db = read_database(nwa_path) |>
  print()

## # A tibble: 56 × 5
##    scenario year  z        param  trt  
##    <chr>    <chr> <chr>    <chr>  <chr>
##  1 ssp119   2020  depthmin thetao ltmax
##  2 ssp119   2020  depthmin thetao ltmin
##  3 ssp119   2020  depthmin thetao max  
##  4 ssp119   2020  depthmin thetao mean 
##  5 ssp119   2020  depthmin thetao min  
##  6 ssp119   2020  depthmin thetao range
##  7 ssp119   2020  depthmin thetao sd   
##  8 ssp119   2030  depthmin thetao ltmax
##  9 ssp119   2030  depthmin thetao ltmin
## 10 ssp119   2030  depthmin thetao max  
## # ℹ 46 more rows

Read in data from the database

You can use a portion of the database to read in a stars object. Keep in mind that if you are reading multiple variables over multiple decades, then each variable must have the same number of time steps.

x = db |>
  dplyr::mutate(year = as.numeric(year)) |>
  dplyr::filter(year >= 2070) |>
  read_biooracle(path = nwa_path) |>
  print()

## stars object with 3 dimensions and 7 attributes
## attribute(s):
##                      Min.    1st Qu.    Median      Mean   3rd Qu.      Max.
## thetao_ltmax  -0.64297539 1.89211679 2.1213403 4.1633015 4.1325927 31.248983
## thetao_ltmin  -2.00000000 1.80401134 1.8825923 2.4666165 2.7054288 12.447174
## thetao_max     0.46041700 1.94233787 2.4119213 5.0367358 5.3864827 33.908306
## thetao_mean   -1.06481194 1.86144698 1.9767619 3.2299667 3.4484773 18.037924
## thetao_min    -2.00000000 1.41676116 1.8234118 1.7105341 1.9514574  9.975613
## thetao_range   0.04843726 0.12690720 0.6352606 3.3517626 5.0061388 33.088707
## thetao_sd      0.02830037 0.06138093 0.2785121 0.4090969 0.6726204  3.425986
##                 NA's
## thetao_ltmax  287604
## thetao_ltmin  287604
## thetao_max    287604
## thetao_mean   287604
## thetao_min    287604
## thetao_range  287604
## thetao_sd     287604
## dimension(s):
##      from  to offset delta x/y
## x       1 691    -77  0.05 [x]
## y       1 405  56.75 -0.05 [y]
## time    1   3   2070    10
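
Before reading, it can help to confirm that the filtered catalog has the same number of time steps for every variable. The sketch below does this with base R on a hand-built stand-in for the catalog (db_demo is hypothetical), so it runs without the package or any downloaded files.

```r
# A stand-in for the catalog returned by read_database(); built by hand
# here so the example runs without the package or any downloaded files.
db_demo <- expand.grid(
  trt = c("ltmax", "ltmin", "max", "mean", "min", "range", "sd"),
  year = as.character(seq(2020, 2090, by = 10)),
  stringsAsFactors = FALSE
)

# Same selection as above: convert year to numeric, keep 2070 onward.
sel <- db_demo[as.numeric(db_demo$year) >= 2070, ]

# Count time steps per variable; the counts should all be equal.
steps <- table(sel$trt)
print(steps)
same_steps <- length(unique(as.vector(steps))) == 1
print(same_steps)
```

Here each of the seven variables keeps three time steps (2070, 2080, 2090), which matches the time dimension of the stars object printed above.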

And of course you can plot.

plot(x['thetao_mean'])

## downsample set to 1
