This repository contains the functionality to standardize the data of the European Seabirds at Sea (ESAS) to a Darwin Core Archive that can be harvested by OBIS and GBIF.
To republish the data:
- Clone this repository to your computer.
- Download all public ESAS data from ICES.
- Unzip the download and move the files to a `data/raw` directory in the repository. The directory (and the files it contains) is ignored by git, so you will have to create it.
- Open the repository in RStudio by opening the `esas2obis.Rproj` file.
- Open the Darwin Core mapping script `dwc_mapping.Rmd`.
- Click `Run > Run All` to transform the data to Darwin Core files using SQL. This will take a while.
- Verify that all steps in the mapping script ran without errors.
- Verify in git or GitHub Desktop that the sample data are not affected (changes would indicate updates or issues in the mapping).
- Upload the Darwin Core files to the VLIZ "upload" IPT.
- Validate the Darwin Core Archive (by EurOBIS staff).
- Publish the dataset to OBIS and GBIF (by EurOBIS staff).
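
The unzip step can also be scripted. The following is a minimal Python sketch, separate from the R and SQL code in this repository. The helper name `unzip_to_raw` and the archive name in the usage note are hypothetical; only the `data/raw` target directory comes from the steps above.

```python
from pathlib import Path
import zipfile


def unzip_to_raw(archive, repo_root="."):
    """Extract a downloaded archive into <repo_root>/data/raw.

    Returns the sorted list of extracted file paths.
    """
    raw_dir = Path(repo_root) / "data" / "raw"
    # data/raw is ignored by git, so it does not exist in a fresh clone
    raw_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(raw_dir)
    return sorted(p for p in raw_dir.rglob("*") if p.is_file())
```

For example, `unzip_to_raw("esas_download.zip")` run from the repository root would place the files in `data/raw`, where `esas_download.zip` stands for whatever file name the ICES download actually has.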
- Dataset on IMIS: source for the metadata and landing page for the DOI (https://doi.org/10.14284/601)
- Dataset on the VLIZ "upload" IPT: source for the data
- Dataset on OBIS
- Dataset on GBIF
ESAS data is structured in 4 hierarchical tables: campaigns, samples, positions and observations.
The Event core contains three types of events:
- Campaigns (`type=cruise`) with an `eventID`, date range, and remarks.
- Samples (`type=sample`) with an `eventID`, `parentEventID` (the campaign), single date and remarks.
- Positions (`type=subSample`) with an `eventID`, `parentEventID` (the sample), datetime and location.
The `eventID`s are created by concatenating the parent identifiers, e.g. `<campaignID>_<sampleID>_<positionID>` for a position. This makes them unique within the dataset and easy to understand.
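
As an illustration of this identifier scheme, here is a minimal Python sketch. It is not the repository's implementation (the actual transformation lives in the SQL files); the function names and the example identifier values are hypothetical, and `parent_event_id` additionally assumes that the individual identifiers contain no underscores.

```python
def build_event_id(campaign_id, sample_id=None, position_id=None):
    """Join the identifiers of an event and its ancestors with underscores."""
    if sample_id is None and position_id is not None:
        raise ValueError("a position needs a sample identifier")
    parts = [campaign_id, sample_id, position_id]
    return "_".join(str(p) for p in parts if p is not None)


def parent_event_id(event_id):
    """Return the parent's eventID, or None for a campaign.

    Assumes the individual identifiers contain no underscores.
    """
    head, sep, _ = event_id.rpartition("_")
    return head if sep else None


# Hypothetical identifiers, for illustration only
position = build_event_id(110000, 2, 5)   # "110000_2_5"
sample = parent_event_id(position)        # "110000_2"
campaign = parent_event_id(sample)        # "110000"
```

Deriving the parent by dropping the last component only works because each level appends exactly one identifier to its parent's `eventID`.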
Record-level terms such as `institutionCode`, `datasetName`, `license` and `rightsHolder` are included as well.
See the SQL file for the full transformation.
The Occurrence extension contains the observations, with the following terms:
- `eventID` (the position) and `occurrenceID`.
- `basisOfRecord` (always `HumanObservation`) and `occurrenceStatus` (always `present`).
- `scientificName`, `scientificNameID` (WoRMS identifier), `kingdom` (always `Animalia`) and `vernacularName`.
- `individualCount`, `sex`, `lifeStage`, `behavior`, `associatedTaxa` (also expressed as measurements or facts).
- `occurrenceRemarks`.
The `occurrenceID`s are created similarly to the `eventID`s, as `<campaignID>_<sampleID>_<positionID>_<observationID>`.
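
To make the shape of an occurrence record concrete, the following Python sketch assembles one with the constant terms listed above. It is an illustration under assumptions, not the repository's SQL: the function name `build_occurrence`, its parameters and the example values are hypothetical, and only a subset of the terms is shown.

```python
def build_occurrence(campaign_id, sample_id, position_id, observation_id,
                     scientific_name, individual_count=None):
    """Return a dict with a subset of the Occurrence extension terms."""
    # The eventID of an occurrence is that of its position
    event_id = f"{campaign_id}_{sample_id}_{position_id}"
    return {
        "eventID": event_id,
        "occurrenceID": f"{event_id}_{observation_id}",
        # Constant values, as described in the list above
        "basisOfRecord": "HumanObservation",
        "occurrenceStatus": "present",
        "kingdom": "Animalia",
        "scientificName": scientific_name,
        "individualCount": individual_count,
    }


# Hypothetical identifiers and species, for illustration only
record = build_occurrence(110000, 2, 5, 1, "Fulmarus glacialis", 1)
```

Because the `occurrenceID` extends the `eventID` of the position, an occurrence can be traced back to its position, sample and campaign from the identifier alone.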
See the SQL file for the full transformation.
The EMOF extension contains all other ESAS data, with the following terms:
- `eventID`: identifier of sample or position (there are no campaign measurements).
- `occurrenceID` (where applicable): identifier of the occurrence.
- `measurementType`: lowercase description of the measurement.
- `measurementTypeID` (where applicable): link to a definition of the measurement. Where possible, we use the BODC Parameter Usage Vocabulary (P01) or fall back to ESAS vocabularies maintained by ICES (e.g. https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars).
- `measurementValue`: human readable value or description, lowercased where appropriate.
- `measurementValueID` (where applicable): IRI for the value. These mostly link to values in ESAS vocabularies maintained by ICES (e.g. https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars/2), except for platform code (C17), sex (S10) and life stage (S11).
- `measurementUnit` (where applicable): unit of the measurement.
- `measurementUnitID`: link to a definition of the unit, with `XXXX` for not applicable and `UUUU` for dimensionless (e.g. `individualCount`).
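
The fallback for `measurementUnitID` can be sketched in Python as follows. This is a simplified illustration, not the repository's SQL: the function name `build_emof` and its parameters are hypothetical, the type and value identifiers are omitted, and the sketch stores the bare codes `XXXX` and `UUUU` where the real output is described as using links.

```python
NOT_APPLICABLE = "XXXX"   # unit is not applicable
DIMENSIONLESS = "UUUU"    # value is dimensionless, e.g. a count


def build_emof(event_id, measurement_type, value, occurrence_id=None,
               unit=None, unit_id=None, dimensionless=False):
    """Return a dict with a subset of the EMOF extension terms."""
    if unit_id is None:
        # No unit definition given: fall back to one of the two codes
        unit_id = DIMENSIONLESS if dimensionless else NOT_APPLICABLE
    return {
        "eventID": event_id,
        "occurrenceID": occurrence_id,
        "measurementType": measurement_type.lower(),
        "measurementValue": str(value),
        "measurementUnit": unit,
        "measurementUnitID": unit_id,
    }


# Hypothetical identifiers; measurement names taken from Table 1
side = build_emof("110000_2", "platform side", "left")
count = build_emof("110000_2_5", "individual count", 1,
                   occurrence_id="110000_2_5_1", dimensionless=True)
```

Here `side` gets `XXXX` because a platform side has no unit, while `count` gets `UUUU` because a count is dimensionless.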
The ESAS terms `behaviour` and `association` can contain multiple values for a single observation and are split into a maximum of 3 measurement or fact records.
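
The splitting can be pictured with the following Python sketch. It rests on assumptions not stated here: the separator between values (a comma in this sketch) and the behaviour for more than 3 values (raising an error) are hypothetical, as is the function name `split_into_emof`. The actual logic is in the SQL file.

```python
MAX_PARTS = 3  # at most 3 records per term, as described above


def split_into_emof(event_id, occurrence_id, measurement_type, raw_value,
                    sep=","):
    """Turn one multi-value field into up to MAX_PARTS EMOF-like records."""
    # Drop empty fragments and surrounding whitespace
    values = [v.strip() for v in raw_value.split(sep) if v.strip()]
    if len(values) > MAX_PARTS:
        raise ValueError(f"expected at most {MAX_PARTS} values, got {len(values)}")
    return [
        {
            "eventID": event_id,
            "occurrenceID": occurrence_id,
            "measurementType": measurement_type,
            "measurementValue": value,
        }
        for value in values
    ]
```

Under these assumptions, a field holding a single value such as `scavenging` yields one record, and a field holding three values yields three records that share the same `eventID` and `occurrenceID`.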
See Table 1 for an overview and the SQL file for the full transformation.
table | measurement or fact | type | example |
---|---|---|---|
sample | platform code | vocab | BELGICA |
sample | platform class | vocab | ship |
sample | platform side | vocab | left |
sample | platform height | number | |
sample | transect width | integer | 300 |
sample | sampling method | vocab | ship-based transect method with distance estimation and snapshot for flying birds |
sample | primary sampling | boolean | True |
sample | target taxa | vocab | all species recorded (standard) |
sample | distance bins | string | 0\|50\|100\|200\|300 |
sample | use of binoculars | vocab | Binoculars used extensively for scanning ahead and to the side, naked eye used for close observations (e.g. for cetacean monitoring) |
sample | number of observers | integer | 2 |
position | distance | number | 0.7 |
position | area | number | 0.21 |
position | wind force | vocab | moderate breeze |
position | visibility | vocab | C |
position | glare | vocab | weak |
position | sun angle | integer | |
position | cloud cover | vocab | |
position | precipitation | vocab | none |
position | ice cover | integer | 0 |
position | observation conditions | vocab | |
observation | group identifier | string | 12 |
observation | in transect | boolean | True |
observation | individual count | integer | 1 |
observation | observation distance | vocab | 100-200 |
observation | life stage | vocab | adult |
observation | moult | vocab | active primary moult |
observation | plumage | vocab | non-breeding (winter) plumage |
observation | sex | vocab | female |
observation | travel direction | vocab | 45 |
observation | prey | vocab | medium fish, unidentified (ca. 2-5x bill length) |
observation | association x 3 | vocab | associated with observation base |
observation | behaviour x 3 | vocab | scavenging |
The repository structure is based on Cookiecutter Data Science and the Checklist recipe. Files and directories indicated with `GENERATED` should not be edited manually.
├── README.md              : Description of this repository
├── LICENSE                : Repository license
├── esas2obis.Rproj        : RStudio project file
├── .gitignore             : Files and directories to be ignored by git
│
├── src
│   └── dwc_mapping.Rmd    : Darwin Core mapping script
│
├── sql                    : Darwin Core transformations
│   ├── dwc_event.sql
│   ├── dwc_occurrence.sql
│   └── dwc_mof.sql
│
└── data
    ├── processed          : Darwin Core output of mapping script GENERATED
    └── processed_sample   : Darwin Core sample output of mapping script for git comparison GENERATED
MIT License for the code and documentation in this repository. The included data is released under another license.