
Delivering data services to EOSC

Sophie Servan edited this page Apr 7, 2021 · 17 revisions

How is ExPaNDS delivering its data services to PaNOSC and EOSC

The report from our workshop on EOSC is being published here.

Table of contents:
🚧

OAI-PMH and its implementation for PaN data catalogues

OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability.

  • OAI-PMH: a set of six verbs (services) that are invoked over HTTP.
  • Data Providers: repositories that expose structured metadata via OAI-PMH, i.e. network-accessible servers that can process the six OAI-PMH requests (our facilities that have implemented OAI-PMH).
  • Service Providers: issue OAI-PMH requests to harvest that metadata (DataCite, B2FIND, OpenAIRE…).

It lets you expose the metadata held in your data catalogue in a standard way, and is an easy route to integration with B2FIND and OpenAIRE. It uses a predefined set of URLs that can be invoked.

Simplified diagram of an OAI-PMH setup:
Credits to Carlo

Definitions:

  • Resource: data collected during an experiment
  • Item: all metadata around the data (entry in a collection)
  • Record: item in the standardised XML format
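
To make the distinction between item and record more concrete, here is a minimal Python sketch that reads the header and metadata parts of one record. The sample XML, its identifier and its field values are invented for illustration; only the namespaces and the header/metadata layout follow the OAI-PMH and Dublin Core conventions.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookups below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record: an item serialised in the oai_dc metadata format.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:dataset-1</identifier>
    <datestamp>2021-03-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example diffraction dataset</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def summarise_record(xml_text):
    """Return the identifier, datestamp and title of one OAI-PMH record."""
    record = ET.fromstring(xml_text)
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
        "title": record.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
    }


if __name__ == "__main__":
    print(summarise_record(SAMPLE_RECORD))
```

The header identifies the item, while the metadata element carries the item in one specific format; the same item can therefore be served as several records, one per metadata format.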

The six OAI-PMH verbs and their request URLs:

GetRecord

http://<end-point-address>/oai?verb=GetRecord&metadataPrefix=[Metadata_Prefix]&identifier=[Identifier]

Identify

http://<end-point-address>/oai?verb=Identify

ListIdentifiers

http://<end-point-address>/oai?verb=ListIdentifiers&metadataPrefix=[Metadata_Prefix]

ListMetadataFormats

http://<end-point-address>/oai?verb=ListMetadataFormats

ListRecords

http://<end-point-address>/oai?verb=ListRecords&metadataPrefix=[Metadata_Prefix]

ListSets

http://<end-point-address>/oai?verb=ListSets

  • Returns the list of available sets.
  • Currently not supported by the SciCat implementation.
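
As a rough illustration of how the six request URLs above are assembled, the following Python sketch builds them with the standard library. The function name `build_request` and the example endpoint are assumptions made for this example; the required arguments per verb and the one-line verb descriptions follow the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# Arguments each verb requires, per the OAI-PMH 2.0 specification.
REQUIRED_ARGS = {
    "Identify": (),                             # information about the repository
    "ListMetadataFormats": (),                  # metadata formats the repository offers
    "ListSets": (),                             # set structure of the repository
    "ListIdentifiers": ("metadataPrefix",),     # record headers only
    "ListRecords": ("metadataPrefix",),         # full records
    "GetRecord": ("identifier", "metadataPrefix"),  # one individual record
}


def build_request(base_url, verb, **args):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # A resumptionToken is an exclusive argument: it replaces all others.
    if "resumptionToken" not in args:
        missing = [name for name in REQUIRED_ARGS[verb] if name not in args]
        if missing:
            raise ValueError(f"{verb} requires: {', '.join(missing)}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


if __name__ == "__main__":
    base = "http://example.org/oai"  # placeholder endpoint, not a real service
    print(build_request(base, "Identify"))
    print(build_request(base, "ListRecords", metadataPrefix="oai_dc"))
```

`urlencode` also percent-encodes characters such as `/` in identifiers, which matters when the identifier is a DOI.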

The service provider then harvests the metadata on a schedule using these URLs.
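
A single harvesting pass could look roughly like the sketch below. It is a minimal example and not the code used by B2FIND or OpenAIRE: the function names are made up, the HTTP call is injectable so the loop can be tried without a live endpoint, and scheduling (cron or similar) is left out. It follows the resumptionToken mechanism that OAI-PMH defines for paging through long lists.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Fetch one URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix, fetch=http_get):
    """Yield every record identifier, following resumption tokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find(f"{OAI_NS}error")
        if error is not None:
            # Includes 'noRecordsMatch', which a real harvester may treat as empty.
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iter(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = root.findtext(f"{OAI_NS}ListIdentifiers/{OAI_NS}resumptionToken")
        if not (token or "").strip():
            return  # no token, or an empty one: the list is complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.strip()}
```

Passing a different `fetch` callable makes it possible to replay stored responses, which is convenient when testing a new endpoint before registering it with a service provider.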

Status of ExPaNDS facilities towards a reachable OAI-PMH endpoint

Facility | Data catalogue | Status of OAI-PMH endpoint | Link
ALBA | ICAT | In progress - next weeks |
DESY | Custom | Not planned yet |
Diamond | ICAT | In progress - target date? |
Elettra | | |
HZB | ICAT | In progress - target date? |
HZDR | | |
MAX IV | SciCat | In place but no data yet |
PSI | SciCat | In place since 03.2021 | https://doi.psi.ch/oaipmh/oai?verb=Identify
SOLEIL | | |
ISIS | | |

SciCat implementation

Credits to Carlo

From the SciCat data catalogue to SciCat OAI-PMH exposure

Explaining the diagram above:

  1. Using catanie, the user decides which publication to publish.

Note: The SciCat front-end (catanie) is not required; the catamel endpoints can be called in any other way.

  2. The /PublishedData/{id}/register endpoint of catamel is called, which retrieves that publication by ID from the data catalogue MongoDB.
  3. The /Scicat/oai/Publication endpoint of the SciCat OAI-PMH is called, which copies the publication from (2) to the SciCat OAI-PMH MongoDB.

Note: The SciCat OAI-PMH MongoDB can be the same as the data catalogue MongoDB; this is an implementation choice.

  4. Depending on the need, one of the endpoints /Scicat/oai?verb=OAI-PMH_verb, /openaire/oai?verb=OAI-PMH_verb or /panosc/oai?verb=OAI-PMH_verb is invoked. Each of them retrieves the publication(s) of interest from the SciCat OAI-PMH MongoDB and formats it (them) in the corresponding XML format.

Note: The XML formatting depends on the route you set. Currently three different formats are implemented:

  • OpenAIRE compatible - /openaire/ route - (datacite XML format)
  • B2FIND compatible - /Scicat/ route - (dc XML format)
  • panosc - /panosc/ route - (no known use so far).

If a new format is required, e.g. DCAT, the SciCat OAI-PMH architecture requires defining a new XML mapper class, adding it to the existing mappers and exposing it through a new route. The XML formatting is thus quite flexible and extensible.

  5. These XML-formatted publications can then be consumed by B2FIND, OpenAIRE, etc. via their scheduled harvesting functionality.
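
The route-to-mapper idea from step 4 can be sketched conceptually as follows. This is not the SciCat OAI-PMH code (that service runs on Node.js, and its actual mapper classes are not shown on this page); it is a Python illustration of the pattern only. The publication fields `doi`, `title` and `creators`, and the names `to_dublin_core`, `MAPPERS` and `render`, are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use readable prefixes instead of auto-generated ns0/ns1 when serialising.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def to_dublin_core(publication):
    """Map a publication dict (hypothetical fields) to an oai_dc element."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = publication["doi"]
    ET.SubElement(root, f"{{{DC_NS}}}title").text = publication["title"]
    for creator in publication.get("creators", []):
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = creator
    return root


# One mapper per route; supporting a new format means adding one entry here.
MAPPERS = {"Scicat": to_dublin_core}


def render(route, publication):
    """Serialise a publication with the mapper registered for a route."""
    if route not in MAPPERS:
        raise ValueError(f"no XML mapper registered for route: {route}")
    return ET.tostring(MAPPERS[route](publication), encoding="unicode")


if __name__ == "__main__":
    example = {"doi": "10.0000/example", "title": "Test", "creators": ["Doe, Jane"]}
    print(render("Scicat", example))
```

In this sketch, adding a DCAT output would mean writing a `to_dcat` function and registering it under a new route key, leaving the existing mappers untouched.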

Step by step deployment of the SciCat OAI-PMH:

  1. Clone the SciCat OAI-PMH repository.
  2. Customise the MongoDB connection parameters here to point to the MongoDB instance to use.
  3. Add the key-value pair oaiProviderRoute: '$url:$port/Scicat/oai/Publication' to the module.exports map in the config.local.js file of your catamel service deployment, using the url and port from (2).
  4. Run npm install and start the service with node, or build the Docker image from the Dockerfile and run it.
  5. Restart the catamel service.

ICAT implementation

🚧