Delivering data services to EOSC
Report from our workshop on EOSC is being published here.
🚧
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability.
- OAI-PMH: a set of six verbs (services) that are invoked over HTTP.
- Data Providers: repositories that expose structured metadata via OAI-PMH, i.e. network-accessible servers that can process the six OAI-PMH requests (in our case, the facilities that have implemented OAI-PMH).
- Service Providers: services that issue OAI-PMH requests to harvest that metadata (DataCite, B2FIND, OpenAIRE…).
OAI-PMH lets you expose the metadata in your data catalogue in a standard way, and is an easy way to integrate with B2FIND and OpenAIRE. It relies on a predefined set of URLs that can be invoked.
Simplified diagram of how OAI-PMH works:
Credits to Carlo
Definitions:
- Resource: data collected during an experiment
- Item: all the metadata describing the data (an entry in a collection)
- Record: an item expressed in a standardised XML format
List of the 6 OAI-PMH verbs and their meaning:
http://<end-point-address>/oai?verb=GetRecord&metadataPrefix=[Metadata_Prefix]&identifier=[Identifier]
- Returns the metadata record that `[Identifier]` points to, with the XML returned in the format designated by `[Metadata_Prefix]`.
- For example: https://doi.psi.ch/oaipmh/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=10.16907/f1285417-f190-4563-a8ee-04ebd9246a21
http://<end-point-address>/oai?verb=Identify
- Returns basic info about the repository.
- For example: https://doi.psi.ch/oaipmh/oai?verb=Identify
http://<end-point-address>/oai?verb=ListIdentifiers&metadataPrefix=[Metadata_Prefix]
- Returns a list of all record identifiers with their date of last modification.
- For example: https://doi.psi.ch/oaipmh/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
http://<end-point-address>/oai?verb=ListMetadataFormats
- Returns a list of all available metadata formats (with the `Metadata_Prefix` used in other queries).
- For example: https://doi.psi.ch/oaipmh/oai?verb=ListMetadataFormats
http://<end-point-address>/oai?verb=ListRecords&metadataPrefix=[Metadata_Prefix]
- Returns all of the metadata records for the given prefix.
- For example: https://doi.psi.ch/oaipmh/oai?verb=ListRecords&metadataPrefix=oai_dc
http://<end-point-address>/oai?verb=ListSets
- Returns the list of available sets.
- Currently not supported by the SciCat implementation.
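To make the argument rules of the six verbs concrete, here is a minimal Node.js sketch that builds request URLs for a given endpoint. The required arguments per verb follow the OAI-PMH 2.0 specification; the helper name `buildOaiUrl` is hypothetical and not part of SciCat or any library. It also handles `resumptionToken`, the specification's flow-control argument, which is exclusive: when present it is sent alone with the verb.

```javascript
// Hypothetical helper: build an OAI-PMH request URL for one of the six verbs.
// Required arguments per verb follow the OAI-PMH 2.0 specification.
const REQUIRED_ARGS = new Map([
  ["GetRecord", ["identifier", "metadataPrefix"]],
  ["Identify", []],
  ["ListIdentifiers", ["metadataPrefix"]],
  ["ListMetadataFormats", []],
  ["ListRecords", ["metadataPrefix"]],
  ["ListSets", []],
]);

// Verbs whose lists may be split into pages continued with a resumptionToken.
const RESUMABLE_VERBS = new Set(["ListIdentifiers", "ListRecords", "ListSets"]);

function buildOaiUrl(endpoint, verb, args = {}) {
  if (!REQUIRED_ARGS.has(verb)) {
    throw new Error(`Unknown OAI-PMH verb: ${verb}`);
  }
  const url = new URL(endpoint);
  url.searchParams.set("verb", verb);

  if (args.resumptionToken !== undefined) {
    // resumptionToken is exclusive: all other arguments are dropped.
    if (!RESUMABLE_VERBS.has(verb)) {
      throw new Error(`${verb} does not accept a resumptionToken`);
    }
    url.searchParams.set("resumptionToken", args.resumptionToken);
    return url.toString();
  }

  for (const name of REQUIRED_ARGS.get(verb)) {
    if (args[name] === undefined) {
      throw new Error(`${verb} requires the "${name}" argument`);
    }
  }
  // Optional arguments (e.g. from/until) are passed through unchanged.
  for (const [name, value] of Object.entries(args)) {
    if (value !== undefined) url.searchParams.set(name, value);
  }
  return url.toString();
}
```

For example, `buildOaiUrl("https://doi.psi.ch/oaipmh/oai", "ListRecords", { metadataPrefix: "oai_dc" })` returns `https://doi.psi.ch/oaipmh/oai?verb=ListRecords&metadataPrefix=oai_dc`, the ListRecords example above. Argument values are percent-encoded, so the `/` in a DOI identifier is sent as `%2F`.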
Service providers then harvest the metadata on a schedule, using these URLs.
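A scheduled harvest on the service-provider side could look roughly like the following minimal sketch. It assumes Node.js 18+ (global `fetch`) and uses a regular expression to read the `resumptionToken`, a simplification where a production harvester would use an XML parser. The function names `harvest` and `fetchPage` are hypothetical. The optional `from` argument is OAI-PMH's date-based selective harvesting, which lets each scheduled run fetch only records changed since the previous one.

```javascript
// Sketch of one scheduled harvest run (hypothetical helper names).
// Assumes Node.js 18+ for the global fetch function.

// Decode the five predefined XML entities; &amp; must be decoded last.
function unescapeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Returns the resumptionToken of a ListRecords page, or null when the
// element is absent or empty, which signals that the list is complete.
// Simplification: a regex instead of a real XML parser.
function extractResumptionToken(xml) {
  const match = xml.match(/<resumptionToken\b[^>]*>([^<]*)<\/resumptionToken>/);
  if (!match) return null;
  const token = unescapeXml(match[1].trim());
  return token === "" ? null : token;
}

async function fetchText(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// Collects every ListRecords page for one metadata format. An OAI-PMH
// <error> page (e.g. noRecordsMatch) has no token, so it ends the run.
async function harvest(endpoint, metadataPrefix, { from, fetchPage = fetchText } = {}) {
  let url = `${endpoint}?verb=ListRecords&metadataPrefix=${encodeURIComponent(metadataPrefix)}`;
  if (from) url += `&from=${encodeURIComponent(from)}`;
  const pages = [];
  while (url !== null) {
    const xml = await fetchPage(url);
    pages.push(xml);
    const token = extractResumptionToken(xml);
    // Follow-up requests carry only the verb and the exclusive resumptionToken.
    url = token === null
      ? null
      : `${endpoint}?verb=ListRecords&resumptionToken=${encodeURIComponent(token)}`;
  }
  return pages;
}
```

Injecting `fetchPage` keeps the paging logic testable without network access; in a real deployment, `harvest("https://doi.psi.ch/oaipmh/oai", "oai_dc", { from: "2021-03-01" })` would run it against the PSI endpoint.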
| Facility | Data catalogue | Status of OAI-PMH endpoint | Link |
|---|---|---|---|
| ALBA | ICAT | In progress - next weeks | |
| DESY | Custom | Not planned yet | |
| Diamond | ICAT | In progress - target date? | |
| Elettra | | | |
| HZB | ICAT | In progress - target date? | |
| HZDR | | | |
| MAX IV | SciCat | In place but no data yet | |
| PSI | SciCat | In place since 03.2021 | https://doi.psi.ch/oaipmh/oai?verb=Identify |
| SOLEIL | | | |
| ISIS | | | |
Credits to Carlo
Explaining the diagram above:
1. Using catanie (the SciCat front-end), the user selects which publication to publish.
   Note: the SciCat front-end (catanie) is not required; the catamel endpoints can be called in any other way.
2. The `/PublishedData/{id}/register` endpoint of catamel is called, which retrieves the publication with that ID from the data catalogue MongoDB.
3. The `/Scicat/oai/Publication` endpoint of the SciCat OAI-PMH service is called, which copies the publication from (2) to the SciCat OAI-PMH MongoDB.
   Note: the SciCat OAI-PMH MongoDB can be the same as the data catalogue MongoDB; this is an implementation choice.
4. Depending on the need, one of the endpoints `/Scicat/oai?verb=OAI-PMH_verb`, `/openaire/oai?verb=OAI-PMH_verb` or `/panosc/oai?verb=OAI-PMH_verb` is invoked. Each of them retrieves the publication(s) of interest from the SciCat OAI-PMH MongoDB and formats them in the corresponding XML format.
   Note: the XML formatting depends on the route you use. Currently three formats are implemented:
   - OpenAIRE compatible: `/openaire/` route (DataCite XML format)
   - B2FIND compatible: `/Scicat/` route (Dublin Core (dc) XML format)
   - PaNOSC: `/panosc/` route (no known use so far)

   If a new format is required, e.g. DCAT, the SciCat OAI-PMH architecture requires defining a new XML mapper class, adding it to the existing mappers, and exposing it through a new route. The XML formatting is therefore flexible and extensible.
5. The XML-formatted publications can then be consumed by B2FIND, OpenAIRE, etc. via their scheduled harvesting.
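The route-to-mapper idea in step 4 can be illustrated with a small JavaScript sketch. It is hypothetical and not taken from the SciCat OAI-PMH code base: it only shows how binding each route name to a mapper object makes a new format (such as DCAT) a matter of registering one more mapper. The publication fields (`title`, `creators`, `doi`) and the names `registerMapper` and `renderForRoute` are assumptions.

```javascript
// Hypothetical sketch, not the SciCat OAI-PMH implementation: each route
// name is bound to a mapper that turns a stored publication into XML.

// Escape a value for safe inclusion as XML text content.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Minimal Dublin Core mapper for the B2FIND-compatible /Scicat/ route.
// The publication fields (title, creators, doi) are assumed names.
const dublinCoreMapper = {
  metadataPrefix: "oai_dc",
  toXml(publication) {
    const creators = publication.creators
      .map((name) => `<dc:creator>${escapeXml(name)}</dc:creator>`)
      .join("");
    return (
      '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" ' +
      'xmlns:dc="http://purl.org/dc/elements/1.1/">' +
      `<dc:title>${escapeXml(publication.title)}</dc:title>` +
      creators +
      `<dc:identifier>${escapeXml(publication.doi)}</dc:identifier>` +
      "</oai_dc:dc>"
    );
  },
};

// Route name -> mapper; /openaire/ and /panosc/ would be registered the same way.
const mappersByRoute = new Map([["Scicat", dublinCoreMapper]]);

// Exposing a new format means registering its mapper under a new route.
function registerMapper(route, mapper) {
  if (mappersByRoute.has(route)) {
    throw new Error(`Route /${route}/ is already registered`);
  }
  mappersByRoute.set(route, mapper);
}

// Formats a publication for a request arriving on /<route>/oai?verb=...
function renderForRoute(route, publication) {
  const mapper = mappersByRoute.get(route);
  if (!mapper) throw new Error(`No mapper for route /${route}/`);
  return mapper.toXml(publication);
}
```

With this shape, a DCAT output would be added with a call such as `registerMapper("dcat", dcatMapper)`, where `dcatMapper` is a hypothetical object with the same `toXml` method; the existing routes stay untouched.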
1. Clone the SciCat OAI-PMH repository.
2. Customise the mongo connection parameters here to point to the MongoDB instance to use.
3. Add the key-value pair `oaiProviderRoute: '$url:$port/Scicat/oai/Publication'` to the `module.exports` map in the `config.local.js` file of your catamel service deployment, using the URL and port from (2).
4. Run `npm install` and start the service with `node`, or build the Dockerfile image and run it.
5. Restart the catamel service.
🚧