The Datahub is a metadata aggregator. This application allows data providers to aggregate and publish metadata describing objects on the web through a RESTful API leveraging standardized exchange formats.
The Datahub is built with the Symfony framework and MongoDB.
- A RESTful API which supports:
  - Ingest and retrieval of individual metadata records.
  - Validation of ingested records against XSD schemas.
  - OAuth to restrict access to the API.
- An OAI-PMH endpoint for harvesting metadata records.
- Includes support for LIDO XML but can be extended to include MARC XML, Dublin Core or other formats.
This project requires the following dependencies:
- PHP = 5.6.* or 7.0.*
  - With the php-cli, php-intl, php-mbstring and php-mcrypt extensions.
  - The PECL Mongo (PHP5) or PECL Mongodb (PHP7) extension. Note that the mongodb extension must be version 1.2.0 or higher. Notably, the package included in Ubuntu 16.04 (php-mongodb) is only at 1.1.5.
- MongoDB >= 3.2.10
Via Git:
$ git clone https://github.com/thedatahub/Datahub.git datahub
$ cd datahub
$ composer install # Composer will ask you to fill in any missing parameters before it continues
You will be asked to configure the connection to your MongoDB database. You will need to provide these details:
- The connection to your MongoDB instance (e.g. mongodb://127.0.0.1:27017)
- The username of the user (e.g. datahub)
- The password of the user
- The database where your data will persist (e.g. datahub)
Before you install, ensure that you have a running MongoDB instance and that you have created a user with the right permissions. From the [Mongo shell](https://docs.mongodb.com/getting-started/shell/client/), run these commands to create the required artefacts in MongoDB:
> use datahub
> db.createUser(
    {
      user: "datahub",
      pwd: "password",
      roles: [ "readWrite", "dbAdmin" ]
    }
  )
The configuration parameters will be stored in app/config/parameters.yml.
You'll need to run an initial one-time setup script, which will scaffold the database structure, generate CSS assets and create the application 'admin' user.
$ app/console app:setup
$ app/console doctrine:mongodb:fixtures:load --append
If you want to run the datahub for testing or development purposes, execute this command:
$ app/console server:run
Open a browser and navigate to http://127.0.0.1:8000. You should now see the welcome screen.
To run the Datahub in a production environment, refer to the Symfony setup documentation and complete your installation using a fully featured web server.
The application is installed with the default username admin and the default password datahub. Changing these is highly recommended.
The REST API is available at api/v1/data. Documentation about the available API methods can be found at /docs/api.
The PUT and POST actions expect an XML-formatted body in the HTTP request. The Content-Type HTTP request header also needs to be set accordingly. Currently supported: application/lido+xml. Finally, you will need to add a valid OAuth token via the access_token query parameter.
A valid POST HTTP request looks like this:
POST /api/v1/data?access_token=MThmYWMxMjFlZWZmYjVmZDU2NDNmZWIzYTE0YmNiYTk3YTc5ODJmMWJjOGI1MjE5MWY4ZjEyZWZlZmM2ZmZmNg HTTP/1.1
Host: example.org
Content-Type: application/lido+xml
Cache-Control: no-cache

<?xml version="1.0" encoding="UTF-8"?>
<lido:lido xmlns:lido="http://www.lido-schema.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.lido-schema.org http://www.lido-schema.org/schema/v1.0/lido-v1.0.xsd">
<lido:lidoRecID lido:source="Deutsches Dokumentationszentrum für Kunstgeschichte - Bildarchiv Foto Marburg" lido:type="local">DE-Mb112/lido-obj00154983</lido:lidoRecID>
<lido:category>
...
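As a rough client-side illustration, a request like the one above could be assembled in Python using only the standard library. This is a sketch, not part of the Datahub itself: the function name build_ingest_request, the localhost base URL and the placeholder token are assumptions, while the path, the Content-Type value and the access_token query parameter come from the description above. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ingest_request(base_url: str, access_token: str, lido_xml: bytes) -> Request:
    """Build (but do not send) a POST request that ingests one LIDO record."""
    # The OAuth token travels in the access_token query parameter.
    query = urlencode({"access_token": access_token})
    url = f"{base_url.rstrip('/')}/api/v1/data?{query}"
    return Request(
        url,
        data=lido_xml,
        # Currently the only supported content type for ingest.
        headers={"Content-Type": "application/lido+xml"},
        method="POST",
    )


request = build_ingest_request(
    "http://localhost:8000",  # assumed development host
    "example-token",          # placeholder, not a real token
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<lido:lido xmlns:lido="http://www.lido-schema.org"/>',
)
```

Passing the resulting object to urllib.request.urlopen would send it; a real record needs a complete LIDO document rather than the empty placeholder element used here.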
Sending a GET HTTP request to the api/v1/data endpoint returns a paginated list of all the records available in the API. The endpoint returns an HTTP response with a JSON-formatted body and respects the HATEOAS constraint.
Content negotiation is currently only supported via a file extension on individual resource URLs. Negotiation via the HTTP Accept header is on the roadmap.
GET api/v1/data # only JSON supported
GET api/v1/data/objectPID # return JSON
GET api/v1/data/objectPID.xml # return XML
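The extension-based negotiation can be captured in a small helper. The sketch below is illustrative only: record_url is a hypothetical name, the base URL is an assumed development host, and percent-encoding the identifier is an assumption about how identifiers containing reserved characters should be passed.

```python
from urllib.parse import quote


def record_url(base_url: str, pid: str, fmt: str = "json") -> str:
    """Return the URL of a single record; XML is selected with a .xml suffix."""
    if fmt not in ("json", "xml"):
        raise ValueError(f"unsupported format: {fmt}")
    # Assumption: reserved characters in the identifier are percent-encoded.
    path = f"{base_url.rstrip('/')}/api/v1/data/{quote(pid, safe='')}"
    # JSON is the default representation; only XML needs an extension.
    return path + ".xml" if fmt == "xml" else path


json_url = record_url("http://localhost:8000", "objectPID")
xml_url = record_url("http://localhost:8000", "objectPID", "xml")
```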
The datahub supports the OAI-PMH protocol.
The endpoint is available via the /oai path.
GET oai/?metadataPrefix=oai_lido&verb=ListIdentifiers
GET oai/?metadataPrefix=oai_lido&verb=ListSets
GET oai/?metadataPrefix=oai_lido&verb=ListRecords
GET oai/?metadataPrefix=oai_lido&verb=ListRecords&set=creator:james_ensor
GET oai/?metadataPrefix=oai_lido&verb=GetRecord&identifier=objectPID
GET oai/?metadataPrefix=oai_lido&verb=ListIdentifiers&from=2017-06-29T05:22:30Z&until=2017-07-14T04:22:30Z
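Harvesting URLs like these can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: oai_url is a hypothetical helper and the base URL is an assumed development host; the verbs and argument names are standard OAI-PMH. Arguments are passed as a dict because from is a reserved word in Python.

```python
from urllib.parse import urlencode


def oai_url(base_url: str, verb: str, arguments: dict) -> str:
    """Build an OAI-PMH request URL for the /oai endpoint."""
    if "verb" in arguments:
        raise ValueError("pass the verb as its own parameter")
    # A dict holds each key once, so no argument can be repeated.
    params = {"verb": verb, **arguments}
    return f"{base_url.rstrip('/')}/oai/?{urlencode(params)}"


set_url = oai_url(
    "http://localhost:8000",
    "ListRecords",
    {"metadataPrefix": "oai_lido", "set": "creator:james_ensor"},
)
window_url = oai_url(
    "http://localhost:8000",
    "ListIdentifiers",
    {
        "metadataPrefix": "oai_lido",
        "from": "2017-06-29T05:22:30Z",
        "until": "2017-07-14T04:22:30Z",
    },
)
```

Keeping each argument unique matters because the OAI-PMH specification treats a repeated argument as a badArgument error. Note that urlencode percent-encodes the colons in set names and timestamps.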
The datahub implements grouping of records into sets, but no soft deletes. As such, the OAI endpoint doesn't indicate whether a record has been deleted.
The datahub API can be set up to be either a public or a private API. The public_api_method_access parameter in parameters.yml allows you to configure which parts of the API are public or private:
# Setting this to some unknown value like [FOO] disables public api access
# Leaving this option empty [] means allowing all methods for anonymous access
# public_api_method_access: [FOO]
public_api_method_access: [GET]
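The rule stated in those comments can be expressed as a short predicate. This sketch only mirrors the documented behaviour of the parameter; it is not the Datahub's actual access-control code, and the name is_publicly_allowed is hypothetical.

```python
from typing import Sequence


def is_publicly_allowed(method: str, public_methods: Sequence[str]) -> bool:
    """Decide whether an anonymous request with this HTTP method is allowed."""
    # An empty list opens every method to anonymous access.
    if not public_methods:
        return True
    # Otherwise only listed methods are public; a value such as FOO
    # matches no real HTTP method and therefore disables public access.
    return method.upper() in public_methods


read_only = ["GET"]
```

With read_only, anonymous clients can retrieve records while ingest through POST or PUT still requires a token.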
The datahub requires OAuth authentication to ingest or retrieve metadata records. The administrator has to issue a user account with a client_id and a client_secret to individual users or client applications. Before clients can access the API, they have to request an access token:
curl 'http://localhost:8000/oauth/v2/token?grant_type=password&username=admin&password=datahub&client_id=slightlylesssecretpublicid&client_secret=supersecretsecretphrase'
Example output:
{
"access_token": "ZDIyMGFiZGZkZWUzY2FjMmY4YzNmYjU0ODZmYmQ2ZGM0NjZiZjBhM2Q0Y2ZjMGNiMjc0ZWIyMmYyODMzMGJjZg",
"expires_in": 3600,
"token_type": "bearer",
"scope": "internal web external",
"refresh_token": "MzhkYzY0MzMxM2FmNmQyODhiOWM4YzEzZjI3YzViZjg3ZThlMTA2YWY4ZTc2YjUwYzgxNzVhNTlmYTBkYWZhNQ"
}
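A client typically builds the token URL from its credentials and then extracts the fields it needs from the JSON response. The sketch below assumes the same query-string style as the curl example; token_url and parse_token_response are hypothetical names, and the sample values are placeholders rather than real credentials or tokens.

```python
import json
from urllib.parse import urlencode


def token_url(base_url: str, username: str, password: str,
              client_id: str, client_secret: str) -> str:
    """Build the password-grant token request URL."""
    params = {
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    return f"{base_url.rstrip('/')}/oauth/v2/token?{urlencode(params)}"


def parse_token_response(body: str) -> dict:
    """Keep only the fields a client needs from the token response."""
    data = json.loads(body)
    return {
        "access_token": data["access_token"],
        "expires_in": int(data["expires_in"]),
        "refresh_token": data.get("refresh_token"),
    }


sample_body = (
    '{"access_token": "abc", "expires_in": 3600, "token_type": "bearer", '
    '"scope": "internal web external", "refresh_token": "def"}'
)
token = parse_token_response(sample_body)
```

The expires_in value is given in seconds, so a client can schedule a refresh before the access token lapses.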
The OAuth endpoint can also be used to revoke both access and refresh tokens, via the /oauth/v2/revoke path:
curl 'http://localhost:8000/oauth/v2/revoke?token=ZDIyMGFiZGZkZWUzY2FjMmY4YzNmYjU0ODZmYmQ2ZGM0NjZiZjBhM2Q0Y2ZjMGNiMjc0ZWIyMmYyODMzMGJjZg'
Example output:
{
"result": "success",
"message": "The token has been revoked."
}
Please see CHANGELOG for more information on what has changed recently.
Testing requires a MongoDB instance and an installation of Catmandu. You can either take care of this yourself, or run the tests using the provided Docker container.
Please ensure you've taken care of the initial setup described above before attempting to run the tests.
Running tests:
./scripts/run_tests
Running tests using Docker:
./scripts/run_tests_docker
Front end workflows are managed via yarn and webpack-encore.
The layout is based on Bootstrap 3.3 and managed via sass. The code can be found under app/resources/public/sass.
JavaScript files can be found under app/resources/public/js. Dependencies are managed via yarn. Add vendor modules using require.
Files are built and stored in web/build and included in app/views/app/base.html.twig via the asset() function.
The workflow configuration can be found in webpack.config.js.
Get started:
# Install all dependencies
$ yarn install
# Build everything in development
$ yarn run encore dev
# Watch files and build automatically
$ yarn run encore dev --watch
# Build for production
$ yarn run encore production
Please see CONTRIBUTING for details.
The Datahub is copyright (c) 2016 by Vlaamse Kunstcollectie vzw and PACKED vzw.
This is free software; you can redistribute it and/or modify it under the terms of the GPLv3 License (GPL). Please see License File for more information.