
Linked Data Reconciliation Services Breakdown

Ryan Johnson edited this page Oct 24, 2017 · 1 revision

Once the environment is set up, we can clone a reconciliation service, run it locally whenever we need it, and tell OpenRefine the local address where the service is listening.

FAST reconciliation

I have a forked version of the FAST reconciliation script called fast-reconcile.

  • Clone the repo.
  • In your shell, with the refine3 environment activated, cd into the cloned folder and type:
$ python reconcile.py
  • The shell should report that the service is running and on which port, with a line like Running on http://0.0.0.0:5000/.
  • Open OpenRefine, select the column you would like to reconcile, click the arrow at the top of that column, and choose Reconcile > Start Reconciling...
  • Click on the Add Standard Service button in the bottom left corner.
  • Enter the URL the local service is running on, which should be http://localhost:5000/reconcile
  • You should now be greeted by a list of possible reconciliation types for the service. Choose your desired options and then click Start Reconciling.
  • Whenever you are finished and wish to close the service down, hit Ctrl + C to stop it.
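OpenRefine talks to the service over plain HTTP, so you can also query it directly, which is handy for spot-checking matches or scripting. Below is a minimal Python sketch, assuming fast-reconcile follows the standard OpenRefine Reconciliation API (a form-encoded `queries` parameter holding a JSON batch of queries) and is listening at http://localhost:5000/reconcile. The helpers `build_queries_body` and `reconcile` are hypothetical, not part of the repo.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local FAST service speaks the OpenRefine Reconciliation API here.
SERVICE_URL = "http://localhost:5000/reconcile"


def build_queries_body(terms, type_id=None, limit=3):
    """Encode terms as the form body OpenRefine sends (hypothetical helper)."""
    queries = {}
    for i, term in enumerate(terms):
        query = {"query": term, "limit": limit}
        if type_id is not None:
            # Restrict candidates to one of the types the service advertises.
            query["type"] = type_id
        queries[f"q{i}"] = query
    return urllib.parse.urlencode({"queries": json.dumps(queries)}).encode("utf-8")


def reconcile(terms, type_id=None, url=SERVICE_URL, timeout=10):
    """POST the batch and return {query_id: [candidate, ...]} (hypothetical helper)."""
    body = build_queries_body(terms, type_id)
    # Passing data= makes urllib send a form-encoded POST.
    with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
        results = json.load(resp)
    # Candidates are dicts with keys such as "id", "name", "score" and "match".
    return {qid: res.get("result", []) for qid, res in results.items()}


if __name__ == "__main__":
    # Show the decoded request body without contacting the service.
    print(urllib.parse.unquote_plus(build_queries_body(["Cats", "Dogs"]).decode("utf-8")))
```

With the service running, a call like `reconcile(["Cats"])` should return candidate matches keyed by `q0`; to narrow results, pass as `type_id` one of the type ids the service offers in OpenRefine's dialog.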

Note that these steps describe the first-time setup. OpenRefine will remember the service, but each time you want to use it you must first activate the conda environment and run the script.
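Because the service has to be started by hand each session, it can help to confirm it is actually listening before you open the reconciliation dialog. A rough Python probe, assuming that a bare GET on the endpoint returns the service's JSON metadata with a `name` field, as the Reconciliation API describes; `service_is_up` is a hypothetical helper, not part of any of these repos.

```python
import json
import urllib.request


def service_is_up(url="http://localhost:5000/reconcile", timeout=2):
    """Return True if a reconciliation service answers at url with JSON metadata."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            manifest = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers replies that are not valid JSON.
        return False
    # Assumption: the service metadata is a JSON object with a "name" field.
    return isinstance(manifest, dict) and "name" in manifest


if __name__ == "__main__":
    if service_is_up():
        print("Service is running; OpenRefine can use it.")
    else:
        print("No service found. Activate the conda environment and run reconcile.py first.")
```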

GeoNames

We will use Christina Harlow's service, geonames-reconcile.

The instructions are the same as above for FAST, except for one crucial detail: the service relies on a GeoNames API username. So first:

  • Go to the GeoNames login page and register. After your account is activated, enable it for the free web services.
  • Once you have your GeoNames username, create an environment variable on your computer that holds it:
    • Open your shell
    • Type in $ export GEONAMES_USERNAME="username" (replacing username with your own username), in the same shell session you will use to run the service
  • Proceed as in the FAST reconciliation steps above
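Since the username lives in an environment variable, any script started from that shell can read it. The sketch below shows that pattern and builds a GeoNames search URL with it, which you can paste into a browser to confirm your account has web services enabled. It is an illustration only: I haven't checked how geonames-reconcile itself reads the variable, and `geonames_username` and `geonames_search_url` are hypothetical helpers. The `searchJSON` endpoint with its `q`, `maxRows` and `username` parameters is GeoNames' own search web service.

```python
import os
import urllib.parse

# GeoNames JSON search service; every request must carry a username.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_username():
    """Read the username from GEONAMES_USERNAME (hypothetical helper)."""
    username = os.environ.get("GEONAMES_USERNAME")
    if not username:
        raise RuntimeError(
            "GEONAMES_USERNAME is not set; export it in the shell "
            "you start the service from."
        )
    return username


def geonames_search_url(place, max_rows=5):
    """Build a search URL for place using the configured username (hypothetical helper)."""
    params = {"q": place, "maxRows": max_rows, "username": geonames_username()}
    return f"{GEONAMES_SEARCH}?{urllib.parse.urlencode(params)}"


if __name__ == "__main__":
    try:
        print(geonames_search_url("Nashville"))
    except RuntimeError as err:
        print(err)
```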

Library of Congress (id.loc.gov)

We will use Christina Harlow's service lc-reconcile.

The instructions are the same as above for FAST, except that the local URL to enter after clicking Add Standard Service in OpenRefine is http://localhost:5000/.

Optionally, you can skip this local version and use the hosted version instead by entering the URL http://lc-reconcile.cmh2166.webfactional.com/.

VIAF (Note: not provided for in this repo)

There used to be a Python-based VIAF reconciliation service, but it has since moved into a much larger Java-based framework called conciliator.

Conciliator now provides reconciliation for many more sources than VIAF, including ORCID and any Solr data source.

Since Java setup is beyond the scope of this repo, read the conciliator manual if you want to install it locally; otherwise, use the hosted version of conciliator.

Wikidata (Note: not provided for in this repo)

Wikidata reconciliation is quickly becoming a highly desired feature, and there is a Wikidata Hosted Reconciliation Service. To use it, choose to reconcile a column in OpenRefine, then add the API endpoint as a "Standard Service": https://tools.wmflabs.org/openrefine-wikidata/en/api.

Update: OpenRefine 2.7 and later can reconcile against Wikidata right out of the box. Neat!