%ep3util(1) irdmtools user manual | version 0.0.89 5850ad61
% R. S. Doiel and Tom Morrell
% 2024-10-03

NAME

ep3util

SYNOPSIS

ep3util [OPTIONS] ACTION [ACTION_PARAMETERS ...]

DESCRIPTION

ep3util provides a quick wrapper around the EPrints 3.3 REST API. By default ep3util looks for the following environment variables.

REPO_ID : the EPrints repository id (name of database and archive subdirectory).

EPRINT_HOST : the hostname for EPrints.

EPRINT_USER : the username having permissions to access the EPrints REST API.

EPRINT_PASSWORD : the password for the username with access to the EPrints REST API.

C_NAME : when harvesting, the name of the dataset collection to harvest the records into.

EPRINT_DB_HOST : the MySQL hostname holding the EPrints repository database.

EPRINT_DB_USER : the MySQL username used to access the EPrints repository database.

EPRINT_DB_PASSWORD : the MySQL password used to access the EPrints repository database.
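As a minimal sketch, the REST-related variables above could be set in a POSIX shell as follows. Every value is a placeholder invented for illustration and is not taken from a real repository. The EPRINT_DB_* variables are deliberately left unset here.

```shell
# Placeholder values (assumptions); replace them with your repository's details.
export REPO_ID="authors"                 # EPrints repository id
export EPRINT_HOST="eprints.example.edu" # hostname of the EPrints server
export EPRINT_USER="harvester"           # account allowed to use the REST API
export EPRINT_PASSWORD="change-me"       # password for that account
export C_NAME="authors.ds"               # dataset collection that receives records

# Make sure no MySQL settings are inherited from an earlier session.
unset EPRINT_DB_HOST EPRINT_DB_USER EPRINT_DB_PASSWORD
```

Exporting the variables makes them visible to any ep3util command started from the same shell session.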

The environment provides the default values for configuration. They may be overridden by using a JSON configuration file. The corresponding attributes are "repo_id", "eprint_host", "c_name", "eprint_db_host", "eprint_db_user", and "eprint_db_password".
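A configuration file using the attributes listed above might look like the following sketch. The file name example-config.json and all values are hypothetical; the output of the setup action remains the authoritative template, and the real file may contain attributes not shown here.

```shell
# Write a hypothetical configuration file; all values are placeholders.
cat > example-config.json <<'EOF'
{
  "repo_id": "authors",
  "eprint_host": "eprints.example.edu",
  "c_name": "authors.ds",
  "eprint_db_host": "localhost",
  "eprint_db_user": "eprints",
  "eprint_db_password": "change-me"
}
EOF
```

Because the file holds a database password, it is worth restricting its permissions (for example with `chmod 600 example-config.json`) and keeping it out of version control.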

If the environment variables for MySQL access are set, then the results reflect direct access to the database instead of the EPrints REST API.

OPTIONS

help : display help

license : display license

version : display version

config : provide a path to an alternate configuration file (e.g. "irdmtools.json")

ACTION

ep3util supports the following actions.

setup : Display an example JSON setup configuration file. If a configuration file already exists, the current configuration is displayed instead. Takes no optional or required parameters. When the JSON configuration is displayed, a placeholder is used for the token value.

get_all_ids : Returns a list of all repository record ids. The method uses OAI-PMH for id retrieval. It is rate limited and will take some time to return all record ids; a test instance took 11 minutes to retrieve 24000 record ids.

get_modified_ids START [END] : Returns a list of records created or modified between the START and END dates. If END is not provided, it is assumed to be today.

get_record RECORD_ID : Returns a specific simplified record indicated by RECORD_ID, e.g. 23808. The RECORD_ID is a required parameter.

harvest [HARVEST_OPTIONS] [KEY_LIST_JSON] : harvest takes a JSON file containing a list of keys and harvests each record into a dataset collection. If combined with one of the options, e.g. -all, you can skip providing the KEY_LIST_JSON file.

HARVEST_OPTIONS

-all : Harvest all records

-modified START [END] : Harvest records modified between start and end dates.

-as-citations : This harvests the record into a minimal citation form similar to citeproc
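The KEY_LIST_JSON file mentioned above can be sketched as follows. The exact layout is an assumption: this example supposes a plain JSON array of record ids, and the file name key_list.json and the ids themselves are hypothetical.

```shell
# Hypothetical key list; whether ids must be numbers or strings is an assumption.
cat > key_list.json <<'EOF'
[23808, 23809, 23810]
EOF

# Show what was written so it can be checked before harvesting.
cat key_list.json
```

Under that assumption, the file would then be passed as the last argument, as in `ep3util harvest key_list.json`.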

ACTION_PARAMETERS

Action parameters are the specific optional or required parameters needed to complete an action.

EXAMPLES

Set up ep3util by writing an example JSON configuration file. "nano" is an example text editor program; edit the sample configuration so that it matches your repository.

ep3util setup >eprinttools.json
nano eprinttools.json

Get a list of all EPrint record ids.

ep3util get_all_ids

Get a specific EPrint record. The record is validated against the irdmtools EPrints data model.

ep3util get_record 23808

Harvest all records.

ep3util harvest -all

Harvest records created or modified in the month of September, 2023.

ep3util harvest -modified 2023-09-01 2023-09-30
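
When the month changes from run to run, the start and end dates can be computed rather than typed by hand. The following is a sketch in POSIX shell; the variable names are illustrative only, and the final line prints the command instead of running it.

```shell
year=2023   # example year
month=9     # example month, 1-12, written without a leading zero

# Work out the last day of the month, including the leap year rule for February.
case "$month" in
  4|6|9|11) last_day=30 ;;
  2)
    if [ $((year % 4)) -eq 0 ] && [ $((year % 100)) -ne 0 ] || [ $((year % 400)) -eq 0 ]; then
      last_day=29
    else
      last_day=28
    fi
    ;;
  *) last_day=31 ;;
esac

# Format both dates as YYYY-MM-DD, the form used in the example above.
start=$(printf '%04d-%02d-01' "$year" "$month")
end=$(printf '%04d-%02d-%02d' "$year" "$month" "$last_day")

# Print the command for review rather than executing it.
echo "ep3util harvest -modified $start $end"
```

With the values shown, this prints `ep3util harvest -modified 2023-09-01 2023-09-30`.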