ichk (iRODS consistency check) performs a consistency check between data object information in the iCAT catalog and files in unixfilesystem vaults on an iRODS server. It must be installed locally on the server.
It can run in three modes:
- In resource mode, a consistency check is performed for every registered data object on a local resource.
- In vault mode, a local unixfilesystem vault is scanned. A consistency check is performed for every file in the vault. This mode will detect files that are present in the vault, but not registered in the iCAT database, whereas such files are ignored in resource mode.
- In object list mode, a list of data objects is read from a file. A consistency check is performed for all local replicas of these data objects. This mode can be used to check whether an iRODS server has valid replicas of a particular set of data objects.
Resource mode and object list mode support both unixfilesystem (UFS) and S3 resources. Vault mode currently only supports unixfilesystem resources.
ichk can write its output either in a human-readable format or as comma-separated values (CSV).
- iRODS >= v4.2.x
- Python 3.6+
This project contains a setup.py file which supports Python 3.6+ environments. Installation is easiest with pip. Just run the following commands:
    python3 -m venv default
    . default/bin/activate
    pip3 install --upgrade pip
    pip3 install git+https://github.com/UtrechtUniversity/irods-consistency-check.git@v2.2.0
When using a virtual environment, make sure that the iRODS system user has access to this environment.
Once the installation has succeeded, the ichk command is available. It reads the credentials and iRODS settings from the iRODS environment file of the current user. This environment file can be (re-)created with the iinit command. The same user also needs direct access to the files in the vault path.
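Before running ichk, it can help to confirm that the current user's environment file is readable and complete. The sketch below is not part of ichk. It assumes the conventional location `~/.irods/irods_environment.json` (or the path in `IRODS_ENVIRONMENT_FILE`), and the list of required keys is an assumption about what a typical client connection needs.

```python
import json
import os
from pathlib import Path

# Assumed minimal set of settings; adjust to match your deployment.
REQUIRED_KEYS = ("irods_host", "irods_port", "irods_user_name", "irods_zone_name")


def load_irods_environment(path=None):
    """Return the settings from an iRODS environment file as a dict."""
    if path is None:
        # IRODS_ENVIRONMENT_FILE overrides the conventional default location.
        path = os.environ.get(
            "IRODS_ENVIRONMENT_FILE",
            str(Path.home() / ".irods" / "irods_environment.json"),
        )
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_settings(env, required=REQUIRED_KEYS):
    """List the required keys that are absent from the environment dict."""
    return [key for key in required if key not in env]
```

If `missing_settings(load_irods_environment())` returns a non-empty list, running iinit again is a reasonable first step.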
The command line switches are displayed below:
    usage: ichk [-h] [-f FQDN]
                (-r RESOURCE | -v VAULT | -l DATA_OBJECT_LIST_FILE | --all-local-resources | --all-local-vaults)
                [-o OUTPUT] [-m {human,csv}] [-t TRUNCATE] [-T TIMEOUT]
                [-s ROOT_COLLECTION] [--no-verify-checksum] [-q]

    Check consistency between iRODS data objects and files in vaults.

    optional arguments:
      -h, --help            show this help message and exit
      -f FQDN, --fqdn FQDN  FQDN of resource
      -r RESOURCE, --resource RESOURCE
                            iRODS path of resource
      -v VAULT, --vault VAULT
                            Physical path of the resource vault
      -l DATA_OBJECT_LIST_FILE, --data-object-list DATA_OBJECT_LIST_FILE
                            Check replicas of a list of data objects on this
                            server.
      --all-local-resources
                            Scan all unixfilesystem resources on this server
      --all-local-vaults    Scan all vaults of unixfilesystem resources on this
                            server
      -o OUTPUT, --output OUTPUT
                            Write output to file
      -m {human,csv}, --format {human,csv}
                            Output format
      -t TRUNCATE, --truncate TRUNCATE
                            Truncate the output to the width of the console
      -T TIMEOUT, --timeout TIMEOUT
                            Sets the maximum amount of seconds to wait for server
                            responses, default 600. Increase this to account for
                            longer-running queries.
      -s ROOT_COLLECTION, --root-collection ROOT_COLLECTION
                            Only check a particular collection and its
                            subcollections.
      --no-verify-checksum  Do not verify checksums of data objects. Just check
                            presence and size of vault files.
      -q, --quasi-xml       Enable the Quasi-XML parser, which supports unusual
                            characters (0x01-0x31, backticks)
You must supply exactly one of the following: a resource, a vault path, a data object list, the --all-local-resources option, or the --all-local-vaults option.
The FQDN (fully qualified domain name) defaults to the FQDN of the current machine.
When composable resources are used, the ichk command will scan for leaf resources starting from the given resource.
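When ichk is driven from a script, the mutually exclusive target options can be enforced before the command is launched. The following is a minimal sketch that only uses switches from the usage text above. The function name `build_ichk_command`, its parameters, and the example paths are hypothetical, and the two `--all-local-*` options are left out for brevity.

```python
import shlex


def build_ichk_command(resource=None, vault=None, object_list=None,
                       output=None, fmt=None, root_collection=None,
                       verify_checksum=True):
    """Assemble an ichk argument list from the switches in the usage text."""
    targets = [("-r", resource), ("-v", vault), ("-l", object_list)]
    chosen = [(flag, value) for flag, value in targets if value is not None]
    # ichk accepts only one target option per run.
    if len(chosen) != 1:
        raise ValueError("supply exactly one of resource, vault or object_list")
    flag, value = chosen[0]
    command = ["ichk", flag, value]
    if fmt is not None:
        command += ["-m", fmt]
    if output is not None:
        command += ["-o", output]
    if root_collection is not None:
        command += ["-s", root_collection]
    if not verify_checksum:
        command.append("--no-verify-checksum")
    return command


if __name__ == "__main__":
    # Hypothetical vault path and report file name.
    cmd = build_ichk_command(vault="/var/lib/irods/Vault", fmt="csv",
                             output="vault-report.csv")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run` on a server where ichk is installed.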
The objects that are checked are categorized as follows:
- COLLECTION
- DATAOBJECT
- DIRECTORY
- FILE
These status codes can be used to represent the result of the check:
- OK
- NOT_EXISTING: This object is found in the iRODS catalog, but is missing in the vault path.
- NOT_REGISTERED: This object is found on the disk, but is missing from the iRODS catalog.
- FILE_SIZE_MISMATCH: The object has another file size than registered in the iRODS catalog.
- CHECKSUM_MISMATCH: This object does not have the same checksum as registered in the iRODS catalog.
- ACCESS_DENIED: The current user has no access to this object in the vault path.
- NO_CHECKSUM: There is no checksum registered in the iRODS catalog. This implies that file sizes do match.
- NO_LOCAL_REPLICA: No replica of the data object is present on the server (only used for object list check).
- NOT_FOUND: Object name not found in iRODS (only used for object list check).
- REPLICA_NOT_GOOD: Replica has a state other than good in the iCAT database (e.g. stale).
- UNKNOWN: Unable to verify (e.g. collections on an S3 resource).
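To illustrate how several of these codes relate to each other, here is a rough sketch of a check on a single vault file. It is not ichk's actual implementation. It assumes the catalog checksum is either an iRODS-style `sha2:` value (base64-encoded SHA-256) or a hexadecimal MD5 digest, and the function names are hypothetical.

```python
import base64
import hashlib
import os


def compute_checksum(path, expected):
    """Hash the file in the same format as the expected checksum."""
    use_sha256 = expected.startswith("sha2:")
    digest = hashlib.sha256() if use_sha256 else hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    if use_sha256:
        return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
    return digest.hexdigest()


def check_file(path, expected_size, expected_checksum=None):
    """Return a status code for one file, given the catalog's size and checksum."""
    try:
        observed_size = os.stat(path).st_size
    except FileNotFoundError:
        return "NOT_EXISTING"
    except PermissionError:
        return "ACCESS_DENIED"
    # Size is compared first, so NO_CHECKSUM implies that the sizes matched.
    if observed_size != expected_size:
        return "FILE_SIZE_MISMATCH"
    if not expected_checksum:
        return "NO_CHECKSUM"
    try:
        observed = compute_checksum(path, expected_checksum)
    except PermissionError:
        return "ACCESS_DENIED"
    return "OK" if observed == expected_checksum else "CHECKSUM_MISMATCH"
```

Comparing sizes before hashing avoids reading a file whose size already shows it to be inconsistent.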
The meaning of the fields in CSV output is:
- Object type
- Status code
- Replica status (as per the iCAT database)
- Logical path
- Vault path
- Observed checksum value (field is empty for collections / directories, as well as for files / data objects with a size mismatch)
- Expected checksum value (field is empty for collections / directories, as well as for files / data objects with a size mismatch)
- Observed file size (field is empty for collections / directories)
- Expected file size (field is empty for collections / directories)
- Resource name
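A CSV report can be post-processed with Python's standard `csv` module. The sketch below assumes the columns appear in the order listed above and that the file has no header row; both are assumptions worth checking against a real report. The labels in `FIELDS` are hypothetical names chosen for this example.

```python
import csv

# Hypothetical labels for the columns, in the order listed above.
FIELDS = [
    "object_type", "status", "replica_status", "logical_path", "vault_path",
    "observed_checksum", "expected_checksum", "observed_size", "expected_size",
    "resource_name",
]


def read_report(stream):
    """Yield one dict per CSV row, keyed by the labels in FIELDS."""
    for row in csv.reader(stream):
        # Skip rows that do not have the expected number of columns.
        if len(row) == len(FIELDS):
            yield dict(zip(FIELDS, row))


def problems(rows):
    """Keep only the rows whose status code is not OK."""
    return [row for row in rows if row["status"] != "OK"]
```

Calling `problems(read_report(handle))` on a report file opened with `newline=""` returns the rows that need attention. If the report does turn out to have a header row, it would show up in that result and should be skipped first.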