omtd-rspub-elastic

NOTE: this project has been integrated into the py-resourcesync library.

An extension for the rspub-core library supporting Elasticsearch storage.

Proposal and documentation available here.

Overview

This software is based on the rspub-core library, which generates ResourceSync documents from resources stored on the file system. This approach becomes challenging with a very large number of resources, since the file system must be scanned repeatedly to detect changes and regenerate sitemaps over time.

Therefore, we extended the rspub-core library to support data storage in Elasticsearch. The proposed approach is described in detail in the documentation. The protocol document describes the mappings used to store resources and changes in an Elasticsearch index. The description document provides an overview of the general approach and the project goals.

Usage

The ElasticGenerator takes a configuration dictionary defined in the ElasticRsParameters class. This class extends the set of parameters required by the rspub-core RsParameters class with the parameters needed to configure and query an Elasticsearch instance for the ResourceSync framework. Here is an example configuration file:

resource_set: capabilityname
resource_dir: tmp/dit 
metadata_dir: resourcesync/capabilityname
res_root_dir: tmp/dit
url_prefix: http://example.com/
max_items_in_list: 50000
zero_fill_filename: 4
is_saving_pretty_xml: True
is_saving_sitemaps: True
has_wellknown_at_root: True
description_dir: tmp/dit/resourcesync
elastic_host: localhost
elastic_port: 9200
elastic_index: test-resourcesync
elastic_resource_type: resource
elastic_change_type: change

TODO: provide an explanation for each parameter
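
As a rough illustration of how the configuration above divides into rspub-core parameters and Elasticsearch parameters, here is a minimal sketch using only the Python standard library. The helper names (parse_flat_config, split_parameters, index_url) are hypothetical and are not part of rspub-core or this project. The flat "key: value" parser is only a stand-in for however the library actually loads its configuration, and the use of plain http in the index URL is an assumption.

```python
# Hypothetical helpers for illustration only; not the project's API.
SAMPLE_CONFIG = """
resource_set: capabilityname
url_prefix: http://example.com/
max_items_in_list: 50000
elastic_host: localhost
elastic_port: 9200
elastic_index: test-resourcesync
elastic_resource_type: resource
elastic_change_type: change
"""


def parse_flat_config(text):
    """Parse flat 'key: value' lines into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URL values stay intact.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def split_parameters(config):
    """Separate the elastic_* keys from the rspub-core style keys."""
    elastic = {k: v for k, v in config.items() if k.startswith("elastic_")}
    core = {k: v for k, v in config.items() if not k.startswith("elastic_")}
    return core, elastic


def index_url(elastic):
    """Build the base URL of the configured index (assumes plain http)."""
    return "http://{}:{}/{}".format(
        elastic["elastic_host"],
        int(elastic["elastic_port"]),
        elastic["elastic_index"],
    )


core, elastic = split_parameters(parse_flat_config(SAMPLE_CONFIG))
print(sorted(elastic))
print(index_url(elastic))
```

In this sketch the five elastic_* keys identify the Elasticsearch instance, the index, and the two document types, while the remaining keys keep the role they have in rspub-core.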

Three executors are provided:

  • generate_resourcelist: generates a resourcelist based on the documents stored at the specified elastic_resource_type
  • generate_new_changelist: generates a new changelist based on the documents stored at the specified elastic_change_type
  • generate_inc_changelist: updates a previously generated changelist

Each executor will generate ResourceSync-compliant documents for the capability list specified in the configuration.
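
To give an idea of what a generated resource list looks like, the following sketch serializes a few documents into a ResourceSync resource list using only the Python standard library. It does not use this project's code: build_resourcelist and the document fields (relative_path, lastmod, length, mime) are assumptions made for illustration, and the documents are hard-coded rather than fetched from Elasticsearch. The actual field mappings are the ones described in the protocol document.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_resourcelist(documents, url_prefix, at):
    """Serialize documents as a ResourceSync resource list (illustrative)."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    # The rs:md element declares the capability and the snapshot time.
    ET.SubElement(
        urlset, "{%s}md" % RS_NS, {"capability": "resourcelist", "at": at}
    )
    for doc in documents:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(url, "{%s}loc" % SITEMAP_NS)
        loc.text = url_prefix + doc["relative_path"]
        lastmod = ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS)
        lastmod.text = doc["lastmod"]
        # Per-resource metadata: size in bytes and media type.
        ET.SubElement(
            url,
            "{%s}md" % RS_NS,
            {"length": str(doc["length"]), "type": doc["mime"]},
        )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical documents standing in for hits of the resource type.
docs = [
    {"relative_path": "a.txt", "lastmod": "2017-01-01T00:00:00Z",
     "length": 12, "mime": "text/plain"},
    {"relative_path": "b.pdf", "lastmod": "2017-01-02T00:00:00Z",
     "length": 2048, "mime": "application/pdf"},
]
resourcelist_xml = build_resourcelist(
    docs, "http://example.com/", "2017-01-03T00:00:00Z"
)
print(resourcelist_xml)
```

A change list has the same overall shape, with capability="changelist" and a change attribute (created, updated, or deleted) on each entry's rs:md element.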