USAGE:
    standard-benchmark [FLAGS] [OPTIONS] --config-file <config-file>

FLAGS:
    -h, --help            Prints help information
        --print-stages    Prints all available stages
    -V, --version         Prints version information

OPTIONS:
        --collections <collections>...    Filter out collections you want to run
        --config-file <config-file>       Configuration file path
        --suppress <suppress>...          A list of stages to suppress
The main settings are read from the configuration file. Additionally, certain stages can be suppressed with the --suppress option; run with --print-stages to see all available stages. To run only a subset of collections, use the --collections option.
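For scripted runs, the invocation can be assembled as an argument list, for example to pass to Python's subprocess module. This is a minimal sketch: build_command is a hypothetical helper, and it assumes each multi-valued option takes its values directly after the flag, as the <suppress>... notation in the usage text suggests. The stage name below is a placeholder; use --print-stages for the real ones.

```python
from typing import List, Sequence


def build_command(config_file: str,
                  suppress: Sequence[str] = (),
                  collections: Sequence[str] = ()) -> List[str]:
    """Build an argv list for standard-benchmark (hypothetical helper)."""
    cmd = ["standard-benchmark", "--config-file", config_file]
    if suppress:
        # Stages to skip; all values follow a single --suppress flag.
        cmd += ["--suppress", *suppress]
    if collections:
        # Run only these collections, referenced by their names in the config file.
        cmd += ["--collections", *collections]
    return cmd


# "some_stage" is a placeholder, not a real stage name.
print(build_command("bench.yml", suppress=["some_stage"], collections=["wapo"]))
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths.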
The program takes the path to a configuration file in YAML format. The rest of this section describes all recognized top-level sections.
The workdir setting is the working directory. Paths in other settings are resolved relative to it unless they are absolute.
workdir: /path/to/workdir
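The resolution rule amounts to a single check. A minimal sketch, assuming POSIX-style paths; resolve is a hypothetical helper, not part of the tool:

```python
from pathlib import Path


def resolve(workdir: str, setting: str) -> Path:
    """Resolve a path setting against workdir unless it is already absolute."""
    path = Path(setting)
    return path if path.is_absolute() else Path(workdir) / path


print(resolve("/path/to/workdir", "fwd/wapo"))  # /path/to/workdir/fwd/wapo
print(resolve("/path/to/workdir", "/usr/bin"))  # /usr/bin
```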
The source section defines where the PISA programs come from.
A path source simply defines the bin directory where the executables should reside:
source:
  type: path
  path: /usr/bin
A git source defines the Git repository and branch. It causes the code to be cloned into workdir and compiled:
source:
  type: git
  path: https://github.com/pisa-engine/pisa.git
A docker source is not implemented yet:
source:
  type: docker
  tag: latest
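The three source types differ mainly in which keys they use. The sketch below is a hypothetical validator, not the tool's own code; it assumes the keys shown in the examples above are the required ones and checks nothing else. The source section is taken as an already-parsed dict.

```python
from typing import Any, Mapping

# Keys each source type uses in the examples above (assumed to be required).
REQUIRED_KEYS = {
    "path": ("path",),  # directory holding the PISA executables
    "git": ("path",),   # URL of the repository to clone and compile
}


def check_source(source: Mapping[str, Any]) -> str:
    """Validate a source section and return its type (hypothetical helper)."""
    kind = source.get("type")
    if kind == "docker":
        raise NotImplementedError("docker sources are not implemented")
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown source type: {kind!r}")
    missing = [key for key in REQUIRED_KEYS[kind] if key not in source]
    if missing:
        raise ValueError(f"{kind} source is missing: {', '.join(missing)}")
    return kind


print(check_source({"type": "git", "path": "https://github.com/pisa-engine/pisa.git"}))  # git
```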
This is a list of all collections to be tested. Each collection has:
- name -- for identification and cross-referencing from runs
- kind -- a type of collection, e.g., trecweb or warc
- collection_dir -- where the collection is stored
- forward_index -- the basename of the forward index (optional; default: workdir/fwd/name)
- inverted_index -- the basename of the inverted index (optional; default: workdir/inv/name)
- encodings -- a list of encodings to compress the index to
collections:
  - name: wapo
    description: WashingtonPost.v2
    collection_dir: /data/collections/WashingtonPost.v2
    forward_index: fwd/wapo
    inverted_index: inv/wapo
    encodings:
      - block_simdbp
      - block_qmx
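The defaults for forward_index and inverted_index combine with the workdir resolution rule. A minimal sketch, with index_paths as a hypothetical helper and the collection given as an already-parsed dict (e.g. from a YAML loader); it assumes POSIX-style paths.

```python
from pathlib import Path
from typing import Any, Mapping, Tuple


def index_paths(workdir: str, collection: Mapping[str, Any]) -> Tuple[Path, Path]:
    """Return (forward, inverted) index basenames, applying the documented defaults."""
    name = collection["name"]
    forward = collection.get("forward_index", f"fwd/{name}")
    inverted = collection.get("inverted_index", f"inv/{name}")
    # Joining onto an absolute path yields that absolute path unchanged,
    # so absolute settings are kept and relative ones are resolved from workdir.
    return Path(workdir) / forward, Path(workdir) / inverted


print(index_paths("/path/to/workdir", {"name": "wapo"}))
```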
Runs are experiments to run on the collections, once they are indexed.
Note: At this point, only evaluating queries is supported.
runs:
  - collection: wapo
    type: evaluate
    topics: /data/collections/WashingtonPost.v2/topics.core18.txt
    topics_format: trec
    trec_topic_field
    qrels: /data/collections/WashingtonPost.v2/qrels.core18.txt
The same run with topics in the simple format:
runs:
  - collection: wapo
    type: evaluate
    topics: /data/collections/WashingtonPost.v2/topics.core18.txt
    topics_format: simple
    qrels: /data/collections/WashingtonPost.v2/qrels.core18.txt
The simple format has one query per line, with the query ID before a colon:
1:first query
2:second query
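Reading this format takes only a few lines. A minimal sketch; parse_simple_topics is a hypothetical helper, and skipping blank lines, trimming whitespace, and splitting on the first colon only (so queries may contain colons) are assumptions rather than documented behavior of the tool:

```python
from typing import List, Tuple


def parse_simple_topics(text: str) -> List[Tuple[str, str]]:
    """Parse 'ID:query' lines into (id, query) pairs, skipping blank lines."""
    topics = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so the query text may contain colons.
        qid, sep, query = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in topic line: {line!r}")
        topics.append((qid.strip(), query.strip()))
    return topics


print(parse_simple_topics("1:first query\n2:second query"))
# [('1', 'first query'), ('2', 'second query')]
```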