A high-performance, multi-threaded, list-wise learning-to-rank implementation that supports mini-batched learning. The implementation focuses on coordinate ascent to optimize mean average precision (MAP), one of the strongest approaches for model combination in information retrieval.
Rank-lips is designed to work with trec_eval file formats for defining runs (run format) and relevance data (qrel format). The features are taken from the score and/or reciprocal rank of each input file. The filename of an input run (in the feature directory) is used as the feature name. If you want to train a model and predict on a different test set, make sure that the input runs for the test features use exactly the same filenames. We recommend creating separate directories for the training and test sets.
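For illustration, a run file in trec_eval format has one line per query-document pair (query id, the literal Q0, document id, rank, score, run name), and a qrels file holds the relevance judgments. The query ids, document ids, and file names below are made up; note that the filename bm25.run would become the feature name:

train-features/bm25.run:
q1 Q0 doc-017 1 17.52 bm25
q1 Q0 doc-203 2 16.90 bm25

train.qrel:
q1 0 doc-017 1
q1 0 doc-203 0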
Rank-lips is implemented in GHC Haskell, but we also provide Linux binaries. Rank-lips is released AS-IS under the BSD-3-Clause open source license.
Authors: Laura Dietz and Ben Gamari.
Official Rank-Lips website with more examples and usage instructions: http://www.cs.unh.edu/~dietz/rank-lips/
Build from source:
- Install GHC 8.8.3 and cabal
- cabal new-build rank-lips
- Calling ./mk-cabal-bin.sh will create soft links to the binary in ./bin

Alternatively, download the statically linked binary from the rank-lips website.
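Put together, a build from a checkout of the repository might look as follows (assuming GHC 8.8.3 and cabal are already on your PATH; the last command only lists the linked binaries to check the result):

cabal new-build rank-lips
./mk-cabal-bin.sh
ls -l ./bin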
Training with 5-fold cross-validation
The features are given as a set of run files (one run = one feature). The filename of the run file in $TRAIN_FEATURE_DIR is used as the feature name. When the trained model is used to predict on a different dataset, the directory of test features needs to use exactly the same file names. (Hint: you can use a directory of softlinks to define the names, as in the example below.) The ground truth is given in qrels format.
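For example, if the training features were named bm25.run and sdm.run, a matching test feature directory can be built from softlinks like this (the paths and feature names are made up):

mkdir -p test-features
ln -s /path/to/test/runs/bm25-on-test.run test-features/bm25.run
ln -s /path/to/test/runs/sdm-on-test.run test-features/sdm.run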
rank-lips train -d "$TRAIN_FEATURE_DIR" -q "$QREL" -e "$FRIENDLY_NAME" -O "$OUT_DIR" -o "$OUT_PREFIX" --z-score --feature-variant FeatScore --mini-batch-size 1000 --convergence-threshold -0.001 --train-cv
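For example, the variables could be set as follows before running the command above (these values are placeholders, not defaults of rank-lips):

TRAIN_FEATURE_DIR=train-features   # directory with one run file per feature
QREL=train.qrel                    # relevance judgments in qrels format
FRIENDLY_NAME=my-experiment        # a free-form name for this experiment
OUT_DIR=out                        # directory for the output files
OUT_PREFIX=train                   # prefix for the output file names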
The $OUT_DIR will contain the following files:
Cross-validation:
- train-run-test.run: prediction with cross-validation (no leakage of test data into the training process; use this in your research paper!)
- train-model-fold-$k-best.json: the model trained for fold $k (best model of 5 restarts)
- train-run-fold-$k-best.run: run predicted with the fold's model on the fold's training data (used to compute the training MAP)

Trained on the whole data set:
- train-model-train.json: model trained on all data (to be used with rank-lips predict)
- train-run-train.run: run predicted with the overall train model on the train data (used to produce the training MAP score)
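Since the predictions are standard trec_eval run files, the cross-validated MAP can be computed directly with trec_eval (assuming trec_eval is installed and $OUT_PREFIX was set to train, matching the file names in the listing above):

trec_eval -m map "$QREL" "$OUT_DIR/train-run-test.run"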
See the directory ./rank-lips-example for an educational example of how to use rank-lips.