Merge branch 'master' of github.com:osirrc/terrier-docker
ArthurCamara committed Jun 19, 2019
2 parents 3a8aefe + de0001e commit 9fc9f99
Showing 1 changed file with 59 additions and 13 deletions.
72 changes: 59 additions & 13 deletions README.md
@@ -17,14 +17,15 @@ This image is available on [Docker Hub](https://hub.docker.com/r/osirrc2019/terr
The following `jig` command can be used to index TREC disks 4/5 for `robust04`:

```
python run.py prepare --repo terrier --collections robust04=/tmp/disk45/=trectext
python run.py prepare --repo osirrc2019/terrier --tag v0.1.5 --collections robust04=/tmp/disk45/=trectext
```

The following `jig` command performs a retrieval run on the `robust04` test collection, using BM25 as the ranker:

```
python run.py search \
--repo osirrc2019/terrier\
--repo osirrc2019/terrier \
--tag v0.1.5 \
--collection robust04 \
--topic topics/topics.robust04.txt \
--qrels qrels/qrels.robust04.txt \
@@ -34,27 +35,60 @@ python run.py search \

## Retrieval Methods:

This image supports the following weighting models: BM25 (`bm25`), PL2 (`pl2`) and DPH (`dph`).

Additionally, it supports query expansion and proximity-based (DFRD) search. To enable them, append `qe`, `prox` or `prox_qe` to the retrieval model name in the `--opts config` argument, as in `--opts config=<retrieval_model>_<extra>`:
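
As a rough sketch of the naming scheme, the snippet below enumerates the configuration names used in the examples that follow. The helper names `config_name` and `all_configs` are hypothetical and not part of `jig` or Terrier; the snippet only builds strings and does not check which configurations the image actually accepts.

```python
from itertools import product

# Weighting models and optional extras named in this README.
MODELS = ["bm25", "pl2", "dph"]
EXTRAS = ["", "qe", "prox", "prox_qe"]  # "" means the plain weighting model


def config_name(model, extra=""):
    """Join a model and an optional extra as <retrieval_model>_<extra>."""
    return f"{model}_{extra}" if extra else model


def all_configs():
    """Every model/extra combination, grouped by weighting model."""
    return [config_name(m, e) for m, e in product(MODELS, EXTRAS)]


if __name__ == "__main__":
    for name in all_configs():
        print(f"--opts config={name}")
```

Each printed value can be pasted onto the end of a `python run.py search ...` command like the ones listed here.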

(BM25)

python run.py search --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs
python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25

(BM25 + query expansion)

python run.py search --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25_qe
python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25_qe

(BM25 + Proximity)

python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25_prox

(BM25 + Proximity + query expansion)

python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25_prox_qe

(PL2)

python run.py search --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=pl2
python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=pl2

(PL2 + query expansion)

python run.py search --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=pl2_qe
python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=pl2_qe

(PL2 + Proximity)

python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=pl2_prox

(PL2 + Proximity + query expansion)

python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=pl2_prox_qe


(DFRD)
(DPH)

python run.py search --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=DFRD
python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=dph

(DPH + query expansion)

**NOTE:** for running DFRD, the index must be built using the `--opts=block.index=true` param
python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=dph_qe

(DPH + Proximity)

python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=dph_prox

(DPH + Proximity + query expansion)

python run.py search --repo osirrc2019/terrier --tag v0.1.6 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=dph_prox_qe

**NOTE:** to run DFRD (the proximity-based model), the index must be built with the `--opts=block.index=true` parameter.
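
A minimal sketch of assembling such an indexing command follows. It assumes the option is passed in the same `--opts key=value` form as the `FieldTags.process=HEADLINE` example in the Indexing section, and it reuses the repository, tag and collection path from the `prepare` command at the top of this README. The helper `prepare_command` is hypothetical; it only builds an argument list and does not run anything.

```python
import shlex


def prepare_command(repo, tag, collection, path, fmt, opts=None):
    """Build the argv for a `run.py prepare` call (hypothetical helper)."""
    argv = [
        "python", "run.py", "prepare",
        "--repo", repo,
        "--tag", tag,
        "--collections", f"{collection}={path}={fmt}",
    ]
    if opts:
        # Assumption: options are given as key=value pairs after --opts.
        argv.append("--opts")
        argv.extend(f"{key}={value}" for key, value in opts.items())
    return argv


cmd = prepare_command(
    "osirrc2019/terrier", "v0.1.5",
    "robust04", "/tmp/disk45/", "trectext",
    opts={"block.index": "true"},
)

if __name__ == "__main__":
    # shlex.join quotes arguments where the shell would need it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` from the `jig` directory would be one way to launch it, provided the option syntax assumed here matches what the image expects.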


## Learning to Rank Runs
@@ -63,19 +97,19 @@ Learning-to-rank will typically require that the index has more information, e.g

### Indexing:

python run.py prepare --repo terrier --collections robust04=/tmp/disk45/=trectext --opts "FieldTags.process=HEADLINE"
python run.py prepare --repo osirrc2019/terrier --tag v0.1.5 --collections robust04=/tmp/disk45/=trectext --opts "FieldTags.process=HEADLINE"

### Training:

You need to specify the features to be used by Terrier - see http://terrier.org/docs/v5.1/learning.html for more information about Terrier feature definitions.

python run.py train --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --test_split $PWD/sample_training_validation_query_ids/robust04_test.txt --validation_split $PWD/sample_training_validation_query_ids/robust04_validation.txt --model_folder /tmp/runs --opts features="SAMPLE;WMODEL:SingleFieldModel(BM25,0);QI:SingleFieldModel(Dl,0)"
python run.py train --repo osirrc2019/terrier --tag v0.1.5 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --test_split $PWD/sample_training_validation_query_ids/robust04_test.txt --validation_split $PWD/sample_training_validation_query_ids/robust04_validation.txt --model_folder /tmp/runs --opts features="SAMPLE;WMODEL:SingleFieldModel(BM25,0);QI:SingleFieldModel(Dl,0)"
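
The `features` value is a list of feature definitions separated by `;`. The sketch below shows one way to assemble and shell-quote that value, using the three definitions from the command above. `features_opt` is a hypothetical helper, not part of `jig` or Terrier.

```python
import shlex

# Feature definitions copied from the training command in this README.
FEATURES = [
    "SAMPLE",
    "WMODEL:SingleFieldModel(BM25,0)",
    "QI:SingleFieldModel(Dl,0)",
]


def features_opt(features):
    """Join feature definitions with ';' into a features=... option value."""
    return "features=" + ";".join(features)


opt = features_opt(FEATURES)

if __name__ == "__main__":
    # ';' and parentheses are shell metacharacters, so the value must be quoted.
    print("--opts", shlex.quote(opt))
```

`shlex.quote` wraps the whole value in single quotes, which the shell treats the same way as the double-quoted form used in the command above.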

### Retrieval:

You will need to specify the `bm25_ltr_jforest` configuration.

python run.py search --repo terrier --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25_ltr_jforest
python run.py search --repo osirrc2019/terrier --tag v0.1.5 --collection robust04 --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt --output /tmp/runs --opts config=bm25_ltr_jforest

## Expected Results

@@ -89,4 +123,16 @@ MAP | BM25 | +QE | +Prox | +P

MAP | BM25 | +QE | +Prox | +Prox + QE| DPH | + QE | +Prox | +Prox +QE | PL2 | +QE
:---------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
[TREC 2018 Common Core Track Topics](https://trec.nist.gov/data/core/topics2018.txt)| 0.2326 | 0.2975 |0.2369 | 0.2960| 0.2427|0.3055|0.2428|0.3035 | 0.2225 | 0.2728
[TREC 2018 Common Core Track Topics](https://trec.nist.gov/data/core/topics2018.txt) |0.2326|0.2975|0.2369|0.2960|0.2427|0.3055|0.2428|0.3035 |0.2225| 0.2728

### GOV2

MAP | BM25 | +QE | +Prox | +Prox + QE| DPH | + QE | +Prox | +Prox +QE | PL2 | +QE
:---------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
[TREC 2004 Terabyte Track: Topics 701-750](http://trec.nist.gov/data/terabyte04.html) |0.2461|0.2621|0.2537|0.2715|0.2804|0.3120|0.2834|0.3064|0.2334|0.2478
[TREC 2005 Terabyte Track: Topics 751-800](http://trec.nist.gov/data/terabyte05.html) |0.3081|0.3506|0.3126| 0.3507|0.3311|0.3754|0.3255|0.3095|0.2884|0.3160
[TREC 2006 Terabyte Track: Topics 801-850](http://trec.nist.gov/data/terabyte06.html) |0.2629|0.3118|0.2724|0.3085|0.2917|0.3494|0.2904|0.3288|0.2363|0.2739


## Reviews
+ Documentation reviewed at commit [`0b13139`](https://github.com/osirrc/terrier-docker/commit/c09730fab75e9c4ff892cb9dc5d6b7a500b13139) (2019-06-16) by [Ryan Clancy](https://github.com/r-clancy/).
