Evaluation of Temporal Change in IR Test Collections

This repository contains the code to investigate the temporal change of retrieval effectiveness. Five state-of-the-art retrieval systems are revisited in different search scenarios with temporal changes in experimental components (such as documents, topics, and qrels), and we investigate how well they reproduce their effectiveness at a later point in time.

This repository holds the code to create and augment the datasets that simulate the different evaluation environments (EEs), to build and run the retrieval systems in Docker containers that produce the runs, and to analyze the runs to investigate the temporal reproducibility of retrieval effectiveness. The following sections describe how to reproduce the results.

Install dependencies

All dependencies are listed in the requirements.txt file. To install the dependencies, run the following command:

pip install -r requirements.txt

1. Prepare Test Collections

Three test collections are used in the experimental evaluation: TREC-COVID, TripClick, and LongEval. TREC-COVID and TripClick are acquired through IR Datasets and are expected to be placed in the data directory (the expected directory structure is listed later). The LongEval dataset is available through LINDAT (Train, Test). For further information, we refer to the LongEval website. The dataset should also be placed in the data directory.

1.1 Scrape metadata for TripClick

Since TripClick does not contain natural rounds, the evolving test collection is simulated by splitting the collection into three parts based on the publication date of the documents. The publication date can be obtained through the scrape_trip_click_metadata.ipynb notebook.
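The date-based split can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes the scraped publication dates have already been parsed into a mapping from document id to `datetime.date`, and the two cutoff dates are hypothetical placeholders. Whether later EEs contain only new documents or all documents published so far is also an assumption, so both variants are shown.

```python
import bisect
from datetime import date


def split_by_date(pub_dates, cutoffs):
    """Split documents into len(cutoffs) + 1 disjoint parts by publication date.

    pub_dates: mapping doc_id -> datetime.date (assumed to come from the scraped metadata)
    cutoffs:   sorted boundary dates (hypothetical values, not the repository's)
    Part i holds documents with cutoffs[i-1] <= date < cutoffs[i].
    """
    parts = [set() for _ in range(len(cutoffs) + 1)]
    for doc_id, published in pub_dates.items():
        # bisect_right counts how many cutoffs are <= the publication date
        parts[bisect.bisect_right(cutoffs, published)].add(doc_id)
    return parts


def cumulative_snapshots(parts):
    """Turn disjoint parts into growing collections: snapshot i = parts[0] | ... | parts[i]."""
    snapshots, seen = [], set()
    for part in parts:
        seen = seen | part
        snapshots.append(seen)
    return snapshots


# Illustrative data only: three documents and two hypothetical cutoffs -> three parts
pub_dates = {"d1": date(2010, 5, 1), "d2": date(2013, 1, 1), "d3": date(2018, 7, 3)}
parts = split_by_date(pub_dates, [date(2012, 1, 1), date(2016, 1, 1)])
# parts == [{"d1"}, {"d2"}, {"d3"}]
```

Using `bisect` keeps the assignment correct for any number of cutoffs, so the same helper works if a different number of rounds is simulated.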

1.2 Create EEs

Some preprocessing steps are necessary to prepare the EEs from the datasets. The notebook prepare_datasets.ipynb limits the topics to those that are used in all EEs and creates separate qrels files for each EE.
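The two preprocessing steps can be sketched in plain Python. This is a hedged illustration rather than the contents of prepare_datasets.ipynb: the EE names are hypothetical, qrels are assumed to be TREC-style lines (`qid iteration doc_id relevance`), and restricting each EE's qrels to the documents in that EE's collection is an assumption that fits a date-split collection such as TripClick. For collections that already ship separate qrels per round, only the topic filter would apply.

```python
def common_topics(topics_per_ee):
    """Return the sorted query ids that occur in every EE.

    topics_per_ee: mapping EE name (hypothetical labels) -> iterable of query ids
    """
    topic_sets = [set(qids) for qids in topics_per_ee.values()]
    return sorted(set.intersection(*topic_sets)) if topic_sets else []


def qrels_per_ee(qrels_lines, docs_per_ee, keep_qids):
    """Build one qrels list per EE from TREC-style lines 'qid iteration doc_id relevance'.

    A judgment is kept for an EE if its query is shared by all EEs and its
    document belongs to that EE's collection (docs_per_ee: EE -> set of doc ids).
    """
    keep = set(keep_qids)
    out = {ee: [] for ee in docs_per_ee}
    for line in qrels_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip empty or malformed lines
        qid, _iteration, doc_id, _relevance = fields
        if qid not in keep:
            continue
        for ee, docs in docs_per_ee.items():
            if doc_id in docs:
                out[ee].append(" ".join(fields))
    return out
```

Each resulting list can then be written to its own qrels file, one per EE.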

2. Create Runs

The runs are created in a two-step procedure: first, the indexes are built; second, the runs are produced. Both steps can be executed through the Docker Compose files in the images directory.

2.1 Create Indexes

The indexes are created through the two compose files make-indexe-pyterrier.yml and make-indexe-pyterrier-d2q.yml. They build the containers for the different EEs and run the indexing. The data directory is mounted into the containers so that they can read the datasets and store the indexes. Execute both compose files with the following command:

docker compose -f make-indexe-pyterrier.yml up && docker compose -f make-indexe-pyterrier-d2q.yml up

2.2 Query the Systems

After the indexes are created, the runs are produced with the remaining compose files in the images directory. They build the containers for the different EEs and run the retrieval. The data directory is mounted into the containers so that they can read the datasets and indexes and store the resulting runs:

docker compose -f make-runs-bm25.yml up && \
docker compose -f make-runs-bm25_colbert.yml up && \
docker compose -f make-runs-bm25_monot5.yml up && \
docker compose -f make-runs-bm25_bo1.yml up && \
docker compose -f make-runs-pl2.yml up && \
docker compose -f make-runs-xsqra_m.yml up && \
docker compose -f make-runs-rrf.yml up
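If you prefer to drive the indexing and retrieval steps from Python, a small wrapper around `subprocess.run` can reproduce the `&&` chains above. This is a sketch, not part of the repository: the function name is hypothetical, the file list is copied from this README, and it only checks the exit status of each `docker compose` command itself, not of the containers it starts.

```python
import subprocess

# Compose files named in this README (located in the images directory).
INDEX_FILES = ["make-indexe-pyterrier.yml", "make-indexe-pyterrier-d2q.yml"]
RUN_FILES = [
    "make-runs-bm25.yml",
    "make-runs-bm25_colbert.yml",
    "make-runs-bm25_monot5.yml",
    "make-runs-bm25_bo1.yml",
    "make-runs-pl2.yml",
    "make-runs-xsqra_m.yml",
    "make-runs-rrf.yml",
]


def compose_up_all(files, cwd=None, runner=subprocess.run):
    """Run `docker compose -f FILE up` for each file in order (hypothetical helper).

    Mirrors the `&&` chain: stops at the first command whose exit code is non-zero.
    `runner` is injectable so the control flow can be tested without Docker.
    """
    finished = []
    for compose_file in files:
        cmd = ["docker", "compose", "-f", compose_file, "up"]
        result = runner(cmd, cwd=cwd)
        if result.returncode != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with code {result.returncode}")
        finished.append(compose_file)
    return finished
```

From the repository root, `compose_up_all(INDEX_FILES + RUN_FILES, cwd="images")` would build the indexes before producing the runs, matching the order of the two steps.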

3. Experimental Evaluation

After the runs are created, we can investigate how well the systems reproduce their retrieval effectiveness at a later point in time.
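As a hedged illustration of what "reproducing effectiveness at a later point in time" can mean, the sketch below scores the same system in an earlier and a later EE with nDCG@10 and reports the relative change. The helper names are hypothetical, and neither the measure nor the relative delta is claimed to be the one used in the analysis notebooks; in practice a standard evaluation tool such as trec_eval or ir_measures is preferable to a hand-rolled metric.

```python
import math


def ndcg_at_k(ranking, judgments, k=10):
    """nDCG@k with linear gains; negative labels are treated as non-relevant.

    ranking:   doc ids in rank order for one query
    judgments: doc_id -> graded relevance label for that query
    """
    gains = {doc: max(rel, 0) for doc, rel in judgments.items()}
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 2) for rank, doc in enumerate(ranking[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg(run, qrels, k=10):
    """Average nDCG@k over the judged queries; run maps qid -> ranked doc ids."""
    scores = [ndcg_at_k(run.get(qid, []), judged, k) for qid, judged in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


def relative_delta(score_before, score_after):
    """Relative change of a score from an earlier EE to a later one (hypothetical helper)."""
    return (score_after - score_before) / score_before
```

A negative delta indicates that the system's effectiveness dropped in the later EE; comparing deltas across systems shows which ones are more robust to the temporal change.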

3.1 Filter runs and qrels

The experimental evaluation investigates the scenario of changing documents (D'TQ'). For this, the runs and qrels must be limited to the set of queries that is used in all EEs, which is done in the notebook prepare_runs.ipynb.
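The filtering step amounts to keeping only the lines whose query id belongs to the shared query set. The sketch below is an assumption-laden illustration, not prepare_runs.ipynb: it assumes the shared queries are listed in a TSV (such as core_queries.tsv from the data directory) whose first column is the query id, and that runs use the standard TREC format `qid Q0 doc_id rank score tag`. Because qrels lines also start with the query id, the same filter applies to them.

```python
def read_query_ids(tsv_lines):
    """Collect query ids from a TSV whose first column is the query id (assumed layout)."""
    return [line.split("\t", 1)[0].strip() for line in tsv_lines if line.strip()]


def filter_run(run_lines, keep_qids):
    """Keep only TREC run (or qrels) lines whose first field is one of the given query ids."""
    keep = set(keep_qids)
    kept = []
    for line in run_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            kept.append(line.rstrip("\n"))
    return kept
```

For example, `filter_run(open(run_path), read_query_ids(open("data/core_queries.tsv")))` would return the filtered lines of a run file, where `run_path` is a placeholder for one of the files in run-core_queries.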

3.2 Analyze the runs

Finally, the code for the analysis of the runs can be found in the following notebooks:

Expected Data Structure

data
├── core_queries.tsv
├── dataset
│   └── TripClick
│       ├── ...
│       ...
│       
├── index
│   ├── index-longeval-LT-pyterrier
│   ...    
│   
├── longeval_topics_core_queries_unified.tsv
├── metadata.jsonl
├── metadata_processed.jsonl
│   
├── qrels
│   ├── longeval-LT.qrels-test
│   ...
│   
├── run-core_queries
│   ├── run-longeval-LT-test-bm25-pyterrier
│   ...
│   
├── trec-covid.core-queries.txt
├── trec_covid.json
└── tripclick-subcollections.json
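A quick way to check a local setup against this layout is to list which top-level entries are missing. This is a hypothetical helper, not part of the repository; the entry names are copied from the tree above, and some of them (for example index and run-core_queries) are only populated after the indexing and retrieval steps, so missing entries are expected before those steps have run.

```python
from pathlib import Path

# Top-level entries of the data directory, as shown in the tree above.
EXPECTED_ENTRIES = [
    "core_queries.tsv",
    "dataset",
    "index",
    "longeval_topics_core_queries_unified.tsv",
    "metadata.jsonl",
    "metadata_processed.jsonl",
    "qrels",
    "run-core_queries",
    "trec-covid.core-queries.txt",
    "trec_covid.json",
    "tripclick-subcollections.json",
]


def missing_entries(data_dir, expected=EXPECTED_ENTRIES):
    """Return the expected top-level files/directories that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]
```

Calling `missing_entries("data")` from the repository root returns an empty list once every entry is in place.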

irgroup/ICTIR24-temporal-change
