Problem with query numbers in cranqrel.trec.txt file #1

Open
paul-sheridan opened this issue May 15, 2023 · 4 comments

Comments

@paul-sheridan

The query numbers in the first column of the cranqrel.trec.txt file run from 1 to 225 in increments of 1. It seems to me, however, that they should correspond to the contents of the query ID field in the cran.qry.xml file. If so, the first column of cranqrel.trec.txt should follow the sequence 1, 2, 4, 8, 9, ..., 360, 365.

@paul-sheridan
Author

It just dawned on me that I didn't explain the issue quite right. The problem is that the first column of the cranqrel.trec.txt file uses a mapping of the query ids $1, 2, 4, 8, 9, ..., 360, 365$ to $1, 2, 3, 4, 5, ..., 224, 225$ with $225$ being the total number of queries.
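
To make the mapping concrete, here is a minimal Python sketch. It assumes the original query IDs have already been read from the query file in file order. The function name `build_id_mapping` is hypothetical, and the sample list holds only the first five IDs quoted above, not the full set of 225.

```python
def build_id_mapping(original_ids):
    """Map each original query ID to its 1-based position in file order."""
    return {qid: pos for pos, qid in enumerate(original_ids, start=1)}


# Illustrative sample: only the first five IDs quoted above, not all 225.
sample_ids = [1, 2, 4, 8, 9]
mapping = build_id_mapping(sample_ids)
print(mapping)  # {1: 1, 2: 2, 4: 3, 8: 4, 9: 5}
```

With the complete ID list, the last entry would map 365 to 225, which is the renumbering seen in the first column of cranqrel.trec.txt.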

@Karoljv

Karoljv commented Jul 18, 2024

And how did it go? Did you map the values like that?

@paul-sheridan
Author

It actually worked out perfectly.

My code is in this repo, <https://github.com/paul-sheridan/hgt-tfidf>, if that helps. I used Python to preprocess the Cranfield collection data files, but I didn't modify the query IDs there. The query ID mapping is handled in lines 117 to 136 of the associated hgt-tfidf/cranfield/cranfield-experiments.Rmd file.
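
For readers who would rather stay in Python than follow the R Markdown file, the same idea can be sketched roughly as below. This is not the code from the repo. The function name `restore_query_ids` and the sample qrels lines are hypothetical, and the sketch assumes whitespace-separated columns with the sequential query number in the first column.

```python
def restore_query_ids(qrels_lines, original_ids):
    """Replace the sequential query number in column 1 with the original ID.

    Assumes whitespace-separated columns and that sequential number k
    refers to the k-th entry of original_ids (1-based).
    """
    restored = []
    for line in qrels_lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        seq = int(parts[0])
        parts[0] = str(original_ids[seq - 1])
        restored.append(" ".join(parts))
    return restored


# Hypothetical qrels lines; only the first column matters here.
sample_qrels = ["3 0 5 1", "4 0 7 2"]
sample_ids = [1, 2, 4, 8, 9]  # first five original IDs quoted in this thread
print(restore_query_ids(sample_qrels, sample_ids))  # ['4 0 5 1', '8 0 7 2']
```

Only the first column is rewritten, so the remaining columns pass through unchanged whatever they contain.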

@Karoljv

Karoljv commented Jul 20, 2024

Thanks for the answer and the link to the repo, Paul.
