GitHub - UmerTariq1/Urdu_MsMarco_Translation_Retrieval: Code for the paper "Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO"

This is the code repository for the paper "Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO" (https://arxiv.org/abs/2412.12997).

The model and data are available at: https://huggingface.co/Mavkif
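To see which model and dataset repositories are published under that account without browsing the site, a minimal sketch along these lines may help. It assumes the `huggingface_hub` package is installed and that network access is available; the helper name `list_repo_ids` is hypothetical and not part of this repository.

```python
def list_repo_ids(author, api=None):
    """Return the model and dataset repo ids published by `author` on the Hugging Face Hub.

    `api` is optional so that a stub can be passed in for offline testing;
    by default a real `huggingface_hub.HfApi` client is created.
    """
    if api is None:
        # Imported lazily so the function can be defined without the package installed.
        from huggingface_hub import HfApi
        api = HfApi()

    # Both calls accept an `author` filter and yield info objects with an `id` field.
    models = sorted(info.id for info in api.list_models(author=author))
    datasets = sorted(info.id for info in api.list_datasets(author=author))
    return {"models": models, "datasets": datasets}


# Example usage (requires network access):
#   repos = list_repo_ids("Mavkif")
#   print(repos["models"])
#   print(repos["datasets"])
```

The returned ids can then be opened on the Hugging Face website, where each card describes the corresponding model or dataset.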

This repository has two folders: Translation and Retrieval. Each folder has its own README.md file with step-by-step instructions for recreating the data and model used in the paper.

Two environment setups are provided because the two folders have different dependencies. Each step in a sub-README, or in the bash script it refers to, states which environment to use. If no environment is mentioned, use the main environment; scripts that need the pyserini environment say so explicitly.

Dataset examples can be found on the Hugging Face dataset page.

Each dataset and model is explained in detail on its own Hugging Face card.

Repository contents: the Retrieval and Translation folders, conda_freeze_mainenv.txt, conda_freeze_pyserinienv.txt, and readme.md.
