This repository provides the underlying code and materials for the paper When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation.
We strongly recommend installation via Anaconda:
- Create a new environment:
conda create -n py37_tsd python=3.7
- Activate the environment:
conda activate py37_tsd
- Install dependencies:
cd /path/to/my/TargetedSenseDisambiguation
pip install -r requirements.txt
We also use the spaCy model en_core_web_lg, which can be installed with:
python -m spacy download en_core_web_lg
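Once the steps above are done, a quick check can confirm that the environment is usable. The sketch below is not part of the repository; `check_environment` is a hypothetical helper that compares the interpreter version with the `python=3.7` pin and looks up the spaCy model as an installed package.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 7), model="en_core_web_lg"):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Compare only major.minor, matching the python=3.7 pin of the conda environment.
    current = tuple(sys.version_info[:2])
    if current != tuple(required_python):
        problems.append(
            "Python {}.{} found, expected {}.{}".format(*current, *required_python)
        )
    # Models installed with `python -m spacy download` are regular Python
    # packages, so an import lookup shows whether the model is present.
    if importlib.util.find_spec(model) is None:
        problems.append("spaCy model '{}' is not installed".format(model))
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

If nothing is printed, both the Python version and the spaCy model match what the installation steps describe.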
This section explains how to run the code. Most of the scripts require credentials for the Oxford Historical Dictionary Research API; these scripts are marked by **. More information on obtaining access to the API can be found here.
[WARNING] Results produced by this notebook may differ slightly from those in the paper because:
- the source data (the quotations stored in the OED) may change over time
- the order in which data is retrieved and stored changes with each run, resulting in different splits for train, validation and test. Please contact the author
However, the authors have rerun the pipeline multiple times: the scores produced by these scripts are close to those reported in the paper, and the differences do not affect the conclusions drawn from the experiments.
The only exception may be the results for the curated experiments, which tend to be more volatile.
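To illustrate why retrieval order affects the splits, here is a minimal sketch. It does not reproduce the splitting code used in this repository; `split_ids`, the 80/10/10 fractions, the seed and the quotation IDs are all invented for the example. A seeded shuffle applied to the same items in a different input order produces different splits, whereas sorting the items first makes the splits independent of that order.

```python
import random


def split_ids(ids, seed=42, fractions=(0.8, 0.1, 0.1), sort_first=True):
    """Split IDs into train/validation/test lists with a seeded shuffle."""
    # Sorting removes the dependence on the order in which items were retrieved.
    items = sorted(ids) if sort_first else list(ids)
    rng = random.Random(seed)
    rng.shuffle(items)
    n_train = int(len(items) * fractions[0])
    n_val = int(len(items) * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical quotation IDs, retrieved in two different orders.
run_a = ["q{:02d}".format(i) for i in range(10)]
run_b = list(reversed(run_a))

# With sorting, both runs yield identical splits.
print(split_ids(run_a) == split_ids(run_b))
# Without sorting, the same seed yields different splits.
print(split_ids(run_a, sort_first=False) == split_ids(run_b, sort_first=False))
```

This only addresses the ordering effect; changes to the source data in the OED would still alter the results.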
The script generate_dataframes.py downloads data from the API for a given headword and vectorizes the keyword of the quotations.
[WARNING] This script requires access to the historical BERT models, available on Zenodo. Please copy the bert_1760_1850 and bert_1760_1900 models to the models folder and adjust the paths in lines 7-8.
[WARNING] To download the data you need access to the OED API; more information on how to obtain credentials is available here. Once you have the credentials, add them to oed_credentials.json.
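Before starting the download, it can help to confirm that the credentials file is present and readable. The sketch below is not taken from the repository. `load_credentials` is a hypothetical helper, and it assumes only that oed_credentials.json holds a non-empty JSON object; the key names the scripts expect are not documented here, so they are not checked.

```python
import json
from pathlib import Path


def load_credentials(path="oed_credentials.json"):
    """Load the credentials file, failing early with a clear message."""
    cred_path = Path(path)
    if not cred_path.is_file():
        raise FileNotFoundError(
            "{} not found; add your OED API credentials first".format(cred_path)
        )
    with cred_path.open(encoding="utf-8") as handle:
        credentials = json.load(handle)
    # Assumption: the file holds a JSON object. The expected key names are
    # unknown here, so only the basic shape is validated.
    if not isinstance(credentials, dict) or not credentials:
        raise ValueError("{} must contain a non-empty JSON object".format(cred_path))
    return credentials
```

Calling `load_credentials()` from the repository root raises an error if the file is missing, is not valid JSON, or is empty.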
python generate_dataframes.py
All results should be saved in the /data folder. Almost all subsequent steps require these data as input.
The code snippet below runs the main experiment that tests the effect of plugging in historical BERT models.
[WARNING] This script requires access to a historical word2vec model, which is available on Zenodo. Please copy the w2v_1760_1900 model to the models folder.
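A small check can confirm that the downloaded models are in place before any experiment is started. This is a sketch rather than repository code. `missing_models` is a hypothetical helper, and it assumes that each model is stored in the models folder under exactly the name given in this README, without an additional file extension.

```python
from pathlib import Path

# Model names as given in this README.
REQUIRED_MODELS = ("bert_1760_1850", "bert_1760_1900", "w2v_1760_1900")


def missing_models(models_dir="models", required=REQUIRED_MODELS):
    """Return the names of required models that are absent from models_dir."""
    root = Path(models_dir)
    # exists() accepts both files and directories, since the on-disk
    # layout of each model is not specified here.
    return [name for name in required if not (root / name).exists()]


for name in missing_models():
    print("Missing model:", name)
```

The paths inside the scripts still need to be adjusted by hand as described in the warnings.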
[WARNING] In line 15 of run_main_experiment.py, change the path to the word2vec model.
python run_main_experiment.py
All results should be saved in the result_{year} folder.
[WARNING] In line 15 of run_experiment_ts_disambiguation.py, change the path to the word2vec model.
To create results files for the time-sensitive methods, run:
python run_experiment_ts_disambiguation.py
Then run run_experiment_ts_disambiguation.py to carry out the experiments with time-sensitive disambiguation.
python run_experiment_ts_disambiguation.py
[WARNING] In line 15 of run_experiment_curated_cases.py, change the path to the word2vec model.
To run the case studies, execute:
python run_experiment_curated_cases.py
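The scripts above can also be chained from a single Python driver. The sketch below is not part of the repository. `build_commands` and `run_pipeline` are hypothetical helpers, and the order is assumed to be the one in which the scripts appear in this README. It also assumes that the credentials and model paths have already been set up.

```python
import subprocess
import sys

# Scripts in the order they are described in this README (an assumption).
PIPELINE = (
    "generate_dataframes.py",
    "run_main_experiment.py",
    "run_experiment_ts_disambiguation.py",
    "run_experiment_curated_cases.py",
)


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Build one command per script, using the current interpreter by default."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run the scripts in order; with dry_run=True only print the commands."""
    for command in build_commands(scripts):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(command, check=True)


# Print the planned commands without executing anything.
run_pipeline(dry_run=True)
```

Calling `run_pipeline()` from the repository root, inside the activated conda environment, would execute the scripts one after another.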
To create the results tables from the output generated by the experiments, run the cells in create_results_tables.ipynb. This notebook can be run using the .csv files with results produced by the previous scripts.
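Before opening the notebook, it may be useful to list the result files that are available. The sketch below rests on an assumption that this README does not state explicitly: that the .csv files are located directly inside the result_{year} folders. `find_result_csvs` is a hypothetical helper.

```python
from pathlib import Path


def find_result_csvs(root="."):
    """Return sorted paths of .csv files found directly inside result_* folders."""
    # Assumption: experiment output is stored as result_<year>/<name>.csv.
    return sorted(Path(root).glob("result_*/*.csv"))


for csv_path in find_result_csvs():
    print(csv_path)
```

An empty listing suggests that the experiment scripts have not been run yet, or that the files are stored elsewhere.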
To explore the results and recreate Figure 1, run the cells in explore_results.ipynb. This notebook requires the output of generate_dataframes.py (saved in the ./data folder).