Skip to content

Code for evaluating techniques for abbreviation detection and expansion in context

License

Notifications You must be signed in to change notification settings

angel-daza/abbreviation-detector

Repository files navigation

abbreviation-detector

Code to train classifiers for abbreviation detection and expansion in context. This repository also contains the evaluation code that complements the paper Dealing with Abbreviations in the Slovenian Biographical Lexicon to be presented at The 2022 Conference on Empirical Methods in Natural Language Processing EMNLP 2022

Installation

Download repo

git clone git@github.com:angel-daza/abbreviation-detector.git

Create a new environment:

conda create -n abbr-detector python=3.9
conda activate abbr-detector

Install Requirements:

pip install -r requirements

Paper Results

Abbreviation Detection

Create the Dataset Train/Dev/Test Partitions:

python3 slovene_abbr_preprocess.py

To Reproduce the Baseline Results:

python3 naive_baselines.py

To Reproduce the BERT Abbreviation Classifier Results:

# 1) Train the Binary BERT Classifier [ABBR, NO_ABBR]
python3 bert_token_classifier.py -t data/sbl-51abbr.tok.train.json -d data/sbl-51abbr.tok.dev.json\
     --bert_model 'EMBEDDIA/sloberta' --save_model_dir saved_models/BERT_ABBR_876972\
     --epochs 5 --batch_size 32 --info_every 10 --seed_val 876972

# 2) Make predictions using the BERT Classifier
python3 bert_token_classifier_predict.py -m saved_models/BERT_ABBR_876972 --bert_model 'EMBEDDIA/sloberta'\
     --epoch 1 --test_path data/sbl-51abbr.tok.test.json --gold_labels True

Abbreviation Expansion

To Reproduce BERT Abbreviation Expansion Results:

python3 bert_abbrev_expansion.py

About

Code for evaluating techniques for abbreviation detection and expansion in context

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages