abbreviation-detector

Code to train classifiers for abbreviation detection and expansion in context. This repository also contains the evaluation code that accompanies the paper "Dealing with Abbreviations in the Slovenian Biographical Lexicon", to be presented at the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).

Installation

Clone the repository:

git clone git@github.com:angel-daza/abbreviation-detector.git

Create a new environment:

conda create -n abbr-detector python=3.9
conda activate abbr-detector

Install Requirements:

pip install -r requirements.txt

Paper Results

Abbreviation Detection

Create the Dataset Train/Dev/Test Partitions:

python3 abbr_preprocess_slovenian.py

To Reproduce the Baseline Results:

python3 naive_baselines.py
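
For orientation, a naive baseline for this task can be as simple as a surface rule over tokens. The sketch below is not the content of naive_baselines.py; it is a hypothetical rule (a short alphabetic token that ends in a period and is not sentence-final) that emits the same ABBR / NO_ABBR labels used by the classifier. The function name, the `max_len` threshold, and the example sentence are assumptions made for illustration only.

```python
from typing import List


def naive_abbr_labels(tokens: List[str], max_len: int = 5) -> List[str]:
    """Label each token ABBR or NO_ABBR using a hypothetical surface rule.

    A token counts as an abbreviation when it ends in a period, is not the
    last token of the sentence, and its stem is short and purely alphabetic.
    """
    labels = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        stem = tok[:-1]  # token without its final character
        is_abbr = (
            tok.endswith(".")
            and i != last  # a sentence-final period is not evidence
            and 0 < len(stem) <= max_len
            and stem.isalpha()
        )
        labels.append("ABBR" if is_abbr else "NO_ABBR")
    return labels


# Illustrative tokens only; not taken from the dataset.
tokens = ["Rojen", "l.", "1850", "v", "Ljubljani."]
print(naive_abbr_labels(tokens))
```

A rule like this gives a floor that a trained classifier should beat; the baselines actually reported in the paper are the ones implemented in naive_baselines.py.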

To Reproduce the BERT Abbreviation Classifier Results:

# 1) Train the Binary BERT Classifier [ABBR, NO_ABBR]
python3 bert_token_classifier.py -t data/sbl-51abbr.tok.train.json -d data/sbl-51abbr.tok.dev.json \
     --bert_model 'EMBEDDIA/sloberta' --save_model_dir saved_models/BERT_ABBR_876972\
     --epochs 5 --batch_size 32 --info_every 10 --seed_val 876972

# 2) Make predictions using the BERT Classifier
python3 bert_token_classifier_predict.py -m saved_models/BERT_ABBR_876972 --bert_model 'EMBEDDIA/sloberta' \
     --epoch 1 --test_path data/sbl-51abbr.tok.test.json --gold_labels True
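
Because the prediction step is run against a test file with gold labels, its output can be scored per token. The sketch below shows one common way to do that, assuming flat lists of gold and predicted labels drawn from ABBR / NO_ABBR; it is not the repository's evaluation code, and the function name `abbr_prf` and the toy label lists are hypothetical.

```python
from typing import List, Tuple


def abbr_prf(gold: List[str], pred: List[str],
             positive: str = "ABBR") -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels for illustration; not model output.
gold = ["NO_ABBR", "ABBR", "ABBR", "NO_ABBR"]
pred = ["NO_ABBR", "ABBR", "NO_ABBR", "ABBR"]
print(abbr_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Scoring only the ABBR class keeps the large NO_ABBR majority from inflating the numbers, which is why accuracy alone is usually not reported for this kind of task.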

Abbreviation Expansion

To Reproduce BERT Abbreviation Expansion Results:

python3 bert_abbrev_expansion.py
