Named Entity Recognition (NER) is the task of locating entities in text and classifying them.
The goal of this project is to build a model that finds entities in raw text and determines the category each one belongs to. There are four categories: people, organizations, places, and other, identified by the labels PER, ORG, LOC, and O respectively.
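As a rough illustration of the labeling scheme (not the project's own code), every token in a sentence receives one of the four labels, and consecutive tokens with the same non-O label can be read as one entity. The sentence, the `tagged` list, and the `extract_entities` helper below are hypothetical and exist only for this sketch.

```python
# Hypothetical example sentence; not taken from the project's dataset.
tagged = [
    ("Barack", "PER"), ("Obama", "PER"), ("visited", "O"),
    ("Microsoft", "ORG"), ("in", "O"), ("Seattle", "LOC"), (".", "O"),
]


def extract_entities(tagged_tokens):
    """Group consecutive tokens that share a non-O label into entities.

    Note: with plain PER/ORG/LOC/O labels, two adjacent entities of the
    same type cannot be told apart and are merged into one.
    """
    entities = []
    current_words, current_label = [], None
    for word, label in tagged_tokens:
        if label != "O" and label == current_label:
            # Same entity continues.
            current_words.append(word)
            continue
        if current_words:
            # The previous entity ended; store it.
            entities.append((" ".join(current_words), current_label))
        if label != "O":
            current_words, current_label = [word], label
        else:
            current_words, current_label = [], None
    if current_words:
        entities.append((" ".join(current_words), current_label))
    return entities


print(extract_entities(tagged))
```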
conda create --name ner python=3.7.11
conda activate ner
pip install -r requirements.txt
cd dataset
python download_glove.py
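For orientation, GloVe vectors are distributed as plain text with one word per line, followed by its space-separated vector components. The sketch below shows one way such lines could be parsed into a dictionary. It is not what `download_glove.py` or `main.py` necessarily do; the function name `parse_glove_lines` and the sample lines are assumptions made for this example.

```python
def parse_glove_lines(lines, dim):
    """Parse GloVe-style text lines into a {word: vector} dictionary.

    `lines` is any iterable of strings (for example an open file), and
    `dim` is the expected vector dimension.
    """
    vectors = {}
    for line in lines:
        # Split from the right so that a token containing spaces stays intact.
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up 3-dimensional sample; real GloVe files use e.g. 300 dimensions.
sample = ["the 0.1 0.2 0.3\n", "cat -1.0 0.5 2.0\n"]
print(parse_glove_lines(sample, dim=3))
```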
python main.py --char-embedding-dim CHAR-EMB-DIM --char-len CHAR-LEN --hidden-dim HIDDEN-DIM --embedding-dim EMB-DIM --epochs EPOCHS --batch-size BATCH-SIZE --lr LEARNING-RATE --dropout DROPOUT --bidirectional BIDIRECTIONAL --num-layers NUM-LAYERS --only-test ONLY-TEST
where
CHAR-EMB-DIM is the dimension of the character embedding, default is 10
CHAR-LEN is the maximum length of the character sequence, default is 8
HIDDEN-DIM is the dimension of the hidden layer, default is 256
EMB-DIM is the dimension of the word embedding, default is 300
EPOCHS is the number of epochs, default is 50
BATCH-SIZE is the batch size, default is 64
LEARNING-RATE is the learning rate, default is 0.001
DROPOUT is the dropout rate, default is 0.5
BIDIRECTIONAL is the bidirectional flag, default is True
NUM-LAYERS is the number of layers, default is 2
ONLY-TEST is the test-only flag, default is False
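The options and defaults listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the actual contents of `main.py`: the helper names `str2bool` and `build_parser` are hypothetical, and the boolean handling is a guess based on the `--only-test True` usage.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'True' or 'False' to a bool (assumed behavior)."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="NER training and testing")
    parser.add_argument("--char-embedding-dim", type=int, default=10)
    parser.add_argument("--char-len", type=int, default=8)
    parser.add_argument("--hidden-dim", type=int, default=256)
    parser.add_argument("--embedding-dim", type=int, default=300)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--dropout", type=float, default=0.5)
    parser.add_argument("--bidirectional", type=str2bool, default=True)
    parser.add_argument("--num-layers", type=int, default=2)
    parser.add_argument("--only-test", type=str2bool, default=False)
    return parser


# Parsing an explicit list avoids reading the real command line in this sketch.
args = build_parser().parse_args(["--only-test", "True", "--epochs", "5"])
print(args.only_test, args.epochs, args.hidden_dim)
```

Taking the boolean as an explicit value (rather than a bare `store_true` switch) keeps the sketch compatible with invocations that pass `True` or `False` after the flag.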
For example, to train with the default settings:
python main.py
To run testing only:
python main.py --only-test True