This repository implements models for token classification, a Natural Language Processing (NLP) task that assigns a label to each token in a sentence. The models are built with TensorFlow and the Hugging Face Transformers library, and their architectures are based on LSTM networks and the pretrained BERT model.
Key applications of token classification include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. Both tasks support downstream work such as information extraction, text analysis, and language understanding.
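
To make the "one label per token" idea concrete, here is a minimal, library-free sketch of what a token classifier's output looks like for NER and PoS tagging. The example sentence, the label names (`PERSON`, `LOCATION`, `NOUN`, and so on), and the helper functions are hypothetical illustrations; they are not the tag sets of the Kaggle datasets or code from this repository.

```python
from typing import List, Tuple


def pair_tokens_with_labels(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Pair each token with its predicted label (exactly one label per token)."""
    if len(tokens) != len(labels):
        raise ValueError("token classification needs exactly one label per token")
    return list(zip(tokens, labels))


def extract_entities(tagged: List[Tuple[str, str]], outside: str = "O") -> List[Tuple[str, str]]:
    """Keep only tokens whose label is not the 'outside' (non-entity) label."""
    return [(token, label) for token, label in tagged if label != outside]


# Hypothetical sentence and labels, chosen only for illustration.
tokens = ["Alice", "visited", "Paris", "in", "2019"]
ner_labels = ["PERSON", "O", "LOCATION", "O", "DATE"]
pos_labels = ["NOUN", "VERB", "NOUN", "PREP", "NUM"]

ner_tagged = pair_tokens_with_labels(tokens, ner_labels)
pos_tagged = pair_tokens_with_labels(tokens, pos_labels)

print(extract_entities(ner_tagged))
print(pos_tagged)
```

The same sentence receives a different label sequence for each task, which is why the two tasks can share an architecture while being trained on different datasets and label sets.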
- Named Entity Recognition: This model identifies and classifies named entities in a text, such as persons, dates, locations, and organizations. It was trained on the NER dataset from Kaggle, which provides 17 labels for this task.
- Part-of-Speech Tagging: This model recognizes and tags the parts of speech in a given text, such as nouns, pronouns, adjectives, and verbs. It was trained on a dataset with 42 labels for this task, also sourced from Kaggle.