
utshabkg/ner-pos-tagging


(GigaTech) End-to-end NER & POS Tagging Classification.


Prerequisites

  • To test the deployed project in the cloud, there are no prerequisites! Go to this link and test it with any Bangla sentence you want.

    NOTE: Since this is a free Render instance, it spins down after inactivity, which can delay requests by 50 seconds or more, so please allow 1-2 minutes for the web app to start. For now, a Bangla sentence may contain at most 25 tokens, since the base model was trained with that sequence length. A quick look at the web application:

    [web app screenshot]

  • Now it's time to run and test the project on your own system. For that:

    Conda: Ensure Conda is installed on your system. If not, you can download and install it from here. You can also use Virtualenv, Poetry, or any other tool if you know how to set up the environment with it.

Environment Setup

Follow these steps to set up your project environment:

  1. Clone the repository:

    git clone https://github.com/utshabkg/ner-pos-tagging/
    cd ner-pos-tagging
  2. Run the setup script:

    This script will create a new Conda environment named gigatech with Python 3.10 and install all the required packages.

    bash setup_env.sh
  3. Activate the Conda environment:

    After running the setup script, activate the new environment (if it is not already active):

    conda activate gigatech
  4. Verify the installation:

    Ensure that all packages are installed correctly by running:

    pip list

    This should display a list of installed packages.
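For an extra programmatic sanity check, the snippet below tries to import the core libraries this project appears to rely on. The exact package list is an assumption based on this README (FastAPI web app, .h5 Keras models, ONNX conversion); adjust it to match what setup_env.sh actually installs.

# check_env.py -- rough import check for the gigatech environment.
# The package list below is an assumption; edit it to match setup_env.sh.
import importlib

for name in ["fastapi", "uvicorn", "tensorflow", "onnxruntime", "numpy", "pandas"]:
    try:
        module = importlib.import_module(name)
        print(f"OK   {name} {getattr(module, '__version__', '')}")
    except ImportError as exc:
        print(f"FAIL {name}: {exc}")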

Inference

You can provide any Bangla sentence and get the tagging results. Inference is available both in the terminal and through a web application (powered by FastAPI).

Web Application

python main.py

Open your browser and go to http://localhost:8000/
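For reference, here is a minimal sketch of how a main.py like this one can wire a model into FastAPI. The endpoint names and the sentence form field match the curl examples further below, but predict_tags and the tag fields are hypothetical placeholders, not the repository's actual code.

# Minimal sketch, not the repository's actual main.py.
# /predict_json and /health match the endpoints documented below;
# predict_tags() is a hypothetical stand-in for the real inference code.
from fastapi import FastAPI, Form
import uvicorn

app = FastAPI()

def predict_tags(sentence: str) -> list[dict]:
    # Placeholder: the real app tokenizes the sentence and runs the model.
    return [{"token": tok, "pos": "?", "ner": "?"} for tok in sentence.split()]

@app.post("/predict_json")
def predict_json(sentence: str = Form(...)):
    return {"sentence": sentence, "tags": predict_tags(sentence)}

@app.get("/health")
def health():
    return {"status": "ok"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)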

For Terminal

cd components
python inference.py
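Terminal inference does the same work without the web layer. The sketch below shows the typical shape of such a script, assuming a Keras .h5 model and a word-to-id mapping padded to the 25-token limit noted below; the helper names and model path are illustrative, not the repository's actual inference.py.

# Illustrative sketch only; inference.py is the authoritative version.
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_TOKENS = 25  # matches the training limit noted in this README

model = load_model("notebooks/models_evaluation/models/custom_data_model.h5")  # adjust to your model file

def tag_sentence(sentence: str, word_index: dict) -> np.ndarray:
    tokens = sentence.split()[:MAX_TOKENS]
    ids = [[word_index.get(tok, 0) for tok in tokens]]
    padded = pad_sequences(ids, maxlen=MAX_TOKENS, padding="post")
    return model.predict(padded)  # per-token tag probabilities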

Dockerize (Bonus)

docker build -t gigatech-app .    # build the image
docker run -d -p 8000:8000 gigatech-app    # run the container
docker ps    # check
docker stop <container-id-from-ps>    # stop

Open your browser and go to http://localhost:8000/

NOTE: Please allow some time for all components to load after starting the container. You can view the Docker logs with:

docker logs <container-id-from-ps>

The app is ready to view in the browser once this message appears in the log:

Application startup complete.
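If you would rather script that wait than watch the logs, you can poll the /health endpoint documented below. A small convenience sketch, not part of the repository:

# Poll the health endpoint until the app inside the container is up.
import time
import urllib.request

URL = "http://localhost:8000/health"

for attempt in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            if resp.status == 200:
                print("App is up.")
                break
    except OSError:
        pass  # container still loading
    time.sleep(2)
else:
    print("Gave up waiting; check docker logs.")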

Endpoints to check with Curl or Postman

Predict (your terminal needs Bangla Unicode support to display the result):

curl -X POST "http://127.0.0.1:8000/predict_json" -H "Content-Type: application/x-www-form-urlencoded" -d "sentence=আমি বাংলা ভাষায় কথা বলি"

Health Check:

curl -X GET "http://127.0.0.1:8000/health"
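The same two endpoints can be called from Python as well, for example with the requests package (a minimal example mirroring the curl commands above):

import requests

BASE = "http://127.0.0.1:8000"

# Predict: the 'sentence' form field matches the curl example above.
resp = requests.post(f"{BASE}/predict_json", data={"sentence": "আমি বাংলা ভাষায় কথা বলি"})
print(resp.json())

# Health check.
print(requests.get(f"{BASE}/health").json())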

NOTE: I trained the model with max_token=25, so keep the total number of words and punctuation marks within that limit. You can also increase the token size and train a larger model.

ONNX Integration

cd components/utils
python convert_model_onnx.py    # convert model to onnx

A model has already been created at the path: notebooks/models_evaluation/models/base_model.onnx.
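For context, a typical Keras-to-ONNX conversion looks like the sketch below, using tf2onnx. The repository's convert_model_onnx.py may use a different tool or options, and the source .h5 path here is an assumption:

# Illustrative Keras -> ONNX conversion; convert_model_onnx.py may differ.
import tf2onnx
from tensorflow.keras.models import load_model

model = load_model("notebooks/models_evaluation/models/custom_data_model.h5")  # assumed source model

# from_keras infers the input signature from the model when none is given.
tf2onnx.convert.from_keras(
    model,
    output_path="notebooks/models_evaluation/models/base_model.onnx",
)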

Inference

cd components
python inference_onnx.py
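Under the hood, ONNX inference follows the standard onnxruntime pattern. A sketch with an assumed input shape and dtype; inference_onnx.py is the authoritative version:

# Standard onnxruntime pattern; a sketch, not inference_onnx.py itself.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("notebooks/models_evaluation/models/base_model.onnx")
input_name = session.get_inputs()[0].name

# Dummy batch of one 25-token sentence; the shape and dtype are assumptions.
token_ids = np.zeros((1, 25), dtype=np.float32)
outputs = session.run(None, {input_name: token_ids})
print(outputs[0].shape)  # per-token tag probabilities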

Training Model with Custom Dataset & Evaluate

To preprocess your own data and train a base model on it, run:

cd components
python preprocessing.py
python model_training.py

A model will be created at the path: notebooks/models_evaluation/models/custom_data_model.h5.

Evaluate your new model with:

cd components
python model_evaluation.py

You will find your results in the reports/final_score_custom.txt file.

NOTE: The data format should match the files in the dataset folder, and it must be a .tsv file. Rename your dataset file to data.tsv, place it inside the dataset folder, and you're all set!
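To double-check your file before training, you can peek at it with pandas. The column layout is defined by the existing files in the dataset folder, so the sketch below deliberately reads without named columns:

# Quick look at the custom dataset before training.
import pandas as pd

df = pd.read_csv("dataset/data.tsv", sep="\t", header=None)
print(df.head())
print(f"{len(df)} rows, {df.shape[1]} columns")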

Data Exploration

If you are interested in the exploratory data analysis, preprocessing, and other experiments (e.g., hyperparameter tuning) that I enjoyed working on, have a look at the notebooks folder.

Documentation of Work Details

A document explaining the code and decisions made during the development process. Click here.

Performance Metrics

A report of the model's performance on the test set, including accuracy, precision, recall, and F1 score. Click here.
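For context, token-level scores like these are commonly computed with scikit-learn. A generic sketch with made-up example tags; model_evaluation.py may compute the report differently:

# Generic token-level metrics; the tags below are made-up examples.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["B-PER", "O", "O", "B-LOC"]      # example gold tags
y_pred = ["B-PER", "O", "B-LOC", "B-LOC"]  # example predicted tags

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")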

Base Model

Plots of training and validation accuracy and loss during base model training:

[accuracy plot] [loss plot]

Hyperparameter tuning was performed as well. If you like, you can explore the models/parameters_track folder to see the outcomes.
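Plots like these typically come straight from the Keras training history. A generic sketch, assuming model_training.py tracks accuracy during model.fit and matplotlib is available:

# Generic accuracy/loss plotting from a Keras History object.
# Assumes history = model.fit(...) was run with an accuracy metric.
import matplotlib.pyplot as plt

def plot_history(history):
    for metric in ("accuracy", "loss"):
        plt.figure()
        plt.plot(history.history[metric], label=f"train {metric}")
        plt.plot(history.history[f"val_{metric}"], label=f"val {metric}")
        plt.xlabel("epoch")
        plt.ylabel(metric)
        plt.legend()
        plt.savefig(f"{metric}_plot.png")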
