🚀 Electra-VNJob-NER

Electra model for Named Entity Recognition (NER) with job recruitment information in Vietnam.
Feel free to watch 👀, star ⭐, or fork 🍴 this repository.

📑 Table of Contents

📌 Introduction
📂 VNJob Dataset
- 📊 Data Formats
- 📝 Example Data
⚙️ Requirements
📈 Results
💻 Usage
🤝 Contributing
📜 License
🔗 References

📌 Introduction

This repository contains an implementation of the Electra model for Named Entity Recognition (NER), tailored for processing job recruitment data in Vietnam.

NER plays a crucial role in automated job-matching systems by identifying and categorizing entities such as job titles, skills, locations, and salary ranges from job postings. This enhances search relevance and recommendation systems in recruitment platforms.

We chose Electra, a transformer-based model pretrained with replaced-token detection, because it is compute-efficient to pretrain and fine-tunes well on downstream tasks. Fine-tuning it on VNJob lets the model capture domain-specific linguistic patterns in Vietnamese job postings, which improves entity recognition performance.

📂 VNJob Dataset

📊 Data Formats

The VNJob dataset consists of:

Training set: data/vnjob_train.csv
Validation set: data/vnjob_val.csv

There are 44,273 training samples and 11,086 validation samples.
🚨 No separate test set is provided. If needed, you can split the validation set (e.g., 80% for validation, 20% for testing).
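
A seeded shuffle-and-split is one simple way to carve a test set out of the validation data, as suggested above. The sketch below is an assumption, not code from this repository: `split_validation` is a hypothetical helper that works on any list of samples, and loading samples from data/vnjob_val.csv is left out because the CSV layout is not described here.

```python
import random


def split_validation(samples, test_fraction=0.2, seed=42):
    """Split a list of samples into (validation, test) parts.

    Hypothetical helper: shuffles a copy with a fixed seed so the split
    is reproducible, then uses the last `test_fraction` as the test set.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Integers stand in for the 11,086 validation samples.
    val_part, test_part = split_validation(list(range(11086)))
    print(len(val_part), len(test_part))
```

If one sample spans several CSV rows (for example, one row per token), split on whole samples rather than on rows so that no sentence ends up in both parts.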

The dataset uses 9 labels: 8 named entity types plus O for tokens that do not belong to any entity:

🏷️ Job title (job_title)
🏢 Job type (job_type)
🏆 Position (position)
🌍 City (city)
🎓 Experience (experience)
🛠️ Skills (skills)
📌 Job fields (job_fields)
💰 Salary (salary)
❓ Other (O)
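
The example data in this README uses B- and I- prefixes, so a tag vocabulary for token classification could be built along the following lines. This is a sketch under that assumption: `ENTITY_TYPES`, `build_label_maps`, and the label ordering are illustrative and may differ from the mapping the repository actually uses.

```python
# Entity type names as listed above; "O" is handled separately.
ENTITY_TYPES = [
    "job_title",
    "job_type",
    "position",
    "city",
    "experience",
    "skills",
    "job_fields",
    "salary",
]


def build_label_maps(entity_types):
    """Return (label2id, id2label) for a BIO tagging scheme.

    Hypothetical helper: "O" gets id 0, followed by a B-/I- pair
    for each entity type in the order given.
    """
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    label2id = {label: index for index, label in enumerate(labels)}
    id2label = {index: label for label, index in label2id.items()}
    return label2id, id2label


if __name__ == "__main__":
    label2id, id2label = build_label_maps(ENTITY_TYPES)
    print(len(label2id), "labels")
```

With 8 entity types this yields 17 tags in total (8 B- tags, 8 I- tags, and O).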

📝 Example Data

Character	Tag
Software	B-job_title
Engineer	I-job_title
at	O
Hanoi	B-city
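
To show how the tags in the example above group into entities, here is a minimal BIO decoder. It is an illustrative sketch; `bio_to_spans` is a hypothetical helper and not part of this repository.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


if __name__ == "__main__":
    tokens = ["Software", "Engineer", "at", "Hanoi"]
    tags = ["B-job_title", "I-job_title", "O", "B-city"]
    print(bio_to_spans(tokens, tags))
    # [('job_title', 'Software Engineer'), ('city', 'Hanoi')]
```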

⚙️ Requirements

This project was developed using Python with PyTorch.

📦 Install the dependencies before running the model:

pip install -r requirements.txt

📦 Dependencies:

torch==2.5.1
numpy==1.26.4
matplotlib==3.7.2
pathlib==1.0.1
transformers==4.47.0
datasets==3.2.0
tqdm==4.66.5
torchmetrics==1.6.0
pandas==2.0.3

📈 Results

The model's performance on the VNJob training and validation sets (all values in %):

Dataset	📊 Accuracy	🔎 Recall	🎯 Precision	🏆 F1 Score
Training set	99.99	99.95	99.94	99.94
Validation set	99.51	98.48	97.99	98.24
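
This README does not state whether the scores above are token-level or entity-level, or how they are averaged. As a rough illustration of what precision, recall, and F1 mean for tagged sequences, the sketch below computes token-level scores over non-O tags. `token_prf` is a hypothetical helper and is not necessarily how the reported numbers were produced.

```python
def token_prf(gold, pred, outside="O"):
    """Token-level precision, recall and F1 over non-outside tags.

    A predicted tag counts as correct only if it matches the gold tag
    exactly, including the B-/I- prefix.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and g != outside)
    fp = sum(1 for g, p in pairs if p != outside and g != p)
    fn = sum(1 for g, p in pairs if g != outside and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B-job_title", "I-job_title", "O", "B-city"]
    pred = ["B-job_title", "O", "O", "B-city"]
    print(token_prf(gold, pred))
```

In this toy case one entity token is missed, so precision stays at 1.0 while recall drops to 2/3.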

💻 Usage

🔧 Installation

📥 Clone the Repository

git clone https://github.com/tinh2044/Electra-VNJob-NER.git
cd Electra-VNJob-NER

🛠️ Create Virtual Environment (Recommended)

conda create --name ElectraNER python=3.9
conda activate ElectraNER

📦 Install Dependencies

pip install -r requirements.txt

📥 Download Dataset

Download the dataset from Google Drive.

Ensure the data/ folder has the following structure:

|——data
    |——vnjob_train.csv
    |——vnjob_val.csv
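
Before training, it can help to confirm that both files are where the scripts expect them. The check below is a small sketch based only on the folder layout shown above; `missing_data_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# File names taken from the folder structure shown above.
EXPECTED_FILES = ("vnjob_train.csv", "vnjob_val.csv")


def missing_data_files(data_dir="data"):
    """Return the expected dataset files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("Dataset files found.")
```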

📚 Training Electra Model for NER Task

Run the following command to train the model:

python -m main --task train --epoch 200 --lr 0.001 --batch_size 32 --repo_id tinh2312/Electra-VNJob-NER
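
Fine-tuning a subword model such as Electra on word-level tags usually means spreading each word's tag over its subword pieces. Whether and how this repository does that is not shown in this README, so the sketch below is an assumption. `align_labels` is a hypothetical helper that takes `word_ids` in the form returned by `BatchEncoding.word_ids()` from Hugging Face fast tokenizers (None for special tokens, otherwise the index of the source word).

```python
# -100 is the default ignore_index of torch.nn.CrossEntropyLoss,
# so positions carrying it do not contribute to the loss.
IGNORE_INDEX = -100


def align_labels(word_ids, word_label_ids, label_all_subwords=False):
    """Map word-level label ids onto subword positions.

    By default only the first subword of each word keeps the label;
    special tokens and later subwords get IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special token such as [CLS]
        elif word_id != previous:
            aligned.append(word_label_ids[word_id])  # first subword
        elif label_all_subwords:
            aligned.append(word_label_ids[word_id])
        else:
            aligned.append(IGNORE_INDEX)  # continuation subword
        previous = word_id
    return aligned


if __name__ == "__main__":
    # Four words where the second word is split into two subwords.
    word_ids = [None, 0, 1, 1, 2, 3, None]
    word_label_ids = [1, 2, 0, 7]
    print(align_labels(word_ids, word_label_ids))
```

Labelling only the first subword keeps evaluation at word level. Setting `label_all_subwords=True` is a common alternative when more training signal per word is wanted.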

📊 Evaluate Electra Model

Run the following command to evaluate the trained model:

python -m main --task eval --batch_size 32 --repo_id tinh2312/Electra-VNJob-NER

🎛️ Model Interface with Gradio

Run the following command to launch the Gradio demo:

python app.py

or, to run it through the Gradio CLI with auto-reload:

gradio app.py

🤝 Contributing

🚀 Contributions are welcome!
To contribute:

Fork this repository.

Create a new branch:

git checkout -b feature/your-feature-name

Make your changes and commit:

git commit -m "feat: add new preprocessing step"

Push to your fork and submit a pull request.

For major changes, please open an issue first to discuss your proposal.

📜 License

This project is licensed under the MIT License.

🔗 References

phongtranWW/crawler [GitHub]
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [Paper]
google-research/electra [GitHub]
huggingface/electra_model [GitHub]
chakki-works/seqeval [GitHub]
