spamBERT - BERT pre-trained model for spam classification

spamBERT is a simple user-friendly webpage for spam classification of email and sms texts, using a fine-tuned BERT base model (cased). Leveraging BERT's contextual understanding of language, the pre-trained model has been fine-tuned using two combined datasets, specific for this purpose, to be used to classify the text given in input by the user.

Directory tree

data: contains the datasets used for the fine-tuning;
model: contains the BERT classifier class and the methods to train, evaluate and test the model;
static: contains che CSS file and the logo file used to develop the frontend, using Flask framework, made up in order to use the model with a graphic interface;
templates: contains the HMTL file used by Flask;
utility: contains the custom dataset class and some utility methods, like the loading data function.

Outside the directories there are three files:

app.py: runs the Flask app;
exec_app.py: allows the user to run the app like an internal program on the machine;
inference.py: allows the user to test the model inside the IDE.

The dataset

The datasets, SMS Spam Collection and Spam-Ham Dataset are two collection of spam and not-spam SMSs and e-mails. Every sample has two features: the target feature ("ham" or "spam") and the main feature that contains the text.

spamBERT architecture

Our architecture includes the pre-trained BERT base cased (with 12 encoders) and a fully connected layer, fine-tuned in order to perform well on our task. We obtained an accuracy score of 99%.

Results

The interface of our project is the following:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

spamBERT - BERT pre-trained model for spam classification

Directory tree

The dataset

spamBERT architecture

Results

Authors

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
model		model
static		static
templates		templates
utility		utility
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
inference.py		inference.py
main.py		main.py

License

davide-abbattista/spamBERT

Folders and files

Latest commit

History

Repository files navigation

spamBERT - BERT pre-trained model for spam classification

Directory tree

The dataset

spamBERT architecture

Results

Authors

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages