This project focuses on classifying emails as spam or not spam using Natural Language Processing (NLP) techniques. The repository includes code for training models, evaluating their performance, and a dataset for experimentation.
- models/: Directory containing trained models.
- src/: Directory containing source files.
  - FFNN.py: Script for training a Feedforward Neural Network.
  - main.py: Main script for running the classification.
  - utils.py: Utility functions for data processing and model evaluation.
- data/: Directory containing the dataset.
  - Oppositional_thinking_analysis_dataset.json: Dataset used for training and evaluation.
- desc/: Directory containing the description of the project.
  - NLP - Project 1_4.pdf: Project documentation and analysis.
- Python 3.x
- Necessary libraries listed in requirements.txt
- Clone the repository and change into the source directory:
  git clone https://github.com/damlakayikci/Spam-Email-Classification-NLP.git
  cd src
Use one of the following commands to run the code:
- To train and run the Naive Bayes model:
  python main.py nb
- To train and run the Feedforward Neural Network (FFNN):
  python main.py ffnn
- To print statistics and plot graphs:
  python main.py stats
- To find the Pointwise Mutual Information (PMI) of 10 random words and print the most similar words:
  python main.py pmi
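
For readers unfamiliar with PMI, the following is a minimal sketch of the idea, not the code in src/. It scores a word pair as log2(p(x, y) / (p(x) p(y))), where the probabilities are estimated from how often words appear together in the same document. The function names (`build_pmi`, `most_similar`), the toy corpus, and the choice of a document-level co-occurrence window are all assumptions made for illustration; the project's own implementation may tokenize, count, and sample words differently.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(docs):
    """Build a PMI scorer from document-level co-occurrence counts.

    Hypothetical helper: each document counts a word (or word pair) once.
    """
    n_docs = len(docs)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc.lower().split()))
        word_counts.update(vocab)
        # Pairs come out in alphabetical order because vocab is sorted.
        pair_counts.update(combinations(vocab, 2))

    def pmi(x, y):
        joint = pair_counts[tuple(sorted((x, y)))]
        if joint == 0:
            return float("-inf")  # the words never co-occur
        # log2( p(x, y) / (p(x) * p(y)) ) with p estimated as count / n_docs
        return math.log2(joint * n_docs / (word_counts[x] * word_counts[y]))

    return pmi, sorted(word_counts)


def most_similar(word, pmi, vocab, k=3):
    """Return up to k words with the highest finite PMI against `word`."""
    scored = [(other, pmi(word, other)) for other in vocab if other != word]
    scored = [(w, s) for w, s in scored if s != float("-inf")]
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scored, key=lambda ws: (-ws[1], ws[0]))[:k]


# Toy corpus, made up for this example (not the project's dataset).
docs = [
    "win free money now",
    "free money offer",
    "meeting agenda attached",
    "project meeting tomorrow",
]

pmi, vocab = build_pmi(docs)
print(pmi("free", "money"))
print(most_similar("win", pmi, vocab))
```

Words that never appear together get a score of negative infinity and are dropped from the ranking, so the "most similar" list only contains words that actually co-occur with the target word.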