🔍 Description: This project is a Spam Detection System built with Python and Scikit-learn. It uses a Random Forest Classifier to predict whether an incoming SMS message is spam or ham (not spam) based on word patterns. A TF-IDF vectorizer converts the message text into numerical features, and the model is trained on a dataset of labeled SMS messages.
🧠 Machine Learning: Uses a Random Forest Classifier, an ensemble learning method that combines many decision trees.
📊 Feature Extraction: Text data is transformed into numerical features using TF-IDF vectorization.
💬 Text Classification: Classifies messages as either Spam or Ham (Not Spam).
📈 Model Evaluation: Reports accuracy and precision.
📂 Dataset: This project uses the SMS Spam Collection dataset, downloaded from Kaggle (as per the given dataset).
🛴 SMS messages
🛴 Spam: Unwanted or harmful messages.
🛴 Ham: Non-spam messages.
-
Data Preprocessing
Convert SMS text to lowercase.
Remove special characters.
Split data into training and testing sets.
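The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `clean_text` helper, the toy messages, and the 75/25 split ratio are all assumptions made for the example.

```python
import re

from sklearn.model_selection import train_test_split


def clean_text(message: str) -> str:
    """Lowercase a message and replace special characters with spaces."""
    message = message.lower()
    # Keep only letters, digits, and whitespace (assumed definition of "special characters").
    message = re.sub(r"[^a-z0-9\s]", " ", message)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", message).strip()


# Hypothetical toy data standing in for the SMS Spam Collection dataset.
messages = [
    "WIN a FREE prize!!! Call 0800-123",
    "Are we still meeting for lunch?",
    "URGENT: claim your cash reward now!",
    "Can you pick up some milk on the way home?",
    "Free entry!!! Text WIN to 80085",
    "I'll call you after the meeting.",
    "Congratulations, you have won a voucher!",
    "See you at the station at 6.",
]
labels = ["spam", "ham", "spam", "ham", "spam", "ham", "spam", "ham"]

cleaned = [clean_text(m) for m in messages]

# Stratify so both classes appear in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, random_state=42, stratify=labels
)
print(len(X_train), "training messages,", len(X_test), "testing messages")
```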
-
Feature Extraction
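A minimal sketch of TF-IDF feature extraction with Scikit-learn's `TfidfVectorizer`, assuming default vectorizer settings (the project's actual parameters are not stated). The messages are hypothetical. The vectorizer is fitted on the training messages only and then reused to transform unseen messages, so both share the same vocabulary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-cleaned messages used only for illustration.
train_messages = [
    "win a free prize now",
    "are we still meeting for lunch",
    "free entry call now",
]
test_messages = ["free lunch now"]

vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights from the training messages only.
X_train = vectorizer.fit_transform(train_messages)

# Reuse the fitted vocabulary for unseen messages; unknown words are ignored.
X_test = vectorizer.transform(test_messages)

print("Training matrix shape:", X_train.shape)
print("Vocabulary size:", len(vectorizer.vocabulary_))
```

Each row of the resulting sparse matrix is one message and each column is one vocabulary term.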
-
Model Training
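Training could look roughly like the sketch below, which fits a `RandomForestClassifier` on TF-IDF features. The toy data, `n_estimators=100`, and `random_state=42` are assumptions chosen for the example, not values taken from the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical labeled messages standing in for the real training set.
train_messages = [
    "win a free prize now",
    "are we still meeting for lunch",
    "urgent claim your cash reward",
    "can you pick up some milk",
    "free entry text win to 80085",
    "see you at the station at 6",
]
train_labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

# Turn the text into TF-IDF features.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_messages)

# Fit an ensemble of decision trees on the TF-IDF features.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, train_labels)

print("Classes learned:", list(model.classes_))
```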
-
Model Evaluation
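Accuracy and precision can be computed with `sklearn.metrics`, as in the sketch below. The `y_true` and `y_pred` lists are hypothetical placeholders for the test labels and the output of `model.predict`, so the numbers here are unrelated to the project's reported results.

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical true labels and model predictions for eight test messages.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham", "spam", "spam", "ham"]

# Accuracy: fraction of all messages classified correctly.
accuracy = accuracy_score(y_true, y_pred)

# Precision for the spam class: of the messages flagged as spam,
# the fraction that really are spam.
precision = precision_score(y_true, y_pred, pos_label="spam")

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
```

Precision matters for a spam filter because a false positive means a legitimate message is wrongly flagged.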
-
Prediction Function
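One possible shape for the prediction function is sketched below. It bundles the vectorizer and classifier into a single Scikit-learn pipeline so a raw message can be classified in one call. The name `predict_message`, the toy training data, and the hyperparameters are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical labeled messages standing in for the real training set.
train_messages = [
    "win a free prize now",
    "are we still meeting for lunch",
    "urgent claim your cash reward",
    "can you pick up some milk",
    "free entry text win to 80085",
    "see you at the station at 6",
]
train_labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

# The pipeline applies TF-IDF and then the classifier, for both fit and predict.
pipeline = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=42),
)
pipeline.fit(train_messages, train_labels)


def predict_message(message: str) -> str:
    """Return "Spam" or "Ham" for a single raw SMS message."""
    label = pipeline.predict([message])[0]
    return "Spam" if label == "spam" else "Ham"


print(predict_message("Free prize waiting, call now"))
```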
-
User Corner - Test with any message
Metric - Value
Accuracy - 97.67%
Precision - 98%
Programming Language: Python
Libraries: Scikit-learn (TF-IDF Vectorizer, Random Forest Classifier), Pandas, NumPy
Includes all of the specified features other than showing accuracy, along with a GUI
Input: Users can type an SMS message into the input field.
Button: Once the "Predict" button is pressed, the system will classify the message.
Output: A pop-up message will show whether the message is classified as Spam or Not Spam.
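The input field, "Predict" button, and pop-up described above could be wired together as in the sketch below. It assumes Tkinter as the GUI toolkit, which the README does not confirm, and the names `result_text`, `launch_gui`, and `predict_label` are hypothetical. The window is only defined here, not opened.

```python
def result_text(label: str) -> str:
    """Map a model label to the text shown in the pop-up."""
    return "Spam" if label == "spam" else "Not Spam"


def launch_gui(predict_label):
    """Open a window with an input field and a Predict button.

    predict_label is any callable that takes a message string and
    returns "spam" or "ham" (a hypothetical interface for this sketch).
    """
    # Imported here so the module can be loaded on machines without a display.
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("SMS Spam Detector")

    # Input: the user types an SMS message here.
    entry = tk.Entry(root, width=60)
    entry.pack(padx=10, pady=10)

    def on_predict():
        message = entry.get().strip()
        if not message:
            messagebox.showwarning("Input needed", "Please type an SMS message first.")
            return
        # Output: a pop-up showing the classification.
        messagebox.showinfo("Prediction", result_text(predict_label(message)))

    # Button: pressing it classifies the current message.
    tk.Button(root, text="Predict", command=on_predict).pack(pady=(0, 10))
    root.mainloop()
```

With a trained pipeline in hand, the window could be started with something like `launch_gui(lambda msg: pipeline.predict([msg])[0])`, where `pipeline` is assumed to be a fitted Scikit-learn pipeline.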
This project is not licensed.
👨‍🏫 Club: AI CIRCLE
Kaggle's SMS Spam Dataset
Scikit-learn Documentation
Ansh Singh, 23bcs018@smvdu.ac.in
Shivam Kumar, 23bcs084@smvdu.ac.in
Vishal Kharwar, 23bcs100@smvdu.ac.in
Niraj Kumar, 23bcs057@smvdu.ac.in