This project is designed to detect fake news using Python and supervised learning techniques. The aim is to classify news articles as either "Real" or "Fake" using machine learning models. The dataset used includes features like article titles and text, with labels indicating the authenticity of the news.
- Text classification using Naive Bayes and Passive Aggressive Classifier.
- Feature extraction with CountVectorizer and TfidfVectorizer.
- Evaluation of model performance with accuracy scores and confusion matrices.
The dataset used for this project is from Kaggle, containing news articles with the following attributes:
- Title: The title of the news article.
- Text: The full text of the article.
- Label: Whether the news is 'Real' or 'Fake'.
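A quick way to check that a copy of the dataset matches the attributes listed above is to load it with pandas and inspect the columns. The sketch below uses a hypothetical two-row excerpt in place of the real file. The lowercase column names `title`, `text` and `label`, and the uppercase label values `REAL`/`FAKE`, are assumptions to verify against your download. Combining title and body into one `content` field is an illustrative choice, not necessarily what the project does.

```python
import io

import pandas as pd

# Hypothetical excerpt; with the real data use pd.read_csv("fake_or_real_news.csv").
# Column names and label capitalization are assumptions.
sample_csv = io.StringIO(
    "title,text,label\n"
    "Senate passes budget,The Senate voted on Tuesday to pass the budget.,REAL\n"
    "Aliens endorse candidate,Sources say aliens endorsed the candidate.,FAKE\n"
)
df = pd.read_csv(sample_csv)

# Fail early if the expected attributes are missing.
assert {"title", "text", "label"}.issubset(df.columns)

# Drop rows that cannot be used for training.
df = df.dropna(subset=["text", "label"])

# Illustrative: merge title and body into a single text field for the vectorizer.
df["content"] = df["title"].fillna("") + " " + df["text"]

print(df["label"].value_counts())
```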
- Clone the repository:
git clone https://github.com/ayshikakap31/Fake-News-Detection-Python.git
- Navigate to the project directory:
cd Fake-News-Detection-Python
- Install the required dependencies:
pip install -r requirements.txt
- Load and preprocess the data:
  - Import the dataset (fake_or_real_news.csv) and split it into training and testing sets.
- Run the classifiers:
  - Use Naive Bayes and the Passive Aggressive Classifier to classify news articles.
- Evaluate the models:
  - Generate accuracy scores and confusion matrices to assess each model's performance.
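The load, train and evaluate steps can be sketched end to end with scikit-learn. This assumes the project uses scikit-learn's `MultinomialNB` and `PassiveAggressiveClassifier` (the names match the classifiers listed here, but this is not the repository's code). The function `run_pipeline`, the toy headlines and the `FAKE`/`REAL` label spelling are hypothetical, and pairing Naive Bayes with raw counts and the Passive Aggressive Classifier with TF-IDF features is one reasonable choice rather than a confirmed detail of the project.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

LABELS = ["FAKE", "REAL"]  # assumed label spelling; fixes confusion-matrix order


def run_pipeline(texts, labels, test_size=0.25, seed=7):
    """Hypothetical helper: split, vectorize, train both models, score on the test set."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )

    # Naive Bayes on raw word counts (MultinomialNB needs non-negative features).
    count_vec = CountVectorizer(stop_words="english")
    nb = MultinomialNB()
    nb.fit(count_vec.fit_transform(x_train), y_train)
    nb_pred = nb.predict(count_vec.transform(x_test))

    # Passive Aggressive on TF-IDF features; vectorizers are fit on training data only.
    tfidf_vec = TfidfVectorizer(stop_words="english")
    pa = PassiveAggressiveClassifier(max_iter=50, random_state=seed)
    pa.fit(tfidf_vec.fit_transform(x_train), y_train)
    pa_pred = pa.predict(tfidf_vec.transform(x_test))

    results = {}
    for name, pred in [("naive_bayes", nb_pred), ("passive_aggressive", pa_pred)]:
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # Rows are true labels, columns are predicted labels, in LABELS order.
            "confusion": confusion_matrix(y_test, pred, labels=LABELS),
        }
    return results


# Hypothetical toy headlines; replace with the text and label columns of the CSV.
real = [
    "senate passes annual budget after long debate",
    "city council approves new public library funding",
    "central bank holds interest rates steady this quarter",
    "governor signs education bill into law",
    "researchers publish peer reviewed climate study",
    "court upholds ruling on state election maps",
    "officials report unemployment fell slightly last month",
    "health agency updates seasonal flu vaccine guidance",
]
fake = [
    "shocking secret cure doctors do not want you to know",
    "celebrity clone spotted replacing famous senator",
    "aliens secretly control the central bank insiders claim",
    "miracle pill melts fat overnight experts stunned",
    "secret memo proves moon landing was staged",
    "you will not believe what this politician is hiding",
    "insiders claim vaccine contains mind control chips",
    "shocking photo reveals giant hidden under city",
]
texts = real + fake
labels = ["REAL"] * len(real) + ["FAKE"] * len(fake)

for name, r in run_pipeline(texts, labels).items():
    print(name, round(r["accuracy"], 2))
    print(r["confusion"])
```

On a corpus this small the scores say nothing about real performance; they only confirm that the pipeline runs.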
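- Naive Bayes Classifier: Well suited to text classification because it learns how often each word appears in real versus fake articles and combines those per-word probabilities, assuming words are independent given the label.
- Passive Aggressive Classifier: An online learner suited to large or streaming datasets. When a new example is classified correctly with enough margin, it leaves the decision boundary unchanged (passive); otherwise it updates the boundary just enough to correct the error (aggressive).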
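To make the passive/aggressive behaviour concrete, here is a pure-Python sketch of one PA-I update step for a binary label y in {-1, +1}, following the standard formulation of Crammer et al. (2006). It illustrates the rule only and is not code from this project; the function name `pa_update` is hypothetical. In scikit-learn, `PassiveAggressiveClassifier` with its default `loss="hinge"` uses this PA-I rule, and its `C` parameter plays the same role as `C` here.

```python
def pa_update(w, x, y, C=1.0):
    """Hypothetical helper: one PA-I step. w, x are equal-length lists; y is -1 or +1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        # Passive: already correct with margin >= 1 (or nothing to learn from).
        return list(w)
    # Aggressive: smallest step that removes the loss, capped by C.
    tau = min(C, loss / sq_norm)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
w = pa_update(w, [1.0, 0.0], +1)    # wrong side of margin -> w becomes [1.0, 0.0]
w2 = pa_update(w, [2.0, 0.0], +1)   # margin 2 >= 1 -> unchanged
w3 = pa_update(w, [0.0, 1.0], -1)   # margin 0 -> w becomes [1.0, -1.0]
print(w, w2, w3)
```

Because each step touches only the current example, the model can keep learning from new articles without retraining on the full dataset.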
- Naive Bayes Accuracy: ~85%
- Passive Aggressive Accuracy: ~88%
- Confusion matrices show the types of errors each classifier makes, such as fake articles labeled as real and real articles labeled as fake.
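Reading a confusion matrix can be sketched with scikit-learn's `confusion_matrix`, using hypothetical labels for five test articles. The `FAKE`/`REAL` spelling is an assumption; passing `labels` fixes the row and column order so each cell has a known meaning.

```python
from sklearn.metrics import confusion_matrix

LABELS = ["FAKE", "REAL"]  # assumed label spelling; rows = true, columns = predicted

# Hypothetical true and predicted labels for five test articles.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]

cm = confusion_matrix(y_true, y_pred, labels=LABELS)
print(cm)
# [[1 1]
#  [1 2]]

fake_missed = cm[0, 1]   # fake articles predicted real
real_flagged = cm[1, 0]  # real articles wrongly predicted fake
accuracy = cm.trace() / cm.sum()  # correct predictions lie on the diagonal
print(f"fake missed: {fake_missed}, real flagged: {real_flagged}")
print(accuracy)  # 0.6
```

If matplotlib is installed, `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` from `sklearn.metrics` plots the same matrix.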