This project is about Fake News Detection. I use the data and materials from this competition site: http://www.fakenewschallenge.org/
The goal of the project is to train a model that predicts the category of a news article based on the compatibility between its headline and its body. This task is called Stance Detection.
- Input: a headline and a body text.
- Output: a label indicating whether the body agrees with, disagrees with, discusses, or is unrelated to the headline.
- train_stances.csv contains the stance labels for the training data; each row links a headline to a bodyID.
- train_bodies.csv maps each bodyID to its body text.
- test_stances.csv and test_bodies.csv contain the same information for the test data.
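Because the stances and the bodies live in separate files, each split has to be joined on the body ID before any features can be computed. A minimal standard-library sketch of that join is shown below. It assumes the column names `Headline`, `Body ID`, `Stance` in the stances file and `Body ID`, `articleBody` in the bodies file (the layout of the Fake News Challenge release; adjust them if your copies differ). The function name `load_split` is hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO

# The four stance labels described above.
STANCES = ("agree", "disagree", "discuss", "unrelated")


def load_split(stances_file: TextIO, bodies_file: TextIO) -> List[Dict[str, Optional[str]]]:
    """Attach each body text to its (headline, body ID, stance) row.

    Assumed columns: 'Headline', 'Body ID', 'Stance' in the stances file
    and 'Body ID', 'articleBody' in the bodies file.
    """
    # Map body ID -> article text. Bodies can span several lines inside
    # quoted fields, which csv handles if files are opened with newline="".
    bodies = {row["Body ID"]: row["articleBody"] for row in csv.DictReader(bodies_file)}

    examples = []
    for row in csv.DictReader(stances_file):
        stance = row.get("Stance")  # None if the file has no Stance column
        if stance is not None and stance not in STANCES:
            raise ValueError(f"unexpected stance label: {stance!r}")
        examples.append({
            "headline": row["Headline"],
            "body": bodies[row["Body ID"]],
            "stance": stance,
        })
    return examples


if __name__ == "__main__":
    # Tiny in-memory stand-ins for train_stances.csv and train_bodies.csv.
    # With the real data, pass open("train_stances.csv", newline="", encoding="utf-8")
    # and the matching bodies file instead.
    stances_csv = io.StringIO("Headline,Body ID,Stance\nMan bites dog,1,agree\nStocks fall,1,unrelated\n")
    bodies_csv = io.StringIO("Body ID,articleBody\n1,A man bit a dog on Tuesday.\n")
    for example in load_split(stances_csv, bodies_csv):
        print(example)
```

Several headlines can point to the same body ID, which is why the bodies are loaded into a dictionary once and looked up per stance row rather than duplicated on disk.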
I did some data cleaning, extracted features using TF-IDF, and experimented with several common classification algorithms.
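The exact TF-IDF settings and features are not described here, so the following is only a hedged, dependency-free illustration of one common idea: build TF-IDF vectors for the headline and the body, then use their cosine similarity as a feature. It assumes a simple lowercase/alphanumeric tokenizer and the smoothed IDF `idf(t) = ln((1 + N) / (1 + df(t))) + 1`, which is the default formula in scikit-learn's `TfidfVectorizer`. The names `tokenize`, `fit_idf`, `tfidf` and `cosine` are hypothetical helpers, not part of this project.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters/digits: a deliberately simple cleaner.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF over tokenized documents: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Raw term counts times IDF, L2-normalised. Terms unseen in fit_idf are dropped."""
    counts = Counter(tokens)
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a  # iterate over the sparser vector
    return sum(v * b.get(t, 0.0) for t, v in a.items())


if __name__ == "__main__":
    headline = "Man bites dog in city park"
    bodies = [
        "A man bit a dog in the city park on Tuesday, witnesses said.",
        "Stocks fell sharply as investors reacted to interest rates.",
    ]
    docs = [tokenize(headline)] + [tokenize(body) for body in bodies]
    idf = fit_idf(docs)
    h = tfidf(tokenize(headline), idf)
    for body in bodies:
        print(round(cosine(h, tfidf(tokenize(body), idf)), 3))
```

A high score suggests the body is at least related to the headline, while a score near zero points towards "unrelated". On its own this cannot separate agree, disagree and discuss, so a feature like this would typically be one column among others (for example the TF-IDF vectors themselves) passed to the classifiers being compared.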