This repository contains an implementation of 3HAN: A Hierarchical Attention Network for Fake News Detection. Official paper: https://link.springer.com/chapter/10.1007/978-3-319-70096-0_59
*(Figure: 3HAN model architecture)*
3HAN builds a news vector, an effective representation of an input news article, by applying attention at three levels: words, sentences, and the headline. It processes the article in a hierarchical, bottom-up manner. Because the headline is a distinguishing feature of fake news, and only a few words and sentences in an article carry more weight than the rest, 3HAN assigns differential importance to different parts of the article through its three attention layers. In short: Word-Level Attention -> Sentence-Level Attention -> Headline-Level Attention -> Final News Vector. Bidirectional GRUs are used for sequence encoding.
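The hierarchy is easiest to see in code. Below is a minimal PyTorch-style sketch of the idea, not the repository's actual implementation: the class names, dimensions, and the exact wiring of the headline layer (here the headline vector and body vector are stacked and attended over together) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Additive attention pooling: score each timestep, return a weighted sum."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, x):                                   # x: (batch, steps, dim)
        scores = self.context(torch.tanh(self.proj(x)))     # (batch, steps, 1)
        weights = F.softmax(scores, dim=1)
        return (weights * x).sum(dim=1)                     # (batch, dim)

class ThreeHAN(nn.Module):
    """Sketch of the 3HAN bottom-up hierarchy: words -> sentences -> headline."""
    def __init__(self, vocab_size, embed_dim=100, hidden=50, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.word_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hidden)
        self.top_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.top_attn = AttentionPool(2 * hidden)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def encode_sentences(self, tokens):                     # tokens: (b, n_sents, n_words)
        b, s, w = tokens.shape
        words = self.embed(tokens.view(b * s, w))           # (b*s, w, embed_dim)
        out, _ = self.word_gru(words)                       # (b*s, w, 2*hidden)
        sent_vecs = self.word_attn(out)                     # word-level attention
        return sent_vecs.view(b, s, -1)                     # (b, n_sents, 2*hidden)

    def forward(self, headline, body):
        head_vec = self.encode_sentences(headline).squeeze(1)  # (b, 2*hidden)
        sents = self.encode_sentences(body)                     # (b, n_sents, 2*hidden)
        sent_out, _ = self.sent_gru(sents)
        body_vec = self.sent_attn(sent_out)                     # sentence-level attention
        pair = torch.stack([head_vec, body_vec], dim=1)         # (b, 2, 2*hidden)
        top_out, _ = self.top_gru(pair)
        news_vec = self.top_attn(top_out)                       # headline-level attention
        return self.classifier(news_vec)

model = ThreeHAN(vocab_size=20000)
headline = torch.randint(1, 20000, (4, 1, 12))   # 4 articles, 12-word headlines
body = torch.randint(1, 20000, (4, 10, 30))      # 10 sentences of 30 words each
logits = model(headline, body)                   # (4, 2)
```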
I used the following datasets from Kaggle: https://www.kaggle.com/datasets/emineyetm/fake-news-detection-datasets
If you want to change the hyperparameters, see the Options class in Train.py, where you can also set the path to your data.
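For orientation, such an options class typically looks something like the sketch below. The field names here are hypothetical; check Train.py for the actual attributes.

```python
class Options:
    # Hypothetical example; the real attribute names live in Train.py.
    data_path = "path/to/your/dataset.csv"  # point this at the Kaggle CSVs
    batch_size = 64
    embed_dim = 100
    hidden_dim = 50
    epochs = 10
    learning_rate = 1e-3
```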
To run the code, open a terminal and type:

```bash
python Train.py
```
*(Figure: confusion matrix of the model's predictions)*
Breakdown of the confusion matrix:
- True Positives (TP): 115,682 (bottom right)
- True Negatives (TN): 114,152 (top left)
- False Positives (FP): 5,348 (top right)
- False Negatives (FN): 4,818 (bottom left)
- Accuracy: (TP + TN) / Total = (115,682 + 114,152) / (115,682 + 114,152 + 5,348 + 4,818) ≈ 0.958, so about 95.8% of the predictions were correct.
- Precision (true class): TP / (TP + FP) = 115,682 / (115,682 + 5,348) ≈ 0.956
- Recall (true class): TP / (TP + FN) = 115,682 / (115,682 + 4,818) ≈ 0.960
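These numbers are easy to verify directly from the counts above:

```python
# Recompute the reported metrics from the confusion-matrix counts.
TP, TN, FP, FN = 115_682, 114_152, 5_348, 4_818

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print(f"Accuracy:  {accuracy:.3f}")   # ~0.958
print(f"Precision: {precision:.3f}")  # ~0.956
print(f"Recall:    {recall:.3f}")     # ~0.960
```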
There is a good balance between false positives and false negatives, so the model does not heavily favor one class over the other.