
Using Semantics to Understand Fake News

Code for the EMNLP 2019 workshop (TextGraphs) paper "Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification"

Before running the code, make sure the following files are present in this directory structure:

fake_news_semantics
│   README.md
│   *.py
│   
└───data
    │   balancedtest.csv
    │   fulltrain.csv
    │   test.xsls

balancedtest.csv and fulltrain.csv can be obtained from https://drive.google.com/file/d/1njY42YQD5Mzsx2MKkI_DdtCk5OUKgaqq/view?usp=sharing

test.xsls is the SLN dataset described in the paper. You can obtain it from http://victoriarubin.fims.uwo.ca/news-verification/data-to-go/. Contact me if you have trouble finding these datasets.
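
A quick pre-flight check can catch a missing data file before a long training run starts. The sketch below is not part of the repository: `missing_data_files` is a hypothetical helper, and the file names are simply copied from the directory layout above.

```python
from pathlib import Path

# File names copied from the directory layout in this README.
REQUIRED_DATA_FILES = ("balancedtest.csv", "fulltrain.csv", "test.xsls")


def missing_data_files(repo_root="."):
    """Return the required data files that are absent under <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in REQUIRED_DATA_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All data files are in place.")
```

Run it from the repository root; an empty result means all three files were found.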

Dependencies:

pytorch 1.0.0
pandas
tqdm
xlrd (pip install xlrd)
bert-pytorch (pip install pytorch-pretrained-bert)
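
To confirm the environment before training, something like the following may help. It is only a sketch: `missing_modules` is a hypothetical helper, and it checks that each module can be located, not that its version matches (for example, it does not verify pytorch 1.0.0).

```python
from importlib.util import find_spec

# Import names for the dependencies listed above. The pip package
# pytorch-pretrained-bert is imported as pytorch_pretrained_bert.
REQUIRED_MODULES = ("torch", "pandas", "tqdm", "xlrd", "pytorch_pretrained_bert")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```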

To train an LSTM model, run the following command:

python main.py --batch_size 1024 --config lstm --encoder 0 --mode 0

To train a CNN model, run the following command:

python main.py --batch_size 1024 --config cnn --encoder 1 --mode 0

To train a BERT model, run the following command:

python bert_classifier.py --batch_size 4 --max_epochs 10 --max_seq_length 500 --max_sent_length 70 --mode 0

To train a GCN-based model, run the following command:

python main.py --batch_size 32 --max_epochs 10 --config gcn --max_sent_len 50 --encoder 2 --mode 0

To train a GCN-based model with attention, run the following command:

python main.py --batch_size 32 --max_epochs 10 --config gcn_attn --max_sent_len 50 --encoder 3 --mode 0

To train a GATconv-based model, run the following command:

python main.py --batch_size 32 --max_epochs 10 --config gat --max_sent_len 50 --encoder 4 --mode 0
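
If you want to launch several of these runs from a script, the commands can be kept in one table. The following is a minimal sketch, not part of the repository: `TRAIN_ARGS`, `train_command`, and `run_training` are hypothetical names, and the flags are copied verbatim from the training commands above.

```python
import subprocess

# Script name and flags copied from the training commands in this README.
TRAIN_ARGS = {
    "lstm": ["main.py", "--batch_size", "1024", "--config", "lstm", "--encoder", "0"],
    "cnn": ["main.py", "--batch_size", "1024", "--config", "cnn", "--encoder", "1"],
    "bert": ["bert_classifier.py", "--batch_size", "4", "--max_epochs", "10",
             "--max_seq_length", "500", "--max_sent_length", "70"],
    "gcn": ["main.py", "--batch_size", "32", "--max_epochs", "10",
            "--config", "gcn", "--max_sent_len", "50", "--encoder", "2"],
    "gcn_attn": ["main.py", "--batch_size", "32", "--max_epochs", "10",
                 "--config", "gcn_attn", "--max_sent_len", "50", "--encoder", "3"],
    "gat": ["main.py", "--batch_size", "32", "--max_epochs", "10",
            "--config", "gat", "--max_sent_len", "50", "--encoder", "4"],
}


def train_command(model):
    """Build the argv list for training `model` (all training commands use --mode 0)."""
    return ["python", *TRAIN_ARGS[model], "--mode", "0"]


def run_training(model):
    """Launch the training run; call this from the repository root."""
    subprocess.run(train_command(model), check=True)


if __name__ == "__main__":
    # Print the commands instead of launching them.
    for name in TRAIN_ARGS:
        print(" ".join(train_command(name)))
```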

To evaluate a trained model on the out-of-domain test set, run the command for that model.

For the LSTM model:

python main.py --batch_size 1024 --encoder 0 --model_file model_lstm.t7 --mode 1

For the CNN model:

python main.py --batch_size 1024 --encoder 1 --model_file model_cnn.t7 --mode 1

For the BERT model:

python bert_classifier.py --batch_size 4 --model_file model_bert.t7 --max_seq_length 500 --max_sent_length 70 --mode 1

For the GCN model:

python main.py --batch_size 32 --max_sent_len 50 --encoder 2 --model_file model_gcn.t7 --mode 1

For the GCN model with attention:

python main.py --batch_size 32 --max_sent_len 50 --encoder 3 --model_file model_gcn_attn.t7 --mode 1

For the GATconv model:

python main.py --batch_size 32 --max_sent_len 50 --encoder 4 --model_file model_gat_attn.t7 --mode 1
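
To evaluate only the models you have actually trained, you could first check which checkpoint files exist. This is a sketch under assumptions: `EVAL_SPECS`, `eval_command`, and `available_checkpoints` are hypothetical names, the checkpoint names are taken from the commands above, and the directory the checkpoints are saved in is a guess (the current directory by default). The flag order for BERT differs slightly from the command above, which should not matter if the scripts parse flags by name.

```python
from pathlib import Path

# (script, flags, checkpoint file) for each model, copied from the commands above.
EVAL_SPECS = {
    "lstm": ("main.py", ["--batch_size", "1024", "--encoder", "0"], "model_lstm.t7"),
    "cnn": ("main.py", ["--batch_size", "1024", "--encoder", "1"], "model_cnn.t7"),
    "bert": ("bert_classifier.py",
             ["--batch_size", "4", "--max_seq_length", "500", "--max_sent_length", "70"],
             "model_bert.t7"),
    "gcn": ("main.py", ["--batch_size", "32", "--max_sent_len", "50", "--encoder", "2"],
            "model_gcn.t7"),
    "gcn_attn": ("main.py", ["--batch_size", "32", "--max_sent_len", "50", "--encoder", "3"],
                 "model_gcn_attn.t7"),
    "gat": ("main.py", ["--batch_size", "32", "--max_sent_len", "50", "--encoder", "4"],
            "model_gat_attn.t7"),
}


def eval_command(model):
    """Build the argv list for evaluating `model` (all test commands use --mode 1)."""
    script, flags, checkpoint = EVAL_SPECS[model]
    return ["python", script, *flags, "--model_file", checkpoint, "--mode", "1"]


def available_checkpoints(directory="."):
    """Return the models whose checkpoint file exists in `directory` (assumed location)."""
    return [name for name, (_, _, checkpoint) in EVAL_SPECS.items()
            if (Path(directory) / checkpoint).is_file()]


if __name__ == "__main__":
    for name in available_checkpoints():
        print(" ".join(eval_command(name)))
```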

Baseline Results

Out of domain test set accuracy

| Model | Acc | Prec | Recall | F1 |
|---|---|---|---|---|
| CNN | 67.5 | 67.5 | 67.5 | 67.4 |
| LSTM | 81.4 | 82.2 | 81.4 | 81.3 |
| BERT | 78.1 | 78.1 | 78.1 | 78.0 |
| LSTM + GCN + Max Pool | 85.0 | 85.9 | 85.0 | 85.1 |
| LSTM + GCN + Max Pool + Semantic Adj | 86.4 | 86.4 | 86.3 | 86.4 |
| LSTM + GCN + Self Attn | 86.6 | 87.1 | 86.9 | 86.9 |
| LSTM + GCN + Self Attn + Semantic Adj | 87.8 | 87.8 | 87.8 | 87.8 |
| LSTM + GAT | 86.1 | 86.2 | 86.1 | 86.1 |
| LSTM + GAT + Semantic Adj | 87.5 | 87.5 | 87.5 | 87.4 |
| LSTM + GAT + 2 Attn Heads | 88.6 | 89.1 | 88.9 | 88.9 |
| LSTM + GAT + 2 Attn Heads + Semantic Adj | 84.7 | 85.2 | 84.7 | 84.6 |
| SoTA | - | 88.0 | 82.0 | |
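
For readers who want to compute comparable numbers for their own predictions, here is a minimal sketch of the four metrics. This README does not say how precision, recall, and F1 are averaged across classes, so macro averaging (the unweighted mean of per-class scores) is an assumption here, and `classification_scores` is a hypothetical helper, not a function from this repository.

```python
def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1, as percentages."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": 100.0 * sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": 100.0 * sum(precisions) / n,
        "recall": 100.0 * sum(recalls) / n,
        "f1": 100.0 * sum(f1s) / n,  # mean of per-class F1 scores
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    print(classification_scores([1, 1, 0, 0], [1, 0, 0, 0]))
```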

Results with a dev/test split based on news sources (this may be a more realistic split)

For two classes: Satire / Trusted

In domain dev set accuracy

| Model | Acc | Prec | Recall | F1 |
|---|---|---|---|---|
| CNN | 96.82 | 96.84 | 96.62 | 96.73 |
| LSTM | 95.65 | 95.64 | 95.41 | 95.52 |
| BERT | 91.72 | 92.74 | 90.56 | 91.31 |
| LSTM + GCN + Max Pool | 98.08 | 98.12 | 97.89 | 98.02 |
| LSTM + GCN + Max Pool + Semantic Adj | 96.77 | 97.57 | 97.85 | 97.7 |
| LSTM + GCN + Attn | 98.27 | 98.05 | 98.42 | 98.22 |
| LSTM + GCN + Attn + Semantic Adj | 98.17 | 98.15 | 98.06 | 98.11 |
| LSTM + GAT | 98.36 | 98.44 | 98.12 | 98.29 |
| LSTM + GAT + Semantic Adj | 98.25 | 98.29 | 98.09 | 98.19 |
| LSTM + GAT + 2 Attn Heads | 98.44 | 98.44 | 98.34 | 98.39 |
| LSTM + GAT + 2 Attn Heads + Semantic Adj | 98.02 | 98.01 | 97.9 | 97.95 |

Out of domain test set 1 accuracy

| Model | Acc | Prec | Recall | F1 |
|---|---|---|---|---|
| CNN | 67.5 | 67.79 | 67.5 | 67.37 |
| LSTM | 81.11 | 82.12 | 81.11 | 80.96 |
| BERT | 75.83 | 76.62 | 75.83 | 75.65 |
| LSTM + GCN + Max Pool | 85.83 | 86.16 | 85.83 | 85.8 |
| LSTM + GCN + Max Pool + Semantic Adj | 83.89 | 84.73 | 83.89 | 83.79 |
| LSTM + GCN + Attn | 85.27 | 85.59 | 85.27 | 85.24 |
| LSTM + GCN + Attn + Semantic Adj | 85.56 | 85.57 | 85.56 | 85.55 |
| LSTM + GAT | 86.39 | 86.44 | 86.38 | 86.38 |
| LSTM + GAT + Semantic Adj | 85.27 | 85.31 | 85.27 | 85.27 |
| LSTM + GAT + 2 Attn Heads | 84.72 | 85.65 | 84.72 | 84.62 |
| LSTM + GAT + 2 Attn Heads + Semantic Adj | 86.94 | 87.04 | 86.94 | 86.94 |
| SoTA | - | 88.0 | 82.0 | |

Out of domain test set 2 accuracy

| Model | Acc | Prec | Recall | F1 |
|---|---|---|---|---|
| CNN | 91.13 | 91.28 | 91.13 | 91.12 |
| LSTM | 91.53 | 91.54 | 91.53 | 91.53 |
| BERT | 83.46 | 83.56 | 83.46 | 83.45 |
| LSTM + GCN + Max Pool | 92.6 | 92.61 | 92.59 | 92.59 |
| LSTM + GCN + Max Pool + Semantic Adj | 89.73 | 90.57 | 89.73 | 89.68 |
| LSTM + GCN + Self Attn | 91.26 | 91.99 | 91.26 | 91.22 |
| LSTM + GCN + Self Attn + Semantic Adj | 92.4 | 92.53 | 92.39 | 92.39 |
| LSTM + GAT | 94.2 | 94.21 | 94.2 | 94.19 |
| LSTM + GAT + Semantic Adj | 92.6 | 92.69 | 92.59 | 92.59 |
| LSTM + GAT + 2 Attn Heads | 89.66 | 90.37 | 89.67 | 89.62 |
| LSTM + GAT + 2 Attn Heads + Semantic Adj | 92.86 | 93.06 | 92.87 | 92.86 |

For four classes: Satire, Hoax, Propaganda and Trusted

In domain dev set accuracy

| Model | Acc | Prec | Recall | F1 |
|---|---|---|---|---|
| CNN | 96.48 | 96.41 | 96.18 | 96.28 / 96.48 |
| LSTM | 88.75 | 88.67 | 88.11 | 88.35 / 88.75 |
| BERT | 95.07 | 94.81 | 94.57 | 94.68 / 95.07 |
| LSTM + GCN + Max Pool | 96.76 | 96.61 | 96.58 | 96.59 / 96.76 |
| LSTM + GCN + Max Pool + Semantic Adj | | | | |
| LSTM + GCN + Attn | 97.57 | 97.25 | 97.63 | 97.43 / 97.57 |
| LSTM + GCN + Attn + Semantic Adj | | | | |
| LSTM + GAT | 97.73 | 97.9 | 97.36 | 97.62 / 97.28 |
| LSTM + GAT + Semantic Adj | | | | |
| LSTM + GAT + 2 Attn Heads | 97.8 | 97.69 | 97.74 | 97.71 / 97.82 |
| LSTM + GAT + 2 Attn Heads + Semantic Adj | | | | |
| SoTA | - | - | - | 91.0 |

Out of domain test set 2 accuracy

| Model | Acc | Prec | Recall | F1 |
|---|---|---|---|---|
| CNN | 54.03 | 54.5 | 54.03 | 52.6 / 54.03 |
| LSTM | 55.06 | 58.88 | 55.06 | 52.5 / 55.05 |
| BERT | 55.56 | 57.45 | 54.86 | 54.0 / 54.87 |
| LSTM + GCN + Max Pool | 65.0 | 66.75 | 64.84 | 63.79 / 65.0 |
| LSTM + GCN + Max Pool + Semantic Adj | | | | |
| LSTM + GCN + Attn | 67.08 | 68.6 | 67.0 | 66.42 / 67.08 |
| LSTM + GCN + Attn + Semantic Adj | | | | |
| LSTM + GAT | 65.5 | 69.45 | 65.33 | 63.83 / 65.51 |
| LSTM + GAT + Semantic Adj | | | | |
| LSTM + GAT + 2 Attn Heads | 66.94 | 68.05 | 66.86 | 66.37 / 66.95 |
| LSTM + GAT + 2 Attn Heads + Semantic Adj | | | | |
| SoTA | - | - | - | 65.0 |

For more structured results, refer to the tables in the paper. The following results are for document classification on domains other than fake news.

Document classification

AG News (4 news categories)

| Model | Acc | Test Error Rate |
|---|---|---|
| GAT | 89.61 | 10.39 |
| GAT + 2 Attn Heads | 89.72 | 10.28 |
| SOTA | | 5.01 |
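
The two columns are related by simple arithmetic: the test error rate is 100 minus the accuracy, both in percent. A small sketch (the helper name `error_rate` is made up for illustration) reproduces the AG News rows above:

```python
def error_rate(accuracy_percent):
    """Convert an accuracy in percent to a test error rate in percent."""
    return round(100.0 - accuracy_percent, 2)


if __name__ == "__main__":
    # Accuracies taken from the AG News table above.
    for model, accuracy in [("GAT", 89.61), ("GAT + 2 Attn Heads", 89.72)]:
        print(model, error_rate(accuracy))
```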

IMDB (2 sentiment categories)

| Model | Acc | Test Error Rate |
|---|---|---|
| GAT | | |
| GAT + 2 Attn Heads | | |
| SOTA | | 4.6 |

DBPedia (14 ontology categories)

| Model | Acc | Test Error Rate |
|---|---|---|
| GAT | 99.13 | |
| GAT + 2 Attn Heads | | |
| SOTA | | 0.80 |

Bibtex

If you find this work or code useful in your research, please cite the paper using the following BibTeX entry:

@inproceedings{vaibhav-etal-2019-sentence,
    title = "Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification",
    author = "Vaibhav, Vaibhav  and
      Mandyam, Raghuram  and
      Hovy, Eduard",
    booktitle = "Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)",
    month = nov,
    year = "2019",
    address = "Hong Kong",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-5316",
    doi = "10.18653/v1/D19-5316",
    pages = "134--139",
    abstract = "The rising growth of fake news and misleading information through online media outlets demands an automatic method for detecting such news articles. Of the few limited works which differentiate between trusted vs other types of news article (satire, propaganda, hoax), none of them model sentence interactions within a document. We observe an interesting pattern in the way sentences interact with each other across different kind of news articles. To capture this kind of information for long news articles, we propose a graph neural network-based model which does away with the need of feature engineering for fine grained fake news classification. Through experiments, we show that our proposed method beats strong neural baselines and achieves state-of-the-art accuracy on existing datasets. Moreover, we establish the generalizability of our model by evaluating its performance in out-of-domain scenarios. Code is available at https://github.com/MysteryVaibhav/fake{\textbackslash}{\_}news{\textbackslash}{\_}semantics.",
}
