- Run bash create_nega_posi_data.sh [-d dataset name] [-p positives] [-n negatives] [-t prediction targets] [-r test size rate]
- The default directory for positive labeled data is data/np/positives, for negatives data/np/negatives, and for targets data/np/targets.
- Run python remove_words.py np
- Run python build_graph.py np
- Run python predict.py np
The result file is created as results/<dataset_name>_result.txt.
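The data-preparation call and the result location described above can be sketched in Python as follows. This is only an illustration: the helper names `nega_posi_command` and `result_path` are hypothetical, the flags are the ones listed above, and the test size rate of 0.1 is an assumed example value, not a documented default.

```python
def nega_posi_command(dataset=None, positives=None, negatives=None,
                      targets=None, test_rate=None):
    """Build the argument list for create_nega_posi_data.sh.

    Options left as None are omitted so the script falls back to its defaults.
    """
    argv = ["bash", "create_nega_posi_data.sh"]
    options = [("-d", dataset), ("-p", positives), ("-n", negatives),
               ("-t", targets), ("-r", test_rate)]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv


def result_path(dataset):
    """Return the result file location, following results/<dataset_name>_result.txt."""
    return "results/{}_result.txt".format(dataset)


if __name__ == "__main__":
    # 0.1 is an assumed example value for the test size rate.
    cmd = nega_posi_command(dataset="np",
                            positives="data/np/positives",
                            negatives="data/np/negatives",
                            targets="data/np/targets",
                            test_rate=0.1)
    print(" ".join(cmd))
    print(result_path("np"))
```

The returned list can be passed to `subprocess.check_call` from the repository root if you prefer to drive the script from Python.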
The implementation of Text GCN in our paper:
Liang Yao, Chengsheng Mao, Yuan Luo. "Graph Convolutional Networks for Text Classification." In 33rd AAAI Conference on Artificial Intelligence (AAAI-19), 7370-7377
Python 2.7 or 3.6
TensorFlow >= 1.4.0
- Run python remove_words.py 20ng
- Run python build_graph.py 20ng
- Run python train.py 20ng
- Change 20ng in the above three command lines to R8, R52, ohsumed or mr when producing results for other datasets.
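To reproduce results for several datasets in one go, the three steps above can be wrapped in a small driver. The sketch below is an assumption about how one might automate this, not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical names, and the script is assumed to be started from the repository root with `python` on the PATH.

```python
import subprocess

DATASETS = ["20ng", "R8", "R52", "ohsumed", "mr"]
STEPS = ["remove_words.py", "build_graph.py", "train.py"]


def pipeline_commands(dataset, python="python"):
    """Return the three commands for one dataset, in execution order."""
    return [[python, step, dataset] for step in STEPS]


def run_pipeline(dataset, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in pipeline_commands(dataset):
        print(" ".join(argv))
        if not dry_run:
            # check_call raises CalledProcessError if a step fails,
            # so later steps never run on incomplete output.
            subprocess.check_call(argv)


if __name__ == "__main__":
    for name in DATASETS:
        run_pipeline(name, dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which is a cheap way to check the order before starting a long training run.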
- /data/20ng.txt lists document names, training/test splits, and document labels. Each line corresponds to one document.
- /data/corpus/20ng.txt contains the raw text of each document; each line corresponds to the same line in /data/20ng.txt.
- prepare_data.py is an example of preparing your own data. Note that '\n' is removed from your documents or sentences.
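The two-file layout described above can be illustrated with a short sketch. It is not the code from prepare_data.py: the function names are hypothetical, and the tab-separated field order (name, split, label) is an assumption, since the text above only says which fields the index file holds.

```python
def to_single_line(text):
    """Collapse newlines and any other whitespace runs into single spaces."""
    return " ".join(text.split())


def format_dataset(docs):
    """Turn (name, split, label, text) tuples into two aligned line lists.

    index_lines[i] describes the document whose text is corpus_lines[i].
    """
    index_lines, corpus_lines = [], []
    for name, split, label, text in docs:
        # Assumed layout: tab-separated name, train/test split, label.
        index_lines.append("\t".join([name, split, label]))
        corpus_lines.append(to_single_line(text))
    return index_lines, corpus_lines


if __name__ == "__main__":
    docs = [("doc0", "train", "pos", "first line\nsecond line"),
            ("doc1", "test", "neg", "a single line")]
    index_lines, corpus_lines = format_dataset(docs)
    print("\n".join(index_lines))
    print("\n".join(corpus_lines))
```

Keeping both lists in one loop guarantees that line i of the index file and line i of the corpus file always refer to the same document.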
An inductive version of Text GCN is fast_text_gcn, where test documents are not included in the training process.