Refer to the SPEAR library for a well-documented implementation of this paper.

Requirements

This code was developed with:

python 3.6
numpy 1.17.4
torch 1.1.0

Data Description

The dataset directory contains the following dataset:

IMDB

To download the following datasets, go to https://github.com/awasthiabhijeet/Learning-From-Rules and place them inside the Data/ directory:

MITR - Slot filling task (Source: https://groups.csail.mit.edu/sls/downloads/restaurant/)
YOUTUBE - Spam Classification task of youtube comments (Source: http://www.dt.fee.unicamp.br/~tiago//youtubespamcollection)
SMS - Spam classification task of text messages (Source: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection)
CENSUS

data/IMDB (or any other data directory) consists of the following four pickle files:

d_processed.p (d set: labeled data)
U_processed.p (U set: unlabeled data)
validation_processed.p (validation data)
test_processed.p (test data)
NOTE: U_processed.p for YOUTUBE and MITR is not available on GitHub because of its size. The entire data directory can be downloaded from this link.
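A minimal sketch for reading one of these files, assuming (not confirmed here) that the six objects described below are written back-to-back with repeated pickle.dump calls in the order x, l, m, L, d, r. The function name load_processed and the OBJECT_NAMES ordering are hypothetical; check them against the repository's own data-loading code before relying on them.

```python
import pickle

# Assumed dump order, taken from the object list in this README.
# This is an assumption, not verified against the repository code.
OBJECT_NAMES = ["x", "l", "m", "L", "d", "r"]


def load_processed(path):
    """Read every object pickled back-to-back in `path` and return them by name."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:  # raised once the end of the file is reached
                break
    if len(objects) != len(OBJECT_NAMES):
        raise ValueError(
            f"expected {len(OBJECT_NAMES)} objects, found {len(objects)}"
        )
    return dict(zip(OBJECT_NAMES, objects))
```

For example, data = load_processed("Data/IMDB/d_processed.p") would give data["x"], data["l"] and so on. Reading until EOFError avoids hard-coding the number of pickle.load calls, while the length check flags a file whose layout differs from the assumed one.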

The following objects are dumped inside each pickle file:

x : feature representation of instances
- shape : [num_instances, num_features]
l : Class Labels assigned by rules
- shape : [num_instances, num_rules]
- class labels belong to {0, 1, 2, .. num_classes-1}
- l[i][j] is the class label assigned by the jth rule to the ith instance
- if the jth rule doesn't cover the ith instance, then l[i][j] = num_classes (convention)
- in Snorkel, the convention is l[i][j] = -1 if the jth rule doesn't cover the ith instance
m : Rule coverage mask
- A binary matrix of shape [num_instances, num_rules]
- m[i][j] = 1 if the jth rule covers the ith instance
- m[i][j] = 0 otherwise
L : Instance labels
- shape : [num_instances, 1]
- L[i] = label of the ith instance if the label is available, i.e. if the instance is from the labeled set d
- Else, L[i] = num_classes if the instance comes from the unlabeled set U
- class labels belong to {0, 1, 2, .. num_classes-1}
d : binary matrix of shape [num_instances, 1]
- d[i] = 1 if the instance belongs to the labeled data (d), d[i] = 0 otherwise
- d[i] = 1 for all instances in d_processed.p
- d[i] = 0 for all instances in the other 3 pickles {U,validation,test}_processed.p
r : A binary matrix of shape [num_instances, num_rules]
- r[i][j] = 1 if the jth rule was associated with the ith instance
- Highly sparse matrix
- r is an all-zero matrix in every pickle except d_processed.p
- Note that this is different from the rule coverage mask m
- This matrix defines the coupled (rule, example) pairs.
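The conventions above can be checked with a few array operations. The sketch below assumes the objects are NumPy-compatible arrays with the listed shapes; the helper names (to_snorkel_convention, coverage_mask, check_conventions) and the toy arrays are hypothetical, for illustration only, and are not part of the repository.

```python
import numpy as np


def to_snorkel_convention(l, num_classes):
    """Return a copy of l where 'rule does not cover instance' is -1 (Snorkel) instead of num_classes."""
    l = np.asarray(l)
    return np.where(l == num_classes, -1, l)


def coverage_mask(l, num_classes):
    """Rebuild m from l: m[i][j] = 1 exactly when rule j covers instance i."""
    return (np.asarray(l) != num_classes).astype(np.int64)


def check_conventions(l, m, L, d, r, num_classes):
    """Return descriptions of the conventions the arrays violate (empty list if none)."""
    l, m, L, d, r = (np.asarray(a) for a in (l, m, L, d, r))
    labeled = d.ravel() == 1  # rows that belong to the labeled set d
    labels = L.ravel()
    problems = []
    if not np.array_equal(m, coverage_mask(l, num_classes)):
        problems.append("m does not match the coverage implied by l")
    if not np.all((labels[labeled] >= 0) & (labels[labeled] < num_classes)):
        problems.append("labeled instances must have L in {0, ..., num_classes-1}")
    if not np.all(labels[~labeled] == num_classes):
        problems.append("unlabeled instances must have L = num_classes")
    if r[~labeled].any():
        problems.append("r must be all zeros outside the labeled set d")
    return problems


# Toy example: 2 classes, 3 instances, 2 rules (hypothetical data).
l = [[0, 2], [2, 1], [2, 2]]  # 2 == num_classes means "rule does not cover"
m = [[1, 0], [0, 1], [0, 0]]
L = [[0], [1], [2]]           # third instance is unlabeled, so L = num_classes
d = [[1], [1], [0]]
r = [[1, 0], [0, 1], [0, 0]]

print(to_snorkel_convention(l, 2).tolist())  # [[0, -1], [-1, 1], [-1, -1]]
print(check_conventions(l, m, L, d, r, 2))   # []
```

Because m is fully determined by l under this convention, recomputing it is a cheap sanity check after loading; to_snorkel_convention only rewrites the abstain value, so the class labels themselves are unchanged.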

Usage

Run the respective .sh file to train the model.
For example, to run the semi-supervised model on YOUTUBE, run tr_youtube.sh.
Each .sh file contains calls with various combinations of loss functions.

ayushbits/Semi-Supervised-LFs-Subset-Selection