NOTE: the models
directory (which contains the trained machine learning models) is too big to be on GitHub and needs to be recreated when needed.
algorithms
: the three original algorithms (clique, kdown, McSplit), and a combination of McSplit and clique called Fusion.data
: two databases of graphs for MCS and SIP algorithms, and an AIDS dataset, which was never used.gnuplot
: gnuplot scripts and generated cumulative plotsgraph_stats
: a graph feature extractor taken from Portfolios of subgraph isomorphism algorithms (Kotthoff, McCreesh and Solnon, LION 2016)results
: all (mostly CSV) data that was generated during this project..mcs.
and.sip.
denote which database the data came from.- a number between dots denotes the labelling percentage.
both.labels
means that both vertices and edges were labelled.vertex.labels
means that only vertex labels were labelled.association
files record features of the association graph.clique
files record the number of search nodes, runtime, and the size of the discovered MCS, for the clique algorithm.kdown
,mcsplit
, andmcsplitdown
do the same thing for other algorithms.costs.csv
records the feature extraction costs, for all graphs.filtered_instances
is a random sample of MCS instances (30,000).filtered_instances2
is a random sample offiltered_instances
(10,000).filtered_instances_one_filename
isfiltered_instances
without the second of the two filenames.fusion1
is the Fusion algorithm that switches after one decision.fusion2
is the Fusion algorithm that switches after two decisions.labels.csv
: each line corresponds to a different graph of the MCS instances. For each distinct label in the 33% labelling scheme, we record the number of vertices with that label.mcs_features.csv
andsip_features.csv
record the features for unlabelled MCS and SIP instances (ratio features are calculated later).mcs_features_individual.csv
andsip_features_individual.csv
record features of each individual graph separately.mcs_instances
andsip_instances
contain the complete lists of instances, for both graph databases.unlabelled.csv
,vertex_labels.csv
, andboth_labels.csv
contain summaries of the data, used for plotting cumulative plots
text
: the dissertation, a short status report, and three sets of slides for different presentations.video
: everything related to the video describing this project.