Overview

This study presents a replication attempt in C++ of the study by Benjarath Phoophakdee and Mohammed J. Zaki, "Genome-scale disk-based suffix tree indexing". Our goal was to replicate the researchers' four-stage algorithm from scratch and evaluate their original findings. However, we encountered challenges due to the limited guidance for its implementation. As a result, instead of a one-to-one replication of the study, we decided to implement our own interpretation of the algorithm. This solution, while functional and able to replicate the algorithm's main stages, has its own limitations and could not be used for genome-scale sequences. This experience showcases the importance of thorough reporting in research to ensure the reproducibility and reliability of findings.

An in-depth technical view is available at ./Project Report/Technical_Report.pdf. The main figures are listed below for ease of access:

Genetic sequence Suffix Tree example:

Algorithm stages:

Suffix Tree and Node class variables:

Algorithm 1, Variable length prefix creation:

Algorithm 2, Sequence partitioning:

Algorithm 3, Suffix tree construction:

Algorithm 4, Querying partition tree for prefix:

Algorithm 5, Tree merging:
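As a rough illustration of Algorithm 1 (variable length prefix creation), the sketch below assumes the stage aims to pick prefixes whose occurrence count stays at or below a threshold, extending any prefix that is too frequent by one character. The function name `variable_length_prefixes` and the parameter `threshold` are hypothetical and are not taken from the repository. The technical report remains the reference for the actual procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical helper (not from the repository): returns each accepted prefix
// together with its occurrence count in `seq`. Assumes `seq` ends with a unique
// terminal character such as '$', so every suffix is covered by some prefix.
std::map<std::string, std::size_t> variable_length_prefixes(
    const std::string& seq, std::size_t threshold) {
    assert(threshold > 0);
    std::map<std::string, std::size_t> accepted;
    std::set<std::string> to_extend;
    to_extend.insert("");  // the empty prefix is always extended first

    for (std::size_t len = 1; len <= seq.size() && !to_extend.empty(); ++len) {
        // Count only candidates that extend a prefix found too frequent earlier.
        std::map<std::string, std::size_t> counts;
        for (std::size_t i = 0; i + len <= seq.size(); ++i) {
            if (to_extend.count(seq.substr(i, len - 1)) != 0) {
                ++counts[seq.substr(i, len)];
            }
        }
        to_extend.clear();
        for (const auto& [prefix, count] : counts) {
            if (count <= threshold) {
                accepted[prefix] = count;   // small enough: keep as a prefix
            } else {
                to_extend.insert(prefix);   // too frequent: try one more character
            }
        }
    }
    return accepted;
}
```

Under these assumptions, a lower threshold yields longer and more numerous prefixes, each of which bounds the number of suffixes a single sub-tree has to hold.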

File/Folder descriptions

Files

*.h files are the header files for the corresponding .cpp files to be included in the main.cpp file.
file_utils contains functions for creating and deleting the folders that store the output files of each stage, and for partitioning the input sequence into smaller files.
node contains the class constructor and respective methods for the nodes of the suffix tree.
suffix_tree contains the class constructor and respective methods for the suffix tree.
prefix_count contains the methods required for the variable length prefix stage.
suffix_links contains the methods required for the suffix links stage.
run_pipeline contains the four stages of the algorithm in distinct functions and the main function that calls them in the right order.
TRELLIS.exe is the executable file of the algorithm.
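To give a feel for the node and tree classes described above, here is a minimal sketch. It is deliberately simpler than the repository's code: it builds an uncompressed suffix trie in memory rather than a disk-based suffix tree. The names `TrieNode`, `SuffixTrie` and `contains` are hypothetical, and the members of the real Node class may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical node layout: one child per outgoing character.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
};

// Uncompressed suffix trie built by inserting every suffix of the text.
// Quadratic in time and memory, so only suitable for short sequences.
class SuffixTrie {
public:
    explicit SuffixTrie(const std::string& text) {
        assert(!text.empty());
        for (std::size_t start = 0; start < text.size(); ++start) {
            TrieNode* node = &root_;
            for (std::size_t i = start; i < text.size(); ++i) {
                auto& child = node->children[text[i]];
                if (!child) {
                    child = std::make_unique<TrieNode>();
                }
                node = child.get();
            }
        }
    }

    // True if `pattern` occurs anywhere in the indexed text.
    bool contains(const std::string& pattern) const {
        const TrieNode* node = &root_;
        for (char c : pattern) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return false;
            }
            node = it->second.get();
        }
        return true;
    }

private:
    TrieNode root_;
};
```

For example, `SuffixTrie("GATTACA$").contains("TTAC")` walks four edges from the root and succeeds, while a pattern that never occurs fails at the first missing edge.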

Folders

./Project Report contains the technical report of the project.
./input contains the input sequence files for the algorithm.
./temp_prfx contains the identified variable length prefixes.
./temp_partition contains the partitioned input sequence files.
./temp_trees contains the prefixed suffix sub-trees of the partitioned input for each prefix.
./temp_trees/merged_trees contains the merged suffix trees of the partitioned input for each prefix.
./temp_trees/final_trees contains the merged trees with the addition of suffix links.
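The folder layout above could be prepared before a run with something like the following sketch, based on std::filesystem. The function name `reset_stage_dirs` is hypothetical, and the repository's file_utils may organize this differently.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not from the repository): deletes any previous stage
// output under `base` and recreates the empty folder layout listed above.
void reset_stage_dirs(const fs::path& base) {
    assert(!base.empty());
    const std::vector<fs::path> top_level = {
        "temp_prfx", "temp_partition", "temp_trees"};
    const std::vector<fs::path> nested = {
        "temp_trees/merged_trees", "temp_trees/final_trees"};

    for (const auto& dir : top_level) {
        fs::remove_all(base / dir);          // no error if the folder is absent
        fs::create_directories(base / dir);
    }
    for (const auto& dir : nested) {
        fs::create_directories(base / dir);
    }
}
```

Calling `reset_stage_dirs(".")` from the directory of the executable would leave ./input and ./Project Report untouched, since only the temporary folders are listed.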

Instructions on running the algorithm

Make sure an ./input directory exists in the same directory as the executable file TRELLIS.exe.
The SARS-CoV-2 genome sequence should be in the ./input directory with the name NC_045512V2.fa.
Run the executable file TRELLIS.exe.
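For readers who want to inspect the input themselves, the sketch below shows one way a FASTA file such as the one in ./input could be read into a single string. The function names `read_fasta_sequence` and `load_fasta` are hypothetical. The sketch assumes a single-record file and is not the repository's own loading code.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <string>

// Hypothetical helper: concatenates all sequence lines of a FASTA stream,
// skipping header lines (starting with '>') and blank lines.
std::string read_fasta_sequence(std::istream& in) {
    std::string sequence;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        if (line.empty() || line.front() == '>') {
            continue;
        }
        sequence += line;
    }
    return sequence;
}

// Hypothetical helper: opens `path` and returns its sequence content.
std::string load_fasta(const std::string& path) {
    assert(!path.empty());
    std::ifstream file(path);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    return read_fasta_sequence(file);
}
```

With the layout described above, a call might look like `load_fasta("./input/NC_045512V2.fa")`.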

If you wish to compile the algorithm yourself, the command used is:

g++ -fdiagnostics-color=always -g -ggdb -pedantic-errors -Wall -Wextra -Wsign-conversion -Werror -std=c++20 *.cpp -o /home/cinnamonk/projects/Trelis_repo/TRELLIS.exe

KyriakosPsa/Disk-Based-Suffix-Tree
