Directory Structure

  • contactmaps/ contains the nanoHUB tool package that helped us calculate the protein contact maps
  • Dataset_Files/ contains the various CSV files that were used to construct the dataset; these can largely be ignored, as a pickled file of the dataset is provided
  • Dataset_Files/AlphaFold_Proteins contains all the PDB files downloaded from AlphaFold
  • Dataset_Files/Baseline_Models & Dataset_Files/Enhanced_Models contain all our trained classification and regression models in the form of .joblib files
  • Dataset_Files/Feature_Selection contains all the numpy files regarding the feature selection process
  • Dataset_Files/Neural_Networks contains the model checkpoint and its average train and validation losses
  • Dataset_Files/Protein_Graph_Data contains all the files needed to construct the protein graphs, as well as the protein graphs themselves
  • Dataset_Files/Training_Test_Sets contains the sets used in the training and testing phases for each model
  • Molecular_Functions_Embedding_Model_&_Files/ contains two notebooks: one used to create the dataset for the embedding model, and one containing the embedding model itself. It also holds the calculated protein embeddings, protein_embeddings.pkl, stored as a pickled Python dictionary (see the loading sketch after this list)
  • Molecular_Functions_Embedding_Model_&_Files/Dataset_Files/ follows the same structure described above
  • Dissertation contains the Overleaf project used to create our dissertation
  • Diagrams/ contains all diagrams used in our dissertation and Streamlit app in their draw.io format
  • Interim_Report contains the Overleaf project used to create our report
  • Metrics/ contains all the metrics gathered from our trained models in the form of CSV files
  • R_Scripts/ contains the scripts that were used to calculate the amino acid and protein sequence descriptors
  • Streamlit_App/ contains everything related to our web app
  • Dataset_Creation_&_Exploration.ipynb was the Jupyter notebook used to bring together the various CSV files to create our dataset and split it into training and test sets
  • Classification_Baseline_Models.ipynb, Regression_Baseline_Models.ipynb, Classification_Enhanced_Models.ipynb & Regression_Enhanced_Models.ipynb were the Jupyter notebooks used to train and test our various models
  • DTIs_Classification_NN and DTIs_Regression_NN were the Jupyter notebooks used to train and test the neural networks for DTI prediction
  • amino_acid_features.py, drug_features.py, protein_features.py, extract_dtis.py, models_utils.py & utils.py contain helper functions that were used to create the various CSV files, the dataset and the models
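
As a minimal loading sketch: the pickled embeddings and the .joblib models can be opened as shown below. The embeddings path follows the directory layout above; the model filename is hypothetical, so substitute any file from Dataset_Files/Baseline_Models/ or Dataset_Files/Enhanced_Models/.

import pickle
import joblib

# Load the protein embeddings: a pickled Python dictionary
with open("Molecular_Functions_Embedding_Model_&_Files/protein_embeddings.pkl", "rb") as f:
    protein_embeddings = pickle.load(f)

# Load a trained model from its .joblib file (hypothetical filename)
model = joblib.load("Dataset_Files/Baseline_Models/example_model.joblib")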

Requirements

  • Python: 3.9.16
  • PyTorch: 1.13.0
  • PyTorch Geometric: 2.1.0
  • CUDA version: 11.7
  • Packages: listed in requirements.txt
  • Tested on Windows 11

Build steps

We suggest creating an Anaconda virtual environment and then running:

pip3 install -r requirements.txt

pip3 install torch==1.13.0 torchvision torchaudio

or, using conda:

conda install pytorch==1.13.0 torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

pip3 install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric==2.1.0 -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
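
To confirm the environment matches the versions above, a quick sanity check such as the following (a minimal sketch, not part of the original build steps) can be run in Python:

import torch
import torch_geometric

print(torch.__version__)            # expected: 1.13.0
print(torch_geometric.__version__)  # expected: 2.1.0
print(torch.version.cuda)           # expected: 11.7
print(torch.cuda.is_available())    # True if a CUDA-capable GPU is set up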