- `contactmaps/`: contains the nanoHUB tool package that helped us calculate the protein contact maps
- `Dataset_Files/`: contains various CSV files that were used to construct the dataset; these can be largely ignored, as a pickled file of the dataset is provided
- `Dataset_Files/AlphaFold_Proteins`: contains all the PDB files downloaded from AlphaFold
- `Dataset_Files/Baseline_Models` & `Dataset_Files/Enhanced_Models`: contain all our trained classification and regression models in the form of `.joblib` files
- `Dataset_Files/Feature_Selection`: contains all the NumPy files regarding the feature selection process
- `Dataset_Files/Neural_Networks`: contains the model checkpoint and its average training and validation losses
- `Dataset_Files/Protein_Graph_Data`: contains all the files needed to construct the protein graphs, as well as the protein graphs themselves
- `Dataset_Files/Training_Test_Sets`: contains the sets used in the training and testing phases for each model
- `Molecular_Functions_Embedding_Model_&_Files/`: contains two notebooks, one that was used to create the dataset used by the embedding model and one containing the embedding model itself, as well as the calculated protein embeddings, `protein_embeddings.pkl`, in the form of a pickled Python dictionary
- `Molecular_Functions_Embedding_Model_&_Files/Dataset_Files/`: follows the same structure discussed above
- `Dissertation`: contains the Overleaf project used to create our dissertation
- `Diagrams/`: contains all diagrams used in our dissertation and Streamlit app in their draw.io format
- `Interim_Report`: contains the Overleaf project used to create our interim report
- `Metrics/`: contains all the metrics gathered from our trained models in the form of CSV files
- `R_Scripts/`: contains the scripts that were used to calculate the amino acid and protein sequence descriptors
- `Streamlit_App/`: contains everything related to our web app
- `Dataset_Creation_&_Exploration.ipynb`: the Jupyter notebook used to bring together the various CSV files to create our dataset and split it into training and test sets
- `Classification_Baseline_Models.ipynb`, `Regression_Baseline_Models.ipynb`, `Classification_Enhanced_Models.ipynb` & `Regression_Enhanced_Models.ipynb`: the Jupyter notebooks used to train and test our various models
- `DTIs_Classification_NN` and `DTIs_Regression_NN`: the Jupyter notebooks used to train and test the neural networks for DTI prediction
- `amino_acid_features.py`, `drug_features.py`, `protein_features.py`, `extract_dtis.py`, `models_utils.py` & `utils.py`: contain helper functions that were used to create the various CSV files, the dataset and the models
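Since the dataset and the protein embeddings are distributed as pickled Python objects (e.g. `protein_embeddings.pkl`, a pickled dictionary), a minimal sketch of reading such a file may be helpful. The key/value layout shown here is an assumption for illustration only; the actual identifiers and embedding sizes in the repository may differ:

```python
import pickle

# Hypothetical stand-in for protein_embeddings.pkl: a dict mapping
# protein identifiers to embedding vectors (keys and vector sizes here
# are illustrative assumptions, not taken from the repository).
example = {"P12345": [0.1, 0.2, 0.3], "Q67890": [0.4, 0.5, 0.6]}

# Write the dictionary out the same way the repository's pickles are stored.
with open("embeddings_demo.pkl", "wb") as f:
    pickle.dump(example, f)

# Loading mirrors how protein_embeddings.pkl would be read back.
with open("embeddings_demo.pkl", "rb") as f:
    embeddings = pickle.load(f)

print(sorted(embeddings))  # ['P12345', 'Q67890']
```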
- Python: 3.9.16
- PyTorch: 1.13.0
- PyTorch Geometric: 2.1.0
- CUDA version: 11.7
- Packages: listed in `requirements.txt`
- Tested on Windows 11
We suggest creating an Anaconda virtual environment and then running:
```shell
pip3 install -r requirements.txt
pip3 install torch==1.13.0 torchvision torchaudio
# or, with conda:
# conda install pytorch==1.13.0 torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip3 install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric==2.1.0 -f https://data.pyg.org/whl/torch-1.13.0+cu117.html
```
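After installation, a quick way to confirm the required packages are visible to Python is to check for them with the standard library's `importlib` (a generic sketch; the package names checked below are taken from the install commands above):

```python
import importlib.util

def check_packages(names):
    """Report which of the given packages are importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Package names assumed from the install commands above.
status = check_packages(["torch", "torch_geometric", "torch_scatter"])
for pkg, ok in status.items():
    print(f"{pkg}: {'installed' if ok else 'MISSING'}")
```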