Structure-based virtual screening for PD-L1 dimerizers is boosted by inactive-enriched machine-learning models exploiting patent data
Powered by the DeepCoy generator.
We hypothesise that applying the latest advances reported for other targets will lead to highly accurate target-specific machine-learning scoring functions (MLSFs) for PD-L1. For instance, including a large number of decoys (assumed inactives) in the training set boosts the structure-based virtual screening (SBVS) performance of MLSFs, but this has never been investigated for PD-L1. Thus, it is not known whether training should be carried out with actives only, or with actives supplemented by experimentally validated inactives, property-matched decoys or random property-unmatched decoys. Likewise, regression-based MLSFs have yet to be applied to PD-L1, even though the dependent variable to predict, pIC50, is real-valued. This is probably because the most popular SBVS benchmarks evaluate performance with binary classification metrics on sets of actives and decoys, rather than with real-valued potency. As a real-valued variable contains more information than any dichotomised version of it, it stands to reason that regression models should perform better than classification models, other things being equal. We will thus evaluate regression models that also exploit information about the chemical diversity of inactives, which we call inactive-enriched regression-based MLSFs. Another novel aspect of our study is investigating which combinations of featurisation schemes and supervised learning algorithms are most predictive for SBVS on PD-L1.
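The difference between regression and classification targets can be illustrated with a toy training set. The sketch below is not the study's pipeline: it assumes, purely for illustration, that inactives and decoys enter the regression set with one fixed low pIC50 (the hypothetical `INACTIVE_PIC50`) and that class labels are obtained by thresholding pIC50 at a hypothetical `ACTIVITY_CUTOFF`. Compound IDs and potencies are invented.

```python
from typing import List, Tuple

# Hypothetical values, chosen only for this illustration (not from the study).
INACTIVE_PIC50 = 4.0   # assumed pIC50 given to every inactive or decoy
ACTIVITY_CUTOFF = 6.0  # assumed threshold used to dichotomise pIC50


def build_labels(
    actives: List[Tuple[str, float]], inactives: List[str]
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, int]]]:
    """Return (regression_set, classification_set) for one training set.

    actives:   (compound_id, measured pIC50) pairs
    inactives: ids of true inactives or decoys (no measured potency)
    """
    # Inactive-enriched regression set: actives keep their measured potency,
    # inactives/decoys get the assumed low pIC50.
    regression = [(cid, p) for cid, p in actives]
    regression += [(cid, INACTIVE_PIC50) for cid in inactives]

    # Dichotomised version: every compound at or above the cutoff becomes 1.
    classification = [(cid, int(p >= ACTIVITY_CUTOFF)) for cid, p in regression]
    return regression, classification


actives = [("A1", 6.1), ("A2", 8.9), ("A3", 7.4)]
inactives = ["D1", "D2"]
reg, clf = build_labels(actives, inactives)
print(reg)
print(clf)
```

Here A1 and A2 differ by almost three log units in potency yet receive the same class label; a regression model trained on `reg` can still exploit that difference, whereas one trained on `clf` cannot.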
To get a local copy up and running, make sure that Anaconda is installed on your machine. If it is not, follow the installation instructions at: https://docs.anaconda.com/anaconda/install/index.html
The scripts are supported on Linux and have been tested on the following system:
- Linux: Ubuntu 20.04
Clone the repository and move into it (the environment file requirement.yml is part of the repository):
git clone https://github.com/sawsimeon/MLSF-PDL1.git
cd MLSF-PDL1
Create a conda environment with Python 3.6 and install all the dependencies:
conda env create -f requirement.yml python=3.6
When the installation is done, activate the environment:
conda activate pdl1_sbvs
Run the two test scripts below, which take 157 and 163 seconds, respectively:
python script/DeepCoys.py
python script/True_Inactives.py
Selected SFs, including the GRID SVM SFs built from training actives + RandomDecoys and from training actives + TrueInactives, were saved as pickle files here. The notebook folder contains Jupyter notebooks that compute the PR-AUC and EF1% on the two test sets, TrueInactives and DeepCoys. We have also added versions of these SFs trained on all actives plus the same inactives, together with a script that generates features for other docked complexes, so that the SFs can be applied to other docked molecules. Please see the data folder.
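The notebooks remain the reference for how PR-AUC and EF1% were computed in this study. As a hedged, self-contained sketch of the standard definitions, assuming that a higher score means "more likely active": EF1% is the hit rate among the top 1% of ranked molecules divided by the hit rate in the whole test set, and PR-AUC is estimated here as average precision (the step-wise definition also used by scikit-learn's `average_precision_score`). The notebooks may round the top-1% size or estimate PR-AUC differently (e.g. trapezoidal integration); the helper names below are hypothetical.

```python
import math
from typing import List, Sequence


def _ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Labels (1 = active, 0 = inactive) ordered by descending score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def enrichment_factor(scores, labels, fraction=0.01):
    """EF at the given top fraction (EF1% for fraction=0.01)."""
    ranked = _ranked_labels(scores, labels)
    n_total = len(ranked)
    n_actives = sum(ranked)
    if n_actives == 0:
        raise ValueError("test set contains no actives")
    # Assumption: the top set holds floor(fraction * N) molecules, at least 1.
    n_top = max(1, math.floor(fraction * n_total))
    hits_top = sum(ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def average_precision(scores, labels):
    """PR-AUC estimated as average precision (ties broken by sort order)."""
    ranked = _ranked_labels(scores, labels)
    n_actives = sum(ranked)
    if n_actives == 0:
        raise ValueError("test set contains no actives")
    hits, total = 0, 0.0
    for k, is_active in enumerate(ranked, start=1):
        if is_active:
            hits += 1
            total += hits / k  # precision at this active's rank
    return total / n_actives


# Toy test set: 200 molecules, 4 actives ranked 1st, 3rd, 50th and 150th.
scores = list(range(200, 0, -1))
labels = [0] * 200
for i in (0, 2, 49, 149):
    labels[i] = 1
print(enrichment_factor(scores, labels))
print(average_precision(scores, labels))
```

In this toy set, 1 of the top 2 molecules is active against a base rate of 4/200, giving an EF1% of 25.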
We provide the pre-calculated features used to build the target-specific machine-learning scoring functions. Because of GitHub's file-size limits, we have uploaded this dataset to Zenodo for public access.
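To apply one of the pickled SFs to other docked molecules, a workflow along the following lines may be used. This is a hedged sketch, not the repository's script: it assumes that the pickled object follows the scikit-learn convention of exposing `predict(X)`, and that features for the new complexes have been written to a CSV with one header row, a molecule ID in the first column and numeric features in the remaining columns. That file layout and the helper names `load_sf`, `read_features` and `rank_molecules` are hypothetical.

```python
import csv
import pickle
from typing import List, Tuple


def load_sf(path: str):
    """Unpickle a saved scoring function.

    Assumption: the object exposes predict(X), as scikit-learn estimators do.
    Only unpickle files from a source you trust.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def read_features(path: str) -> Tuple[List[str], List[List[float]]]:
    """Read a hypothetical CSV layout: header row, then id + numeric features."""
    ids, rows = [], []
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        for row in reader:
            ids.append(row[0])
            rows.append([float(v) for v in row[1:]])
    return ids, rows


def rank_molecules(model, ids, X):
    """Return (id, score) pairs sorted from highest to lowest predicted score."""
    scores = model.predict(X)
    return sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
```

Two practical points: the features must be generated exactly as for training (same featurisation scheme, same column order), and the pickles are best loaded inside the `pdl1_sbvs` environment, since unpickling scikit-learn objects under a different library version can fail or raise version warnings.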
Distributed under the MIT License. See LICENSE for more information.