DMPmetal is a deep learning-based method for predicting metal binding sites from amino acid sequences. It uses a pretrained protein language model (pLM) to embed the target sequence and to provide the features for a simple feed-forward classifier. From a user perspective, the input to the model is a single amino acid sequence, and the output is a set of per-residue probabilities for each of the 29 CHEBI metal codes. The model was ranked 1st in the UniProt Metal Binding Site Machine Learning Challenge held in 2022, and was trained on the organizers’ provided NEG_TRAIN and POS_TRAIN_FULL datasets, which are based on curated UniProt annotations (http://insideuniprot.blogspot.com/2022/02/the-uniprot-metal-binding-site-machine.html).
First ensure that ansible is installed on your system, then clone the GitHub repo.
pip install ansible
git clone https://github.com/psipred/DMPmetal.git
cd DMPmetal/ansible_installer
Next, edit config_vars.yml to reflect where you would like DMPmetal and its underlying data to be installed.
You can now run ansible with:
ansible-playbook -i hosts install.yml
You can edit the hosts file to install DMPmetal on one or more machines. The ansible installation creates a python virtualenv called dmpmetal_env, which you activate with:
source [app path]/dmpmetal_env/bin/activate
If you're using a virtualenv to install Torch, you may find you need to add the paths to the virtualenv versions of cudnn/lib/ and nccl/lib/ to your LD_LIBRARY_PATH.
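As a rough aid for the LD_LIBRARY_PATH step, the sketch below searches a directory tree for folders named cudnn/lib and nccl/lib and builds the matching export line. The helper names (`find_cuda_lib_dirs`, `ld_library_path_export`) are hypothetical, and where these folders actually live inside your virtualenv is an assumption; check the printed paths before using them.

```python
import os
from pathlib import Path


def find_cuda_lib_dirs(root):
    """Return sorted paths of every cudnn/lib and nccl/lib directory under root."""
    found = []
    for candidate in sorted(Path(root).rglob("lib")):
        # Keep only "lib" directories whose parent is named cudnn or nccl.
        if candidate.is_dir() and candidate.parent.name in ("cudnn", "nccl"):
            found.append(str(candidate))
    return found


def ld_library_path_export(lib_dirs, current=""):
    """Build a shell line that prepends lib_dirs to an existing LD_LIBRARY_PATH."""
    parts = list(lib_dirs) + ([current] if current else [])
    return "export LD_LIBRARY_PATH=" + os.pathsep.join(parts)
```

From inside the activated virtualenv, `print(ld_library_path_export(find_cuda_lib_dirs(sys.prefix), os.environ.get("LD_LIBRARY_PATH", "")))` (after `import sys`) prints a line you can paste into your shell.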
DMPmetal requires the pytorch and flash-attn packages. At the time of writing, flash-attn installs most easily with pytorch 2.0.1 and cuda 11.8, though you should be able to compile it against later versions of both. Python dependencies can be installed with:
pip install -r requirements/requirements_torch.txt --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements/requirements.txt
Download the weights file from the following URL and decompress it:
http://bioinfadmin.cs.ucl.ac.uk/downloads/DMPmetal/metalpred.pt.gz
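If you prefer to script this step, the following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the directory in which the prediction script expects to find metalpred.pt is not stated here, so `target_dir` is an assumption you will need to adjust.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

WEIGHTS_URL = "http://bioinfadmin.cs.ucl.ac.uk/downloads/DMPmetal/metalpred.pt.gz"


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    # "metalpred.pt.gz" becomes "metalpred.pt" when no destination is given.
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def download_weights(target_dir="."):
    """Fetch the compressed weights into target_dir (assumed location) and unpack them."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / "metalpred.pt.gz"
    urllib.request.urlretrieve(WEIGHTS_URL, str(archive))
    return gunzip(archive)
```

Calling `download_weights()` leaves both the archive and the decompressed metalpred.pt in the chosen directory.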
The standard usage is:
python pytorch_dmp2e2e_pred.py -i /path/to/file.fasta
python pytorch_dmp2e2e_pred.py -i 5pcy.fasta
-i INPUT Path to the input fasta file.
-d DEVICE Hardware to run on. Options: 'cpu', 'cuda'. Default is 'cpu'.
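When predictions are needed from inside another Python program, the command line above can be wrapped with subprocess. This is only a sketch: `build_command` and `run_dmpmetal` are hypothetical helpers, and it assumes the script is run from the repository directory with the virtualenv's interpreter. Only the -i and -d options documented above are used.

```python
import subprocess
import sys


def build_command(fasta_path, device="cpu", script="pytorch_dmp2e2e_pred.py"):
    """Assemble the argument list for one DMPmetal run."""
    if device not in ("cpu", "cuda"):
        raise ValueError(f"device must be 'cpu' or 'cuda', got {device!r}")
    return [sys.executable, script, "-i", str(fasta_path), "-d", device]


def run_dmpmetal(fasta_path, out_path, device="cpu"):
    """Run the prediction script and write its stdout to out_path."""
    cmd = build_command(fasta_path, device)
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, stdout=out, check=True)
    return out_path
```

For example, `run_dmpmetal("5pcy.fasta", "5pcy_metal.txt", device="cuda")` would be the equivalent of running the script with `-d cuda` and redirecting stdout to a file.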
By default, DMPmetal writes its output to stdout. You can capture this in a file with a standard shell redirection. A typical output might be:
FASTA HEADER ID, CHEBI CODE, RESIDUE NUMBER, PROBABILITY
PDB|5PCY CHEBI:25213 11 0.01
PDB|5PCY CHEBI:23378 37 0.99
PDB|5PCY CHEBI:29036 37 0.02
PDB|5PCY CHEBI:60240 37 0.03
PDB|5PCY CHEBI:24875 37 0.02
Only binding residues that pass a 0.01 cutoff are returned. Metal binding residues are predicted for the following CHEBI codes:
CHEBI:48775 : Cd2+
CHEBI:29108 : Ca2+
CHEBI:48828 : Co2+
CHEBI:49415 : Co3+
CHEBI:23378 : Cu cation
CHEBI:49552 : Cu+
CHEBI:29036 : Cu2+
CHEBI:60240 : divalent metal cation
CHEBI:190135 : di-μ-sulfido-diiron
CHEBI:24875 : iron cation
CHEBI:29033 : Fe2+
CHEBI:29034 : Fe3+
CHEBI:30408 : iron-sulfur cluster
CHEBI:49713 : Li+
CHEBI:18420 : Mg2+
CHEBI:29035 : Mn2+
CHEBI:16793 : Hg2+
CHEBI:49786 : Ni2+
CHEBI:60400 : nickel-iron-sulfur cluster
CHEBI:47739 : NiFe4S4 cluster
CHEBI:29103 : K+
CHEBI:29101 : Na+
CHEBI:49883 : tetra-μ3-sulfido-tetrairon
CHEBI:21137 : tri-μ-sulfido-μ3-sulfido-triiron
CHEBI:29105 : Zn2+
CHEBI:177874 : NiFe4S5 cluster
CHEBI:21143 : Fe8S7 iron-sulfur cluster
CHEBI:60504 : iron-sulfur-iron cofactor
CHEBI:25213 : metal cation
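Because every residue above the 0.01 cutoff is reported, the raw output usually needs filtering. The sketch below parses output in the format of the sample shown earlier and keeps the confident calls. It assumes four whitespace-separated fields per line and sequence IDs without spaces; the 0.5 threshold and the names `Prediction`, `parse_predictions` and `confident_sites` are illustrative choices, not part of DMPmetal. `CHEBI_NAMES` is only an excerpt of the list above.

```python
from typing import NamedTuple

# Excerpt of the CHEBI code list above; extend as needed.
CHEBI_NAMES = {
    "CHEBI:25213": "metal cation",
    "CHEBI:23378": "Cu cation",
    "CHEBI:29036": "Cu2+",
    "CHEBI:60240": "divalent metal cation",
    "CHEBI:24875": "iron cation",
    "CHEBI:29105": "Zn2+",
}


class Prediction(NamedTuple):
    seq_id: str
    chebi: str
    residue: int
    prob: float


def parse_predictions(text):
    """Parse DMPmetal output, skipping header or malformed lines."""
    preds = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[1].startswith("CHEBI:"):
            continue
        seq_id, chebi, residue, prob = fields
        preds.append(Prediction(seq_id, chebi, int(residue), float(prob)))
    return preds


def confident_sites(preds, threshold=0.5):
    """Keep predictions at or above threshold (0.5 is an arbitrary example value)."""
    return [p for p in preds if p.prob >= threshold]


sample = """FASTA HEADER ID, CHEBI CODE, RESIDUE NUMBER, PROBABILITY
PDB|5PCY CHEBI:25213 11 0.01
PDB|5PCY CHEBI:23378 37 0.99
PDB|5PCY CHEBI:29036 37 0.02
PDB|5PCY CHEBI:60240 37 0.03
PDB|5PCY CHEBI:24875 37 0.02
"""

for p in confident_sites(parse_predictions(sample)):
    name = CHEBI_NAMES.get(p.chebi, p.chebi)
    print(f"{p.seq_id} residue {p.residue}: {name} ({p.prob:.2f})")
# prints: PDB|5PCY residue 37: Cu cation (0.99)
```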