DiffDockHPC is a fork of DiffDock that adds support for running DiffDock on HPC systems using Singularity and Slurm.
DiffDockHPC was developed as part of ESSENCE-Dock, a consensus docking protocol.
For more details about DiffDock itself, see the DiffDock GitHub repository and the paper on arXiv.
The current version of DiffDockHPC matches DiffDock v1.1 (DiffDock-L).
Note: if you are updating from DiffDockHPC v1.0, a clean install is highly recommended.
DiffDockHPC is also available for the original DiffDock v1.0 implementation. This version was used in the original ESSENCE-Dock paper.
To use DiffDockHPC with the DiffDock v1.0 implementation, clone the project and run:
git checkout DiffDockHPCv1.0
Requirements:
- Singularity
- Slurm (there is a --no_slurm mode, but using Slurm is highly recommended)
Installation:
- Clone the repository and navigate to it:
git clone https://github.com/Jnelen/DiffDockHPC
cd DiffDockHPC
- Run a test example to automatically download the Singularity image (~3 GB) and generate the cache look-up tables for the SO(2) and SO(3) distributions. This only needs to happen once and usually takes around 15 minutes. The --no_slurm flag is optional here, but makes it easier to track the progress:
python inferenceVS.py -p data/1a0q/1a0q_protein_processed.pdb -l data/1a0q/ -out TEST -j 1 --no_slurm
If you have access to a GPU, you can also add the -gpu flag:
python inferenceVS.py -p data/1a0q/1a0q_protein_processed.pdb -l data/1a0q/ -out TEST -j 1 -gpu --no_slurm
You can also download the Singularity image manually:
wget --no-check-certificate -r "https://drive.usercontent.google.com/download?id=1TsbuhNWA74AHfIbKV5uh2lmEnD99VlCD&confirm=t" -O singularity/DiffDockHPC.sif
Alternatively, you can build the Singularity image yourself:
singularity build singularity/DiffDockHPC.sif singularity/DiffDockHPC.def
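Before launching larger runs, it can help to confirm that the image actually ended up at the location the commands above write to. The sketch below assumes the default path singularity/DiffDockHPC.sif relative to the repository root; the helper name `image_ready` and the 1 GiB size threshold are hypothetical choices for illustration (the full image is roughly 3 GB), not part of DiffDockHPC.

```python
from __future__ import annotations

from pathlib import Path

# Default image location used by the wget and singularity build commands above.
DEFAULT_IMAGE = Path("singularity") / "DiffDockHPC.sif"

# Hypothetical sanity threshold: the full image is roughly 3 GB, so a much
# smaller file is probably a failed or partial download.
MIN_SIZE_BYTES = 1 * 1024**3


def image_ready(repo_root: str | Path = ".", min_size: int = MIN_SIZE_BYTES) -> bool:
    """Return True if the Singularity image exists and is at least `min_size` bytes."""
    image = Path(repo_root) / DEFAULT_IMAGE
    return image.is_file() and image.stat().st_size >= min_size


if __name__ == "__main__":
    if image_ready():
        print(f"Found {DEFAULT_IMAGE}")
    else:
        print(f"{DEFAULT_IMAGE} is missing or incomplete; rerun the download or build step")
```

Run it from the repository root; a size check is a cheap heuristic only and does not validate the image contents.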
The main script is inferenceVS.py. It accepts the following options/flags:
- -p, -r, --protein_path: Path to the protein/receptor .pdb file.
- -l, --ligand: Path to a directory of separate .mol2/.sdf ligand files.
- --protein_ligand_csv: Path to a protein_ligand_csv file. The format and header should be: complex_name,protein_path,ligand_description.
- -o, --out, --out_dir: Directory where the output structures will be saved.
- -j, --jobs: Number of jobs to use.
- -qu, --queue: Slurm queue to submit the jobs to. Defaults to the user's default queue; may need to be specified if no default queue is configured.
- -m, --mem: Memory to use for each job. Defaults to 4GB.
- -gpu, --gpu: Use GPU resources. This accelerates docking calculations if a compatible GPU is available.
- -c, --cores: Number of CPU cores to use for each job. Defaults to 1 when the GPU option is enabled, otherwise 4.
- -n, --num_outputs: Number of structures to output per compound. Defaults to 1.
- --remove_hs: Remove hydrogens from the final output structures.
- --no_slurm: Don't use Slurm to manage resources. All samples run on a single GPU, and other Slurm arguments (memory, time limit, etc.) are ignored. The number of CPU cores is still set.
- --config: Path to the config file to use. Defaults to default_inference_args.yaml.
- -h, --help: Show the help message and exit.
License: MIT