FusionDTI

Introduction

FusionDTI uses a token-level fusion module to learn fine-grained information for drug-target interaction (DTI) prediction. The model represents drugs as SELFIES strings to mitigate sequence fragment invalidation, and represents target proteins with a structure-aware (SA) vocabulary to compensate for the lack of structural information in plain amino acid sequences. Both inputs are encoded with pre-trained language models (PLMs) trained on large-scale biomedical datasets, which capture the complex information of drugs and targets.

Framework

Installation Guide

Clone this GitHub repo and set up a new conda environment.

# create a new conda environment
$ conda create --name FusionDTI python=3.8
$ conda activate FusionDTI

# install required python dependencies
$ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
$ pip install transformers
$ pip install wandb

# clone the source code of FusionDTI
$ git clone https://github.com/ZhaohanM/FusionDTI.git
$ cd FusionDTI
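
To confirm that the environment is usable before training, a small check like the one below can help. It is only a sketch: the function name check_environment is hypothetical, and it only tests whether the three packages installed above can be found, not whether their versions match.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "transformers", "wandb")):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Report the interpreter version and which dependencies are visible to it.
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated FusionDTI conda environment; any package reported as MISSING needs to be installed again.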

Datasets

All data used in FusionDTI come from public resources: BindingDB [1], BioSNAP [2] and Human [3]. The datasets can be downloaded from here.

Train

To run the experiments with FusionDTI, use the following command. The dataset can be BindingDB, Biosnap, or Human.

$ python main_token.py --dataset BindingDB
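
If you want to train on all three datasets in turn, a small driver such as the following may be convenient. This is a sketch, not part of the repository: the helper names are hypothetical, and it assumes only the --dataset flag shown above, with the dataset names spelled as listed in this section.

```python
import subprocess
import sys

# Dataset names as listed in this README (assumed to be the accepted values).
DATASETS = ("BindingDB", "Biosnap", "Human")


def build_train_command(dataset, script="main_token.py"):
    """Build the argument list for one training run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return [sys.executable, script, "--dataset", dataset]


def train_all(dry_run=True):
    """Print each command and, unless dry_run is set, run the trainings one after another."""
    for dataset in DATASETS:
        cmd = build_train_command(dataset)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train_all(dry_run=True)  # switch to dry_run=False to actually launch training
```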

Inference

After training FusionDTI, the best saved model is used to run inference on a single drug-target pair. The notebook visualize_attention.ipynb lets you enter a protein sequence and a drug sequence and visualise the attention weights.

$ python attention.py --dataset BindingDB
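
To make clear what a token-level attention map contains, here is a toy illustration in plain Python. It is not the FusionDTI fusion module: it uses generic scaled dot-product attention, and the token strings and 2-dimensional vectors are invented for the example. Rows correspond to drug tokens, columns to protein tokens, and each row sums to 1.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(drug_vecs, prot_vecs):
    """Scaled dot-product attention weights: one row per drug token, one column per protein token."""
    scale = math.sqrt(len(prot_vecs[0]))
    weights = []
    for query in drug_vecs:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in prot_vecs]
        weights.append(softmax(scores))
    return weights


def top_pair(weights, drug_tokens, prot_tokens):
    """Return the (drug token, protein token) pair with the largest attention weight."""
    _, i, j = max((w, i, j) for i, row in enumerate(weights) for j, w in enumerate(row))
    return drug_tokens[i], prot_tokens[j]


if __name__ == "__main__":
    # Invented tokens and embeddings, for illustration only.
    drug_tokens = ["[C]", "[O]"]
    prot_tokens = ["Md", "Kp", "Ev"]
    drug_vecs = [[1.0, 0.0], [0.0, 1.0]]
    prot_vecs = [[2.0, 0.0], [0.0, 0.5], [0.0, 3.0]]
    attn = cross_attention_weights(drug_vecs, prot_vecs)
    print(top_pair(attn, drug_tokens, prot_tokens))
```

A heatmap of such a matrix is the kind of picture an attention visualisation shows: bright cells mark drug-token and protein-token pairs that attend strongly to each other.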

How to obtain the structure-aware sequence of protein?

The structure-aware sequence of a protein is generated with Foldseek from a 3D structure file (.cif), which can be obtained from the AlphafoldDB database. SaProt provides a function that converts a protein structure into a structure-aware sequence; it calls the foldseek binary file to encode the structure. You can download the binary file from here and place it in the utils folder.

The process has three steps:

Step 1: If you do not have protein structure files, you will need to download them from the AlphafoldDB database. Look up the UniProt IDs of your proteins on the UniProt website and save them as a comma-delimited text file.
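
The comma-delimited file can be written by hand or with a few lines of Python. The sketch below is an assumption about the layout (a single line of IDs separated by commas); the helper names and the file name uniprot_ids.txt are hypothetical, so check get_alphafold.py for the path and format it actually expects.

```python
import tempfile
from pathlib import Path


def save_uniprot_ids(ids, path):
    """Write unique, stripped UniProt IDs to a single comma-delimited line."""
    cleaned = []
    for raw in ids:
        uid = raw.strip()
        if uid and uid not in cleaned:
            cleaned.append(uid)
    Path(path).write_text(",".join(cleaned), encoding="utf-8")
    return cleaned


def load_uniprot_ids(path):
    """Read the IDs back, ignoring empty fields and surrounding whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [uid.strip() for uid in text.split(",") if uid.strip()]


if __name__ == "__main__":
    # Example accessions; replace them with the IDs of your own targets.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "uniprot_ids.txt"  # hypothetical file name
        save_uniprot_ids(["P00533", "P04637", "P00533"], target)
        print(load_uniprot_ids(target))
```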

Step 2: Retrieve protein structure files from AlphafoldDB through corresponding UniProt IDs.

$ python get_alphafold.py
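
For reference, fetching one structure file by UniProt ID can be sketched as follows. This is not the code of get_alphafold.py: the helper names are hypothetical, and the URL pattern and the v4 model version are assumptions based on the public AlphaFold DB file naming, which changes between database releases.

```python
import urllib.request
from pathlib import Path

# Assumed file naming of AlphaFold DB; the version suffix depends on the release.
ALPHAFOLD_URL = "https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_{version}.cif"


def alphafold_cif_url(uniprot_id, version="v4"):
    """Build the download URL of the predicted structure for one UniProt ID."""
    return ALPHAFOLD_URL.format(uniprot_id=uniprot_id.strip().upper(), version=version)


def download_cif(uniprot_id, out_dir="structures", version="v4"):
    """Download one .cif file into out_dir and return its path (needs network access)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / f"{uniprot_id.strip().upper()}.cif"
    urllib.request.urlretrieve(alphafold_cif_url(uniprot_id, version), str(target))
    return target


if __name__ == "__main__":
    # Only prints the URL; call download_cif("P00533") to fetch the file.
    print(alphafold_cif_url("P00533"))
```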

Step 3: Generate the structure-aware protein sequences from the 3D structure files (.cif).

$ python generate_stru_seq.py
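
A structure-aware sequence in the SaProt style pairs each residue with a Foldseek structure (3Di) token: the amino acid is written in upper case and the structure token in lower case. The sketch below only illustrates that interleaving on two strings that are already available. The function name is hypothetical, and the "#" masking of low-confidence positions with a pLDDT threshold of 70 is an assumption about the convention, so the output of generate_stru_seq.py may differ.

```python
def combine_structure_aware(aa_seq, threedi_seq, plddt=None, threshold=70.0):
    """Interleave amino acids (upper case) with 3Di structure tokens (lower case).

    If per-residue pLDDT scores are given, structure tokens of residues below
    the threshold are replaced by '#'.
    """
    if len(aa_seq) != len(threedi_seq):
        raise ValueError("amino acid and 3Di sequences must have the same length")
    if plddt is not None and len(plddt) != len(aa_seq):
        raise ValueError("plddt must have one score per residue")
    tokens = []
    for i, (aa, struct) in enumerate(zip(aa_seq, threedi_seq)):
        struct = struct.lower()
        if plddt is not None and plddt[i] < threshold:
            struct = "#"  # mask the structure token of a low-confidence residue
        tokens.append(aa.upper() + struct)
    return "".join(tokens)


if __name__ == "__main__":
    # Toy strings, not taken from a real protein.
    print(combine_structure_aware("MEV", "DVP"))
    print(combine_structure_aware("MEV", "DVP", plddt=[90.0, 50.0, 80.0]))
```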

How to obtain SELFIES of drug?

Install the selfies package, which converts SMILES strings to SELFIES strings, together with pandarallel.

$ pip install selfies 
$ pip install pandarallel

Run the following command to generate SELFIES strings from your SMILES strings.

$ python generate_selfies.py
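
The conversion itself is a single call to the selfies package. The sketch below is not the code of generate_selfies.py: the function name smiles_to_selfies is hypothetical, it runs serially rather than with pandarallel, and it stores None for any SMILES string that the encoder rejects.

```python
try:
    import selfies as sf  # installed above with `pip install selfies`
except ImportError:  # keeps the sketch importable when the package is absent
    sf = None


def smiles_to_selfies(smiles_list, encoder=None):
    """Convert SMILES strings to SELFIES strings; failed conversions become None."""
    if encoder is None:
        if sf is None:
            raise RuntimeError("the selfies package is not installed")
        encoder = sf.encoder
    converted = []
    for smiles in smiles_list:
        try:
            converted.append(encoder(smiles))
        except Exception:
            # The encoder raises on strings it cannot convert.
            converted.append(None)
    return converted


if __name__ == "__main__" and sf is not None:
    # Ethanol and benzene as small examples.
    print(smiles_to_selfies(["CCO", "c1ccccc1"]))
```

The optional encoder argument exists only so that the function can be exercised without the package; in normal use, leave it unset.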

Results

Citation

Please cite our paper if you find our work useful in your own research.

@inproceedings{meng2024fusiondti,
title={Fusion{DTI}: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction},
author={Zhaohan Meng and Zaiqiao Meng and Ke Yuan and Iadh Ounis},
booktitle={ICML 2024 AI for Science Workshop},
year={2024},
url={https://openreview.net/forum?id=SRdvBPDdXB}
}
