Protein Secondary Structure Prediction with Neural Networks

asjchen/secondary-folding

Protein Secondary Structure Prediction with Deep Learning

This repository implements a deep learning model that predicts secondary structure in proteins. The dataset, originally derived from the Protein Data Bank (PDB), contains amino acid sequence and structure information for roughly 6100 proteins.

Currently, we use a bidirectional LSTM (a recurrent neural network) to solve the Q8 classification problem: for each amino acid in a protein sequence, we assign one of eight labels:

  • alpha helix
  • beta strand
  • loop or irregular
  • beta turn
  • bend
  • 3₁₀-helix
  • beta bridge
  • pi helix
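
The eight classes above correspond to the single-letter codes commonly used for DSSP secondary structure assignments. The following is a minimal sketch of that mapping, not code from this repository: `Q8_LABELS` and `describe` are hypothetical names, "L" is assumed as the code for loop or irregular residues (DSSP itself leaves these blank), and the dictionary order is not necessarily the class index order used in the dataset.

```python
# Hypothetical mapping from single-letter Q8 codes to the class names above.
# Assumption: "L" stands for loop/irregular; some tools use "C" or "-" instead.
Q8_LABELS = {
    "H": "alpha helix",
    "E": "beta strand",
    "L": "loop or irregular",
    "T": "beta turn",
    "S": "bend",
    "G": "3-10 helix",
    "B": "beta bridge",
    "I": "pi helix",
}


def describe(structure_string):
    """Translate a string of Q8 codes into a list of class names."""
    return [Q8_LABELS[code] for code in structure_string]
```

For example, `describe("HHE")` returns one class name per residue, so the output has the same length as the input sequence.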

The network reaches roughly 56% Q8 accuracy on the test set. (Note: these scripts are not optimized for running on GPUs.)
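
Q8 accuracy is the fraction of residues whose predicted class matches the true class, counted over all residues of all proteins. The sketch below shows one way to compute it on label strings; `q8_accuracy` is a hypothetical helper, not the metric code used by the driver, and it assumes padding positions have already been removed.

```python
def q8_accuracy(true_seqs, pred_seqs):
    """Per-residue accuracy over paired lists of label sequences.

    Assumes each predicted sequence has the same length as its true
    sequence and that padding has been stripped beforehand.
    """
    correct = 0
    total = 0
    for true, pred in zip(true_seqs, pred_seqs):
        if len(true) != len(pred):
            raise ValueError("true and predicted sequences differ in length")
        for t, p in zip(true, pred):
            total += 1
            if t == p:
                correct += 1
    # Return 0.0 for empty input instead of dividing by zero.
    return correct / total if total else 0.0
```

Because the count is per residue, long proteins contribute more to the score than short ones.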

Python Requirements

The required Python modules are in requirements.txt. You can install them with

pip install -r requirements.txt

Running the Scripts

To run the current version of the model, use the following command:

python src/driver.py data/cullpdb+profile_6133.npy 

where the last argument is the path to the dataset. We use the publicly available dataset from Zhou, J. & Troyanskaya, O. (2014). Currently, this data is accessible here: http://www.princeton.edu/~jzthree/datasets/ICML2014/
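
As a rough illustration of what the dataset file contains, the sketch below reshapes the flat array into per-residue features. It is not taken from `src/driver.py`. The layout is an assumption based on our reading of the dataset's accompanying description (700 positions per protein, 57 features per position, with columns 0-21 for the residue, 22-30 for the structure label including a final "NoSeq" padding column, and 35-56 for the sequence profile); please verify it against the documentation at the URL above. `split_features` is a hypothetical name.

```python
import numpy as np

MAX_LEN = 700      # assumed padded sequence length
N_FEATURES = 57    # assumed number of features per position


def split_features(flat):
    """Split a (n_proteins, 700 * 57) array into its assumed feature groups."""
    data = flat.reshape(-1, MAX_LEN, N_FEATURES)
    residues = data[:, :, 0:22]    # one-hot amino acid (last column: NoSeq)
    labels = data[:, :, 22:31]     # one-hot Q8 label (last column: NoSeq)
    profiles = data[:, :, 35:57]   # sequence profile features
    # A position is real (not padding) when the label's NoSeq column is 0.
    mask = labels[:, :, 8] == 0
    return residues, labels[:, :, :8], profiles, mask
```

With the real file, this would be called as `split_features(np.load("data/cullpdb+profile_6133.npy"))`; the returned mask can then be used to exclude padded positions from the loss and from accuracy.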

Add the -c flag to print the 8x8 confusion matrix of the labels on the validation data after each epoch of training. For more information about the driver script, run it with the help flag:
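
For readers unfamiliar with confusion matrices, the sketch below shows how an 8x8 matrix can be built from integer class labels. It is an illustration only, not the driver's implementation: `confusion_matrix` is a hypothetical helper, and it assumes rows index the true class and columns index the predicted class.

```python
def confusion_matrix(true_ids, pred_ids, n_classes=8):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_ids, pred_ids):
        matrix[t][p] += 1
    return matrix
```

Entries on the diagonal count correct predictions, so the sum of the diagonal divided by the sum of all entries equals the Q8 accuracy on the same residues.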

python src/driver.py -h
