MMTF Fragment Search

This program is intended to enable comprehensive querying of the PDB by breaking every structure (and each of its chains) into fragments and searching for matches based on backbone geometry, initially independently of all other properties. Under the hood, it uses the Java API of Apache Spark together with the mmtf-spark project.
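The fragmentation step can be pictured as a sliding window over each chain. The following is a minimal plain-Java sketch, not the project's Spark/mmtf-spark code. It assumes each chain has already been reduced to an array of C-alpha coordinates, and the names `Fragment` and `Fragmenter` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fragment: chain identifier, start index within the chain,
// and the window's C-alpha coordinates (one double[3] per residue).
final class Fragment {
    final String chainId;
    final int start;
    final double[][] coords;

    Fragment(String chainId, int start, double[][] coords) {
        this.chainId = chainId;
        this.start = start;
        this.coords = coords;
    }
}

final class Fragmenter {
    // Splits one chain into every contiguous, overlapping window of windowSize residues.
    // A chain of n residues yields n - windowSize + 1 fragments, or none if it is too short.
    static List<Fragment> fragment(String chainId, double[][] caCoords, int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Fragment> fragments = new ArrayList<>();
        for (int start = 0; start + windowSize <= caCoords.length; start++) {
            double[][] window = new double[windowSize][];
            for (int i = 0; i < windowSize; i++) {
                window[i] = caCoords[start + i].clone(); // copy so fragments are independent
            }
            fragments.add(new Fragment(chainId, start, window));
        }
        return fragments;
    }
}
```

For example, a 10-residue chain with a window size of 4 gives 7 overlapping fragments. In the real program the chains come from the PDB via mmtf-spark and the windows are processed in parallel by Spark; the sketch only shows the windowing logic.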

Conceptually, this allows a strictly spatial search for fragments that use similar geometry to accomplish important chemistry.

The window (fragment) size is adjustable, and the only required input is a PDB file containing the query fragment. The query fragment should be the same length as the chosen window size.
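Because the query and every fragment have the same length, they can be compared residue by residue. The project's actual scoring metric is not described here (improving it is listed below as planned work), so this sketch swaps in one simple, superposition-free measure as an assumption: the distance-matrix RMSD (dRMSD) between the two fragments' internal C-alpha distance matrices. It is unchanged by rotation and translation, so no alignment step is needed. `FragmentScore` and `drmsd` are hypothetical names.

```java
// Hypothetical scorer: distance-matrix RMSD (dRMSD) between two equal-length fragments.
// Lower is more similar; 0.0 means identical internal geometry.
final class FragmentScore {
    static double drmsd(double[][] query, double[][] fragment) {
        if (query.length != fragment.length) {
            // Mirrors the requirement that the query length match the window size.
            throw new IllegalArgumentException("query length must equal window size");
        }
        int n = query.length;
        double sumSq = 0.0;
        int pairs = 0;
        // Compare every intra-fragment residue pair distance.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double diff = dist(query[i], query[j]) - dist(fragment[i], fragment[j]);
                sumSq += diff * diff;
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : Math.sqrt(sumSq / pairs);
    }

    // Euclidean distance between two 3D points.
    private static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

One caveat of this choice: a distance matrix cannot tell a fragment from its mirror image, whereas an RMSD computed after optimal superposition can.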

Example output: top 1000 hits

Planned improvements:

  • Improve the scoring metric and add statistics to guide the choice of a threshold or cutoff
  • Output results in a more easily parsed format (CSV)
  • Provide an executable build
  • Add a sequence motif filter (regular expressions matched against fragment sequences)
  • Add a filter that allows each PDB ID to appear only once (currently implemented in the Python script that generates the PML file)
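Several of these planned features are simple post-processing steps over a ranked hit list. The sketch below, in plain Java rather than Spark, assumes each hit carries a PDB ID, chain, start index, fragment sequence, and score (lower is better). The `Hit` and `HitFilters` classes and the CSV column names are hypothetical. It shows a regex sequence-motif filter, a keep-the-best-hit-per-PDB-ID filter, and CSV output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical search hit; a lower score means a closer geometric match.
final class Hit {
    final String pdbId;
    final String chainId;
    final int start;
    final String sequence;
    final double score;

    Hit(String pdbId, String chainId, int start, String sequence, double score) {
        this.pdbId = pdbId;
        this.chainId = chainId;
        this.start = start;
        this.sequence = sequence;
        this.score = score;
    }
}

final class HitFilters {
    // Sequence motif filter: keeps hits whose fragment sequence contains a regex match.
    static List<Hit> byMotif(List<Hit> hits, String motifRegex) {
        Pattern motif = Pattern.compile(motifRegex);
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (motif.matcher(h.sequence).find()) {
                kept.add(h);
            }
        }
        return kept;
    }

    // Keeps only the best-scoring hit per PDB ID, returned in ascending score order.
    static List<Hit> onePerPdbId(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(a.score, b.score));
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : sorted) {
            best.putIfAbsent(h.pdbId, h); // first seen is the lowest score for that ID
        }
        return new ArrayList<>(best.values());
    }

    // Renders hits as CSV with a header row; assumes fields contain no commas or quotes.
    static String toCsv(List<Hit> hits) {
        StringBuilder sb = new StringBuilder("pdb_id,chain,start,sequence,score\n");
        for (Hit h : hits) {
            sb.append(h.pdbId).append(',').append(h.chainId).append(',')
              .append(h.start).append(',').append(h.sequence).append(',')
              .append(h.score).append('\n');
        }
        return sb.toString();
    }
}
```

Applying the motif filter before deduplication means the surviving hit for each PDB ID is the best one that also matches the motif.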

Potential future improvements:

  • GUI
  • Automated checking of pre-fragmented Hadoop sequence files
  • Automated updating of the Hadoop sequence files to reflect additions and changes to the PDB
  • Inline creation of a PML script to open results in PyMOL (currently implemented in a separate Python script)
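Inline PML generation could amount to emitting a few standard PyMOL commands (`fetch`, `select`, `show`, `zoom`) per hit. This is a hedged sketch, not the existing Python script: it assumes each hit is known by PDB ID, chain, and first and last PDB (author) residue numbers, and the `PmlHit` and `PmlWriter` classes are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical hit for PML output; residue numbers are PDB author numbering,
// assumed positive and without insertion codes.
final class PmlHit {
    final String pdbId;
    final String chainId;
    final int firstResi;
    final int lastResi;

    PmlHit(String pdbId, String chainId, int firstResi, int lastResi) {
        this.pdbId = pdbId;
        this.chainId = chainId;
        this.firstResi = firstResi;
        this.lastResi = lastResi;
    }
}

final class PmlWriter {
    // Builds a PyMOL script: one loaded object and one fragment selection per hit.
    static String toPml(List<PmlHit> hits) {
        StringBuilder pml = new StringBuilder();
        for (int i = 0; i < hits.size(); i++) {
            PmlHit h = hits.get(i);
            String obj = "hit" + (i + 1);
            // fetch downloads the entry and loads it under the explicit object name
            pml.append("fetch ").append(h.pdbId.toLowerCase(Locale.ROOT))
               .append(", ").append(obj).append('\n');
            // select the matched fragment and highlight it as sticks
            pml.append("select ").append(obj).append("_frag, ").append(obj)
               .append(" and chain ").append(h.chainId)
               .append(" and resi ").append(h.firstResi).append('-').append(h.lastResi).append('\n');
            pml.append("show sticks, ").append(obj).append("_frag\n");
        }
        pml.append("zoom\n");
        return pml.toString();
    }
}
```

Naming each object explicitly keeps the selections valid regardless of how PyMOL would name a fetched entry by default.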

Many thanks to @lafita for all his work on this project, and to the group at UCSD actively developing mmtf-spark. This software is under development and is provided as is, with no guarantees.