#go-model README File

This is a library for building Gene Ontology (GO) support vector classifiers from protein domain scores and then using them to predict the function of candidate proteins.

#Objective

Identifying enzymes by sequence homology tends to produce a signal-to-noise problem: it can be difficult to determine which candidates are genuine functional homologs and which are false positives.

The go_preprocess script uses HMMER to search a protein domain HMM database (Pfam is the best known, but others can be used) and saves the resulting scores.
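To illustrate the kind of score table this step produces, here is a minimal sketch (not the go_preprocess implementation) that turns HMMER tabular output into a protein-by-domain score matrix. It assumes the scores come from `hmmscan --tblout`, where column 1 is the domain model, column 3 is the query protein, and column 6 is the full-sequence bit score. The function names `parse_tblout` and `to_matrix` are hypothetical, and filling missing domains with 0.0 is an assumption made for illustration.

```python
from collections import defaultdict


def parse_tblout(lines):
    """Collect the best full-sequence bit score per (protein, domain) pair.

    Assumes `hmmscan --tblout` layout: target (domain) name in column 1,
    query (protein) name in column 3, full-sequence bit score in column 6.
    """
    scores = defaultdict(dict)
    for line in lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        domain, protein, bit_score = fields[0], fields[2], float(fields[5])
        previous = scores[protein].get(domain, float("-inf"))
        scores[protein][domain] = max(previous, bit_score)
    return dict(scores)


def to_matrix(scores):
    """Build a dense matrix with one row per protein, one column per domain.

    Domains with no hit for a protein are filled with 0.0 (an assumption).
    """
    proteins = sorted(scores)
    domains = sorted({d for per_protein in scores.values() for d in per_protein})
    matrix = [[scores[p].get(d, 0.0) for d in domains] for p in proteins]
    return proteins, domains, matrix


# Made-up example rows in tblout layout (not real search results).
example = """\
# target name  accession  query name  accession  E-value  score  bias ...
Pkinase PF00069.28 protA - 1.2e-50 170.3 0.0 1.5e-50 170.0 0.0 1.1 1 0 0 1 1 1 1 Protein kinase domain
SH2 PF00017.27 protA - 3.0e-20 70.1 0.1 4.0e-20 69.7 0.1 1.2 1 0 0 1 1 1 1 SH2 domain
Pkinase PF00069.28 protB - 2.0e-10 40.5 0.0 2.5e-10 40.2 0.0 1.0 1 0 0 1 1 1 1 Protein kinase domain
""".splitlines()

proteins, domains, matrix = to_matrix(parse_tblout(example))
for name, row in zip(proteins, matrix):
    print(name, dict(zip(domains, row)))
```

Each row of the matrix is a fixed-length feature vector for one protein, which is the shape a classifier expects as input.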

The model_test script takes the protein HMM scores and existing Gene Ontology classifications and trains support vector classifiers, selecting hyperparameters by grid search over a parameter space.
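As a rough sketch of what grid-search training of a support vector classifier can look like, the example below uses scikit-learn's `GridSearchCV` with `SVC`. It is not the model_test implementation: the use of scikit-learn, the synthetic score data, the parameter grid, the scaling step, the F1 scoring, and the choice of one binary classifier per GO term are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for real data: 40 proteins x 5 domain bit scores.
# Label 1 means "annotated with the GO term", 0 means "not annotated".
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=60.0, scale=10.0, size=(20, 5))
X_neg = rng.normal(loc=10.0, scale=10.0, size=(20, 5))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 20)

# Scale the scores, then fit an SVC; the step name "svc" is derived
# automatically by make_pipeline from the class name.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical parameter space; the real grid may differ.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated F1:", round(search.best_score_, 3))

# The refitted best model can then score a new protein's feature vector.
candidate = np.full((1, 5), 55.0)
print("predicted label:", search.predict(candidate)[0])
```

Stratified folds keep the ratio of annotated to unannotated proteins roughly constant across splits, which matters when a GO term has few positive examples.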

The go_prediction script takes the support vector classifiers (SVCs) produced by model building and predicts Gene Ontology terms for a protein from its primary sequence.

#Requirements