This repository provides the profile hidden Markov model (pHMM) for flavin-dependent tryptophan halogenase (Trp-FDH), together with Python scripts for filtering the results of a pHMM search and for leave-one-out cross-validation (LOOCV).
- Python 2.7.12 or higher, with the argparse library
- HMMER3 package, version 3.1b2 or higher
- Clustal Omega, version 1.2.4 or higher
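Before running anything, it can help to confirm that the required command-line tools are on `PATH`. The sketch below assumes Python 3 (for `shutil.which`) and the standard executable names `hmmscan`, `hmmbuild`, `hmmpress` and `clustalo`; `check_tools` is a hypothetical helper, not part of this repository.

```python
import shutil

# Standard executable names installed by HMMER3 and Clustal Omega
# (assumed; adjust if your installation uses different names).
REQUIRED_TOOLS = ("hmmscan", "hmmbuild", "hmmpress", "clustalo")


def check_tools(tools=REQUIRED_TOOLS):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print("%-10s %s" % (tool, "found" if found else "MISSING"))
```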
- To perform the pHMM search, hmmscan from the HMMER3 package is used with the "--domtblout" option.
- The query must be given as protein sequences in a FASTA-format file.
$ hmmscan --domtblout output_file_name.domtblout pHMM/Trp_FDH.hmm query.fasta
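A query file can be sanity-checked before the search. The sketch below (Python 3) reads a FASTA file and flags sequences that look like DNA/RNA rather than protein; `read_fasta` and `looks_like_protein` are hypothetical helpers, and the nucleotide test is only a heuristic (a peptide made only of A, C, G and T would be misflagged). Note also that hmmscan searches a pressed profile database: if the `.h3*` files produced by `hmmpress` are not present next to the `.hmm` file, run `hmmpress` on it once before searching.

```python
import re

# Amino-acid letters (20 standard plus ambiguity codes B, Z, X, U, O, J)
# and '*' for stop; chosen for this sketch, not taken from the repository.
PROTEIN_RE = re.compile(r"^[ACDEFGHIKLMNPQRSTVWYBZXUOJ*]+$")
# Sequences made only of these letters are treated as nucleotides.
NUCLEOTIDE_RE = re.compile(r"^[ACGTUN]+$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def looks_like_protein(seq):
    """True if seq uses amino-acid letters and is not plain DNA/RNA."""
    return bool(PROTEIN_RE.match(seq)) and not NUCLEOTIDE_RE.match(seq)
```

For example, `[h for h, s in read_fasta(open("query.fasta")) if not looks_like_protein(s)]` lists the headers of suspicious records.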
- The .domtblout file is filtered with the filtering script provided in this repository.
- The search result is filtered according to thresholds on the E-value and on pHMM model coverage.
- To filter the .domtblout file, run:
$ python -i example/example_input.domtblout -o example/example_output.domtblout -e e-value -c model_coverage
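The filtering step can be sketched as follows, assuming the standard HMMER3 --domtblout column layout. Which E-value the repository's script compares against the `-e` threshold is not stated here; this sketch assumes the independent E-value (i-Evalue, 13th column) and defines model coverage as (hmm to - hmm from + 1) / model length. `parse_domtblout`, `model_coverage` and `filter_hits` are hypothetical names.

```python
# HMMER3 --domtblout columns used here (0-based, whitespace-separated):
# 0 target name (the model, for hmmscan), 2 tlen (model length),
# 3 query name, 12 i-Evalue, 15 hmm from, 16 hmm to.
def parse_domtblout(lines):
    """Parse non-comment lines of a .domtblout file into dicts."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # maxsplit=22 keeps the free-text description (column 22) in one piece
        f = line.split(None, 22)
        hits.append({
            "model": f[0],
            "model_len": int(f[2]),
            "query": f[3],
            "i_evalue": float(f[12]),
            "hmm_from": int(f[15]),
            "hmm_to": int(f[16]),
            "line": line,  # original text, kept for writing filtered output
        })
    return hits


def model_coverage(hit):
    """Fraction of the pHMM covered by this domain hit (assumed definition)."""
    return (hit["hmm_to"] - hit["hmm_from"] + 1) / hit["model_len"]


def filter_hits(hits, max_evalue, min_coverage):
    """Keep hits with i-Evalue <= max_evalue and coverage >= min_coverage."""
    return [h for h in hits
            if h["i_evalue"] <= max_evalue and model_coverage(h) >= min_coverage]
```

The kept hits can be written back with `out.writelines(h["line"] for h in kept)`, which preserves the original .domtblout row format.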
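- The LOOCV script is used to perform leave-one-out cross-validation.
- Its input is a FASTA file of sequences, passed with the "--fasta" option.
- For each sequence in the input file, that sequence is held out as the test data and the remaining sequences form the training data, so as many training/test pairs are generated as there are sequences in the input file.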
- To build a pHMM from each training set, a multiple sequence alignment (MSA) is first computed with Clustal Omega (v1.2.4).
- The resulting MSA (.sto file, Stockholm format) is passed to hmmbuild from the HMMER3 package to construct the pHMM.
- The pHMM built from each training set is then searched against the corresponding held-out test sequence.
- The search results are summarized by their E-values (mean, standard deviation, and median).
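The leave-one-out split and the E-value summary described above can be sketched as follows, assuming sequences are held as (header, sequence) tuples and that the standard deviation is the sample standard deviation (the README does not say which). `leave_one_out` and `summarize_evalues` are hypothetical names, and the sketch assumes Python 3 for the `statistics` module.

```python
import statistics


def leave_one_out(records):
    """Yield (train, test) pairs; each record is held out exactly once."""
    for i, held_out in enumerate(records):
        train = records[:i] + records[i + 1:]
        yield train, [held_out]


def summarize_evalues(evalues):
    """Mean, sample standard deviation and median of per-fold E-values."""
    return {
        "avg": statistics.mean(evalues),
        # stdev needs at least two values; report 0.0 for a single fold
        "std": statistics.stdev(evalues) if len(evalues) > 1 else 0.0,
        "median": statistics.median(evalues),
    }
```

With N input sequences, `leave_one_out` yields N folds, matching the number of training/test pairs described above.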
- To perform the LOOCV, run:
$ python --fasta pHMM/Trp-FDH.fasta
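The per-fold steps (alignment, model building, search) could be chained from Python roughly as below. This is a sketch under assumptions: the Clustal Omega options (`--outfmt=st` for Stockholm output, `--force` to overwrite) and the added `hmmpress` step (needed because hmmscan searches a pressed database) are choices made for this sketch, and the repository's script may invoke the tools differently. `fold_commands` and `run_fold` are hypothetical names.

```python
import os
import subprocess


def fold_commands(train_fasta, test_fasta, workdir, tag):
    """Build the clustalo -> hmmbuild -> hmmpress -> hmmscan commands for one fold."""
    sto = os.path.join(workdir, tag + ".sto")        # MSA of the training set
    hmm = os.path.join(workdir, tag + ".hmm")        # pHMM built from the MSA
    tbl = os.path.join(workdir, tag + ".domtblout")  # search result for the test set
    return [
        ["clustalo", "-i", train_fasta, "-o", sto, "--outfmt=st", "--force"],
        ["hmmbuild", hmm, sto],
        ["hmmpress", "-f", hmm],  # -f overwrites existing pressed files
        ["hmmscan", "--domtblout", tbl, hmm, test_fasta],
    ]


def run_fold(train_fasta, test_fasta, workdir, tag):
    """Run the commands in order; raises CalledProcessError if a tool fails."""
    for cmd in fold_commands(train_fasta, test_fasta, workdir, tag):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
```

Each fold's .domtblout can then be parsed and its E-values collected for the mean/std/median summary.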