DavidMuller/genome_annotation

CSE 182 Final Project--David Muller, Nirati Gautam, Joanna Nguyen, Yunsup Jung

-------------------------------------------------------------------------------

Our annotation program is called annotate.py.

All output is saved in a folder called final_output, inside the directory
from which you run annotate.py.

The -input, -p, and -out options are all required.

-input is the FASTA file with the DNA sequences you want to analyze.
-p is the protein database (made with BLAST+'s makeblastdb tool).
-out specifies the format you want your results in--an argument of
  'g' yields GFF output, 'a' yields a multi-FASTA file of predicted protein
  sequences.


Here is an example of a call to annotate.py: 

python annotate.py  -input our_contigs.fasta -p Chlre4_best_proteins.fasta -out g
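
To illustrate how the three required options fit together, here is a minimal
sketch of an argument parser that accepts the same flags. It is not taken from
annotate.py; the function name build_parser and the attribute names
input_fasta, protein_db, and out_format are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser for the three required options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="annotate.py")
    # FASTA file with the DNA sequences (contigs) to analyze.
    parser.add_argument("-input", dest="input_fasta", required=True)
    # Protein database name, as produced by makeblastdb.
    parser.add_argument("-p", dest="protein_db", required=True)
    # 'g' for GFF output, 'a' for a multi-FASTA file of predicted proteins.
    parser.add_argument("-out", dest="out_format", required=True,
                        choices=["g", "a"])
    return parser


# Parse the example call shown above from an explicit argument list.
example_args = build_parser().parse_args(
    ["-input", "our_contigs.fasta", "-p", "Chlre4_best_proteins.fasta",
     "-out", "g"]
)
```

Restricting -out to 'g' and 'a' makes the parser reject any other format
letter before any analysis starts.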


-------------------------------------------------------------------------------

A little about our implementation:

We consider a Blastx hit on a protein significant only if its e-value is less
than 0.0001.
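
The cutoff can be sketched as a simple filter. This example assumes the Blastx
results are in BLAST+'s tabular format (-outfmt 6) with the default twelve
columns, where the e-value is the eleventh column; this README does not say
which output format annotate.py actually reads. The names E_VALUE_CUTOFF and
significant_hits are hypothetical.

```python
E_VALUE_CUTOFF = 1e-4  # the 0.0001 threshold described above


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose e-value is strictly below the cutoff.

    Assumes default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            hits.append({
                "contig": fields[0],
                "protein": fields[1],
                "q_start": int(fields[6]),  # hit start on the contig
                "q_end": int(fields[7]),    # hit end on the contig
                "evalue": evalue,
            })
    return hits
```

A hit with an e-value of exactly 0.0001 is dropped, because the test is
"less than" and not "less than or equal to".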

For every significant protein hit generated by Blastx on a contig, we partition
out the corresponding part of the contig.  The partition extends 1000 bases
before the starting point of the Blastx hit, and 1000 bases after the endpoint
of the hit.
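
The 1000-base extension can be sketched as follows. Two details here are
assumptions that this README does not state: the window is clamped to the
ends of the contig, and reverse-strand hits (where Blastx reports a start
greater than the end) are handled by sorting the two coordinates. The function
names padded_window and extract_window are hypothetical.

```python
def padded_window(hit_start, hit_end, contig_length, pad=1000):
    """Return 1-based inclusive (start, end) of the hit extended by pad bases."""
    low, high = sorted((hit_start, hit_end))  # reverse-strand hits have start > end
    start = max(1, low - pad)                 # do not run off the left end
    end = min(contig_length, high + pad)      # do not run off the right end
    return start, end


def extract_window(contig_seq, hit_start, hit_end, pad=1000):
    """Slice the padded window out of the contig.

    Returns the subsequence and its 1-based start on the contig, so that
    window-relative coordinates can later be mapped back to the contig.
    """
    start, end = padded_window(hit_start, hit_end, len(contig_seq), pad)
    return contig_seq[start - 1:end], start
```

For a hit spanning bases 1500 to 2400 on a 10000-base contig, padded_window
gives the window 500 to 3400.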

That partitioned contig and its corresponding protein are then passed to Exonerate,
which generates a GFF file with hints about finer gene structure.  This raw Exonerate
GFF output is modified slightly to be compatible with Augustus.

The modified GFF file is then passed to Augustus to finish the analysis.
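
The two external steps can be sketched as command lines built in Python. This
is an illustration under stated assumptions, not the exact invocations used by
annotate.py: the flags shown are standard Exonerate and Augustus options, but
this README does not say which options, species model, or extrinsic
configuration file our program passes. The function names and all file names
are hypothetical.

```python
def exonerate_command(protein_fasta, window_fasta):
    """Command to align one protein to one partitioned contig, emitting GFF."""
    return [
        "exonerate",
        "--model", "protein2genome",   # spliced protein-to-DNA alignment
        "--showtargetgff", "yes",      # print GFF for the DNA (target) sequence
        "--showalignment", "no",       # suppress human-readable alignments
        "--showvulgar", "no",          # suppress vulgar lines
        "--query", protein_fasta,
        "--target", window_fasta,
    ]


def augustus_command(species, hints_gff, extrinsic_cfg, window_fasta):
    """Command to predict genes on the partitioned contig using the hints."""
    return [
        "augustus",
        "--species=" + species,                # assumption: chosen by the user
        "--hintsfile=" + hints_gff,            # the modified Exonerate GFF
        "--extrinsicCfgFile=" + extrinsic_cfg, # weights for the hint sources
        window_fasta,
    ]


# Each list can be handed to subprocess.run(cmd, capture_output=True, text=True)
# once the tools are installed and the input files exist.
```

Building the commands as lists avoids shell quoting problems when file names
contain spaces.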

The folder 'final_output' contains one file for every contig in the input.
Depending on the -out option, each file is either a GFF file with all the
predicted genes on that contig, or a multi-FASTA file with all the predicted
protein sequences from that contig.

-------------------------------------------------------------------------------

