DavidMuller/genome_annotation
CSE 182 Final Project: David Muller, Nirati Gautam, Joanna Nguyen, Yunsup Jung
-------------------------------------------------------------------------------
Our annotation program is called annotate.py.

All output is saved in a folder called final_output, created in the directory
you run annotate.py from.

The -input, -p, and -out options are all required.
-input is a FASTA file with the DNA sequences you want to analyze.
-p is the protein database (made with BLAST+'s makeblastdb tool).
-out specifies the format you want your results in: 'g' yields GFF output,
'a' yields a multi-FASTA file of predicted protein sequences.

Here is an example of a call to annotate.py:
python annotate.py -input our_contigs.fasta -p Chlre4_best_proteins.fasta -out g
-------------------------------------------------------------------------------
A little about our implementation:

We only consider a Blastx hit on a protein significant if its e-value is less
than .0001.

For every significant protein hit that Blastx generates on a contig, we
partition out the corresponding part of the contig. The partition extends from
1000 bases before the starting point of the Blastx hit to 1000 bases after the
endpoint of the hit.

The partitioned contig and its corresponding protein are then passed to
Exonerate, which generates a GFF file with hints about finer gene structure.

The raw Exonerate GFF output is modified slightly to be compatible with
Augustus. The modified GFF file is then passed to Augustus to finish the
analysis.

The folder final_output contains one file for every contig in the input. Each
file is either a GFF file with all the predicted genes on that contig, or a
multi-FASTA file with all the protein sequences predicted for that contig.
-------------------------------------------------------------------------------
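The filtering and partitioning steps described above can be sketched as follows. This is a minimal illustration, not the code in annotate.py: the Hit record, the function names, and the 1-based inclusive coordinate convention are hypothetical. Clamping the window to the ends of the contig and handling reverse-strand hits (start greater than end) are assumptions; the README only states the .0001 cutoff and the 1000-base extension.

```python
from typing import Iterable, List, NamedTuple, Tuple

E_VALUE_CUTOFF = 1e-4  # from the README: a hit is significant if e-value < .0001
FLANK = 1000           # from the README: bases added before and after each hit


class Hit(NamedTuple):
    """Hypothetical record for one Blastx hit (1-based, inclusive contig coordinates)."""
    protein_id: str
    q_start: int
    q_end: int
    e_value: float


def significant_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep only hits whose e-value is strictly below the cutoff."""
    return [h for h in hits if h.e_value < cutoff]


def extend_window(hit: Hit, contig_length: int, flank: int = FLANK) -> Tuple[int, int]:
    """Return the (start, end) of the hit extended by `flank` bases on each side.

    Assumption: the window is clamped to [1, contig_length], and hits reported
    with q_start > q_end (reverse strand) are normalized first.
    """
    lo = min(hit.q_start, hit.q_end)
    hi = max(hit.q_start, hit.q_end)
    return max(1, lo - flank), min(contig_length, hi + flank)


def partition(contig_seq: str, hit: Hit, flank: int = FLANK) -> str:
    """Slice the extended window out of the contig sequence."""
    start, end = extend_window(hit, len(contig_seq), flank)
    return contig_seq[start - 1:end]  # convert 1-based inclusive to a Python slice


if __name__ == "__main__":
    contig = "A" * 5000  # placeholder sequence, for illustration only
    hits = [
        Hit("protein_1", 1500, 2400, 1e-10),
        Hit("protein_2", 300, 900, 0.5),      # not significant, dropped
        Hit("protein_3", 4800, 4200, 1e-5),   # reverse-strand style coordinates
    ]
    for hit in significant_hits(hits):
        print(hit.protein_id, extend_window(hit, len(contig)))
```

Each resulting partition, together with its protein, would be the pair handed to Exonerate in the next step. If partitions are cut out this way, coordinates reported relative to a partition would need to be shifted back by the window start to refer to the original contig.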