Home
In this Wiki I'll walk through the genome annotation process for Portulaca amilis, which was part of a recently submitted manuscript
Gilman IS, Moreno-Villena J, Lewis ZR, Goolsby EW, Edwards EJ. submitted. Gene co-expression networks reveal orthology among multiple photosynthetic pathways in Portulaca
that investigated the two photosynthetic systems of Portulaca (C4 and CAM).
All transcription-related analyses can be found in this repo, which walks through everything from the de novo transcriptome assembly of P. oleracea through network analyses and cis-regulatory element detection and enrichment analysis in the P. amilis genome.
I'll be using large portions of previous walk-throughs (e.g., Ya Yang's transcriptome assembly pipeline, Daren Card's MAKER annotation pipeline) and assume anyone reading/using this has a fair amount of comfort with the command line because there will inevitably be slight differences in commands and directory hierarchies across users and platforms. Although vignettes on genome annotation exist already, most only show the short, successful path taken. I hope that this Wiki helps clarify common issues faced by users who are new to this software, these file formats, etc.
Many of the genomics tools used for genome annotation are pipelines or wrappers that leverage other software, often requiring specific versions or builds, making these packages finicky at best and unusable at worst. To alleviate some of the headaches associated with integrating all of these pieces of software, I'm going to create multiple conda environments to manage packages as I move through isolated chunks of analyses, so that when things inevitably break, the problems will be contained and traceable, and the affected environment can be more easily rebuilt or removed. I'll include an appendix for each environment's specs and put yaml files to recreate these in the repo. In addition to using conda to manage my environment, I'll take advantage of some software already installed on Yale HPC's Farnam Cluster. I've found that using these installations addresses issues where multiple pieces of software wrapped in a single package attempt to use different, incompatible versions of the same dependencies or look for dependencies in incorrect directories. All other software is installed in a directory called apps.
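As a rough sketch of the one-environment-per-analysis approach described above, the snippet below writes a small yaml spec and defines helpers for building, exporting, and removing an environment. The environment name, the package list, and the helper function names are all hypothetical placeholders, not the specs used in this project; the actual yaml files are the ones stored in the repo.

```shell
# Hypothetical environment name; substitute one per chunk of analysis.
ENV_NAME="annotation-example"
ENV_FILE="${ENV_NAME}.yml"

# Write a minimal environment spec. The channels and the single pinned
# package are illustrative assumptions, not this project's real specs.
cat > "$ENV_FILE" <<EOF
name: ${ENV_NAME}
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.9
EOF

# Build an environment from a yaml spec.
create_env() {
    conda env create -f "$1"
}

# Record the resolved package versions so the environment can be rebuilt.
export_env() {
    conda env export -n "$1" --no-builds > "$2"
}

# Throw away a broken environment without touching the others.
remove_env() {
    conda env remove -n "$1"
}
```

With conda available, `create_env "$ENV_FILE"` would build the environment, `export_env "$ENV_NAME" "${ENV_NAME}.lock.yml"` would capture its state, and `remove_env "$ENV_NAME"` would delete it if something breaks. Keeping each step in its own small function makes it easy to repeat the same cycle for every environment.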