multiGenomicContext is a Python + R script that plots the genomic context of a protein in the genome of your choice (or in many genomes). You only need two things: a fasta file with the protein and a list of the gbk files in which to look for the genomic context (plus the gbk files themselves).
- Python >= 2.7 with the module Biopython
- blastp binary
- R (tested in 3.3.1), with the packages:
  - ggplot2
  - genoPlotR
multiGenomicContext has a minimal usage:
python -f protein.fasta -l gbklist.txt
and a complete usage:
python -f protein.fasta -l gbklist.txt -u 4 -d 4 -e 1e-5 -i 85 -a 75
where the options are:
- -f: The protein sequence in fasta format (a multifasta of proteins is also accepted).
- -l: A file listing the gbk files (if you have only one gbk, still put its name in a file).
- -u: Number of genes to include upstream of the gene in the genomic context (default: 4).
- -d: Number of genes to include downstream of the gene in the genomic context (default: 4).
- -e: E-value threshold for the blastp search (default: 1e-5).
- -i: Minimum identity (%) of the blastp alignment to consider that the gene exists in the genome (default: 85).
- -a: Minimum alignment length (%) between the gene and its blastp match to consider that the gene "exists" in the genome (default: 75).
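The way the -e, -i and -a thresholds combine can be sketched as follows. This is only an illustration, not the script's actual code: the function name `hit_passes` and its parameters are hypothetical, and it assumes that the alignment length (%) is the aligned length relative to the length of the query protein.

```python
def hit_passes(identity_pct, alignment_length, query_length, evalue,
               min_identity=85.0, min_alignment_pct=75.0, max_evalue=1e-5):
    """Return True if a blastp hit satisfies the -e, -i and -a thresholds.

    Hypothetical helper: the defaults mirror the documented option defaults.
    """
    # Assumption: alignment % = aligned length relative to the query length.
    alignment_pct = 100.0 * alignment_length / query_length
    return (evalue <= max_evalue
            and identity_pct >= min_identity
            and alignment_pct >= min_alignment_pct)


# A 300 aa query aligned over 280 residues (about 93 %) at 92 % identity passes.
print(hit_passes(92.0, 280, 300, 1e-50))  # True
# The same hit aligned over only 150 residues (50 %) fails the -a threshold.
print(hit_passes(92.0, 150, 300, 1e-50))  # False
```

Under these assumptions a gene counts as present only when all three conditions hold at once.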
To create the gbk list, the simplest way is to write the file names in a text file, one per line. Alternatively, run in a terminal:
ls -1 *.gbk > myGbkList.txt
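If you prefer to build the list from Python (for example on a system without `ls`), a rough equivalent could look like this. The function name `write_gbk_list` is hypothetical and is not part of multiGenomicContext.

```python
import glob
import os


def write_gbk_list(directory, list_path):
    """Write the names of all .gbk files in `directory` to `list_path`, one per line.

    Hypothetical helper that mimics `ls -1 *.gbk > myGbkList.txt`.
    """
    pattern = os.path.join(directory, "*.gbk")
    # Keep only the file names, sorted, as `ls` would list them from inside the directory.
    names = sorted(os.path.basename(path) for path in glob.glob(pattern))
    with open(list_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return names
```

Calling `write_gbk_list(".", "myGbkList.txt")` from the directory that holds the gbk files should produce the same kind of list as the terminal command, although the sort order may differ slightly from `ls` depending on your locale.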
- multiGenomicContext searches for genes in the gbk relying on the CDS being ordered, but this is only true for single-chromosome assemblies. For gbk files with two or more contigs, it shows the genomic context only within the contig that contains the gene.
- We strongly recommend annotating all gbk files with the same software.
- Finally, no genomic context is shown if the gene is missing from the gbk.
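
The contig behaviour described in these notes can be pictured with the small sketch below. It is a simplified model, not the script's implementation: the function `genomic_context` and the dict-of-lists input are hypothetical, and gene orientation (strand) is ignored.

```python
def genomic_context(contigs, gene_id, upstream=4, downstream=4):
    """Return (contig_name, neighbouring genes) for gene_id, or None if absent.

    contigs: dict mapping a contig name to its list of CDS ids in genome order.
    Hypothetical model of the -u / -d window; strand is not considered.
    """
    for name, genes in contigs.items():
        if gene_id in genes:
            pos = genes.index(gene_id)
            # Clip the window so it never leaves the contig that holds the gene.
            start = max(0, pos - upstream)
            end = min(len(genes), pos + downstream + 1)
            return name, genes[start:end]
    return None  # gene missing from every contig: nothing to show


contigs = {
    "contig_1": ["a", "b", "c"],
    "contig_2": ["d", "e", "f", "g"],
}
print(genomic_context(contigs, "e", upstream=2, downstream=1))
# ('contig_2', ['d', 'e', 'f'])
print(genomic_context(contigs, "x"))
# None
```

In this model a gene near a contig edge simply gets a shorter context, because the neighbours on other contigs cannot be assumed to be adjacent in the real genome.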