
SB Map features nucl2prot

Steve Bond edited this page Oct 15, 2015 · 6 revisions

--map_features_nucl2prot, -fn2p

Description

Transfer nucleotide feature annotations onto their corresponding protein sequences. The nucleotide and amino acid files must be separate.

SeqBuddy issues a warning if either file contains sequences that are not also present in the other. Use the -q flag to silence these warnings.
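
The cross-check between the two files can be pictured as a comparison of record IDs. The following is only a minimal sketch, assuming records are paired by ID as in the examples on this page. The function name `unmatched_ids` is hypothetical and is not part of SeqBuddy. The warning text is copied from the Example 2 output.

```python
def unmatched_ids(nucl_ids, prot_ids):
    """Return (only_in_nucl, only_in_prot) as two sorted lists.

    Hypothetical helper for illustration; not SeqBuddy's own code.
    """
    nucl, prot = set(nucl_ids), set(prot_ids)
    return sorted(nucl - prot), sorted(prot - nucl)


# IDs taken from the Example 2 input files on this page
only_nucl, only_prot = unmatched_ids(["Mle-Panxα4", "Mle-Panxα3"], ["Mle-Panxα3"])

for seq_id in only_nucl:
    print(f"Warning: {seq_id} is in the cDNA file, but not in the protein file")
```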

Examples

Example 1

Input file 1: Mle-Panxα4_cds.gb

LOCUS       Mle-Panxα4              1275 bp    DNA              UNA 02-JAN-2015
DEFINITION  cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE
  ORGANISM  . . .
            .
FEATURES             Location/Qualifiers
     TMD1            82..144
     TMD2            391..453
     TMD3            643..705
     TMD4            913..1005
ORIGIN
        1 atggttattg agctgctagc tggatacaaa ggtctgtccc cgtttaaaga cgcgactgtt
       61 gacgactcat gggaccaaat aaaccgatgt tacgtgttca tcgccatggt ggtgatgggt
      121 gctgtgacta caatgaggca atactctgga acattgattg catgtgacgg gttcacgaag
      181 ttccaccctc agtttgcaga agattactgc tggagcatag gaatgtacac ggtacgcgag
      241 gcctatgact tgcccagcag tatggttgca taccccggag tgataccctg ggatatgcct
      301 gcatgtgttc cacgtctcct gaagaacgga accaggacca aatgtggcag tgagaaggac
      361 gttatgccct cagagaaaat ctaccacttg tggtaccagt gggcaagttt ctacttctgg
      421 atagtggcta tactgtacta cgcgccgtat ataatgttca aacagttggg agggggagag
      481 tacaagcccc tgatcaagct actttgtctt gcgtctggat ctcctgaaca acagatgcag
      541 gacatccagg agcgtgtcgt caagtggctt ttcttcaggt ttaagaccta catattcgct
      601 aagggttact acgcgtggct acgtaaaaac agtttcagta tcgctatcgg cgtgacaaaa
      661 ttgtcctatc tcctgataac tatccttgtg ttctacttaa caggcttcat gttcgaatat
      721 ggctctaaca cgtggtaccg gtacggtgct gactggtacg gtaccagatt ctcctcgtac
      781 cacgaaacta acaactcaat cacactcaca aaggacatca tcttcccaaa gatggtagcg
      841 tgtgagatca agcgatgggg tccctcaggg attgaggttg agaccgctca gtgcgtactt
      901 gccccgaatg tgctctacca gtaccttttc ctctttactt ggtacctcct gatcgcggta
      961 ttcttcacta acctcatcag ttgtttcctc cacatttctg agatgttctt ctctaacggt
     1021 acgtacaaca ggatgataga tcaaggaatg ttgccagaca agcccagtta tcggtacgtc
     1081 ttcatgaaca ttggcgccgg tggcagagag atagtccaga ttctaacaga caattccaac
     1141 cccctcttgt ttagcaagat atttgacgat cttaccaatt tactaatcac tacttccaaa
     1201 aacgctgacg tcattgaaaa cctgtcgaag ttggattcct ccgtaattga actaggcagc
     1261 aaagactcaa tctaa
//

Input file 2: Mle-Panxα4_pep.fa

>Mle-Panxα4 ML129317a.
MVIELLAGYKGLSPFKDATVDDSWDQINRCYVFIAMVVMGAVTTMRQYSGTLIACDGFTK
FHPQFAEDYCWSIGMYTVREAYDLPSSMVAYPGVIPWDMPACVPRLLKNGTRTKCGSEKD
VMPSEKIYHLWYQWASFYFWIVAILYYAPYIMFKQLGGGEYKPLIKLLCLASGSPEQQMQ
DIQERVVKWLFFRFKTYIFAKGYYAWLRKNSFSIAIGVTKLSYLLITILVFYLTGFMFEY
GSNTWYRYGADWYGTRFSSYHETNNSITLTKDIIFPKMVACEIKRWGPSGIEVETAQCVL
APNVLYQYLFLFTWYLLIAVFFTNLISCFLHISEMFFSNGTYNRMIDQGMLPDKPSYRYV
FMNIGAGGREIVQILTDNSNPLLFSKIFDDLTNLLITTSKNADVIENLSKLDSSVIELGS
KDSI*

Usage

$: sb Mle-Panxα4_cds.gb Mle-Panxα4_pep.fa -fn2p

Output

LOCUS       Mle-Panxα4               425 aa                     UNK 01-JAN-1980
DEFINITION  Mle-Panxα4 cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     TMD1            28..48
     TMD2            131..151
     TMD3            215..235
     TMD4            305..335
ORIGIN
        1 mviellagyk glspfkdatv ddswdqinrc yvfiamvvmg avttmrqysg tliacdgftk
       61 fhpqfaedyc wsigmytvre aydlpssmva ypgvipwdmp acvprllkng trtkcgsekd
      121 vmpsekiyhl wyqwasfyfw ivailyyapy imfkqlggge ykplikllcl asgspeqqmq
      181 diqervvkwl ffrfktyifa kgyyawlrkn sfsiaigvtk lsyllitilv fyltgfmfey
      241 gsntwyryga dwygtrfssy hetnnsitlt kdiifpkmva ceikrwgpsg ievetaqcvl
      301 apnvlyqylf lftwylliav fftnliscfl hisemffsng tynrmidqgm lpdkpsyryv
      361 fmnigaggre ivqiltdnsn pllfskifdd ltnllittsk nadvienlsk ldssvielgs
      421 kdsi*
//
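
The feature coordinates in this output are consistent with simple codon arithmetic on 1-based, inclusive positions. The sketch below reproduces them under the assumption that the reading frame begins at base 1. The rounding rule is inferred from the examples on this page, not taken from SeqBuddy's source, and `nucl_to_prot` is a hypothetical name.

```python
def nucl_to_prot(start, end):
    """Map a 1-based inclusive nucleotide range onto residue positions.

    Assumes the reading frame starts at base 1 (illustrative only).
    """
    prot_start = (start - 1) // 3 + 1  # codon that contains the first base
    prot_end = end // 3                # last codon that ends at or before `end`
    return prot_start, prot_end


# Nucleotide features from Mle-Panxα4_cds.gb
features = {"TMD1": (82, 144), "TMD2": (391, 453), "TMD3": (643, 705), "TMD4": (913, 1005)}

for name, (start, end) in features.items():
    prot_start, prot_end = nucl_to_prot(start, end)
    print(f"{name}  {prot_start}..{prot_end}")
# TMD1  28..48
# TMD2  131..151
# TMD3  215..235
# TMD4  305..335
```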

Example 2

Input file 1: N-terminal_mrna.gb

LOCUS       Mle-Panxα4               200 bp    RNA              UNA 02-JAN-2015
DEFINITION  cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     N-Term          1..81
     TMD1            82..144
     ECL1            145..200
ORIGIN
        1 augguuauug agcugcuagc uggauacaaa ggucuguccc cguuuaaaga cgcgacuguu
       61 gacgacucau gggaccaaau aaaccgaugu uacguguuca ucgccauggu ggugaugggu
      121 gcugugacua caaugaggca auacucugga acauugauug caugugacgg guucacgaag
      181 uuccacccuc aguuugcaga
//
LOCUS       Mle-Panxα3               200 bp    RNA              UNA 02-JAN-2015
DEFINITION  cDNA - ML036514a.
ACCESSION   Mle-Panxα3
VERSION     Mle-Panxα3
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(1..151,152..200)
                     /label="ML036514a"
     N-term          1..84
     TMD1            85..147
     ECL1            148..200
ORIGIN
        1 auguuguugc ucggcucacu cggaacgauc aagaacuuga gcaucuucaa agaccugucc
       61 uuggacgacu ggcuggauca gaugaacagg accuucaugu uucuacugcu cuguuucaug
      121 ggaacaauug ucgccguuag ucaguacacu gguaaaaaca uaucuugcga uggcuuuacg
      181 aaguucggag aagauuucuc
//

Input file 2: Mle-Panxα3.fa

>Mle-Panxα3 ML036514a.
MLLLGSLGTIKNLSIFKDLSLDDWLDQMNRTFMFLLLCFMGTIVAVSQYTGKNISCDGFT
KFGEDFSQDYCWTQGLYTIKEAYDLPESQIPYPGIIPENVPACREHALKNGGKIVCPPED
QVKPLTRARHLWYQWIPFYFWVIAPVFYLPYMFVKRMGLDRMKPLLKIMSDYYHCTTETP
SEEIIVKCADWVYNSIVDRLSEGSSWTSWRNRHGLGLAVLVSKFMYLGGSVLVMMMTTLM
FQVGDFKTYGIEWLRQFPNPENYSTSVKHKLFPKMVACEIKRWGTTGLEEENGMCVLAPN
VIYQYIFLIMWFALAITICTNFGNIFFYLFKLTATRYTYNKLVATGHFSHKHPGWKFMYY
RIGTSGRVLLNIVAQNTNPIIFGAIMEKLTPSVIKHLRIGHVPGEYLTDPA*

Usage

$: sb Mle-Panxα3.fa N-terminal_mrna.gb -fn2p

Output

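The first warning reports that Mle-Panxα4 has no counterpart in the protein file, so it is omitted from the output. The size mismatch warning indicates that the nucleotide sequence is 200 bases long while the amino acid sequence is 412 residues (counting the terminal stop). This is expected in our example, because the mRNA record covers only the N-terminal region, but it could reveal an issue with your data.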

Warning: Mle-Panxα4 is in the cDNA file, but not in the protein file
Warning: size mismatch between aa and nucl seqs for Mle-Panxα3 --> 200, 412

LOCUS       Mle-Panxα3               412 aa                     UNK 01-JAN-1980
DEFINITION  Mle-Panxα3 ML036514a.
ACCESSION   Mle-Panxα3
VERSION     Mle-Panxα3
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(1..50,51..66)
                     /label="ML036514a"
     N-term          1..28
     TMD1            29..49
     ECL1            50..66
ORIGIN
        1 mlllgslgti knlsifkdls lddwldqmnr tfmflllcfm gtivavsqyt gkniscdgft
       61 kfgedfsqdy cwtqglytik eaydlpesqi pypgiipenv pacrehalkn ggkivcpped
      121 qvkpltrarh lwyqwipfyf wviapvfylp ymfvkrmgld rmkpllkims dyyhcttetp
      181 seeiivkcad wvynsivdrl segsswtswr nrhglglavl vskfmylggs vlvmmmttlm
      241 fqvgdfktyg iewlrqfpnp enystsvkhk lfpkmvacei krwgttglee engmcvlapn
      301 viyqyiflim wfalaitict nfgniffylf kltatrytyn klvatghfsh khpgwkfmyy
      361 rigtsgrvll nivaqntnpi ifgaimeklt psvikhlrig hvpgeyltdp a*
//
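
Compound locations such as `order(1..151,152..200)` appear to be converted part by part, and a range that ends mid-codon (here base 200) is truncated to the last complete codon. The sketch below illustrates this with a regular expression over the location string. It is an assumption based on the output above, not SeqBuddy's actual implementation, and `convert_location` is a hypothetical name.

```python
import re


def convert_location(location):
    """Rewrite each 'start..end' pair in a GenBank-style location string
    from nucleotide to residue coordinates (frame assumed to start at base 1).
    """
    def to_prot(match):
        start, end = int(match.group(1)), int(match.group(2))
        # Same arithmetic for every part: codon containing `start`,
        # last codon that ends at or before `end`
        return f"{(start - 1) // 3 + 1}..{end // 3}"

    return re.sub(r"(\d+)\.\.(\d+)", to_prot, location)


print(convert_location("order(1..151,152..200)"))  # order(1..50,51..66)
print(convert_location("148..200"))                # 50..66
```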
