From 0a1d05ce71ede81395b8921aa143b6465eee16e9 Mon Sep 17 00:00:00 2001
From: Wolfgang Maier
Date: Wed, 20 Mar 2024 16:09:06 +0100
Subject: [PATCH] Simplify MSA generation steps by using latest MAFFT version

---
 .../tutorials/aiv-analysis/tutorial.md        | 30 ++++++++-----------
 .../tutorials/pox-tiled-amplicon/tutorial.md  | 23 +++++++-------
 2 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md b/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md
index ffcc7acdd152a3..fed901c6f86f8f 100644
--- a/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md
+++ b/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md
@@ -419,32 +419,26 @@ Now what if we cannot obtain a consensus base for a position with the above crit
 # Placing segments on a phylogenetic tree
 
 The next logical step after obtaining the consensus sequences of segments of our sample is to explore how those sequences are related to the sequences in our reference collection.
-To do so, we are going to combine the reference sequences of all segments with their corresponding consensus sequence into one multi-sequence fasta dataset per segment. Then we build a multiple sequence alignment (MSA) from the sequences of each segment, and use these alignments to generate phylogenetic trees, again one per segment. We are going to use two rather standard tools, **MAFFT** and **IQTree**, for generating MSAs and trees, respectively.
+To do so, we are going to combine the reference sequences of all segments with their corresponding consensus sequence into one multiple sequence alignment (MSA) per segment, and use these to generate phylogenetic trees, again one per segment. We are going to use two rather standard tools, **MAFFT** and **IQTree**, for generating MSAs and trees, respectively.
 
 > Exploring phylogeny
 >
-> 1. {% tool [Concatenate datasets](cat1) %}
->    - {% icon param-collection %} *"Concatenate Dataset"*: `References per segment (INSAFlu)`
->    - In *"Dataset"*:
->        - {% icon param-repeat %} *"Insert Dataset"*
->            - {% icon param-collection %} *"Select"*: collection of renamed consensus sequences; output of **Replace** on consensus sequences
+> 1. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.520+galaxy0) %}
+>    - *"For multiple inputs generate"*: `one or several MSAs depending on input structure`
+>    - In *"Input batch"*:
+>        - {% icon param-repeat %} *"1: Input batch"*
+>            - {% icon param-collection %} *"Sequences to align"*: collection of `References per segment (INSAFlu)`
+>        - {% icon param-repeat %} *"2: Input batch"*
+>            - {% icon param-collection %} *"Sequences to align"*: collection of renamed consensus sequences; output of **Replace** on consensus sequences
+>    - *"Type of sequences"*: `Nucleic acids`
 >
->    {% snippet faqs/galaxy/analysis_concatenate.md toolspec="#1" %}
+>    Because both input batches are collections of eight elements each, the result is also a collection of eight MSAs, each aligning all reference sequences of one genome segment plus the consensus sequence we have obtained for that segment against each other.
 >
->    The tool should produce a collection of eight multi-sequence fasta datasets, each of which has the generated consensus sequence for one segment concatenated to the INSAFlu reference sequences of that segment.
->
-> 2. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.508+galaxy0) %}
->    - {% icon param-collection %} *"Sequences to align"*: collection of concatenated sequences; output of **Concatenate datasets**
->    - *"Data type"*: `Nucleic Acids`
->    - *"Matrix selection"*: `No matrix`
->
->    The result is a collection of MSAs, each aligning all reference sequences of one genome segment plus the consensus sequence we have obtained for that segment against each other.
->
-> 3. {% tool [IQ-TREE](toolshed.g2.bx.psu.edu/repos/iuc/iqtree/iqtree/2.1.2+galaxy2) %}
+> 2. {% tool [IQ-TREE](toolshed.g2.bx.psu.edu/repos/iuc/iqtree/iqtree/2.1.2+galaxy2) %}
 >    - {% icon param-collection %} *"Specify input alignment file in PHYLIP, FASTA, NEXUS, CLUSTAL or MSF format."*: output of **MAFFT**
 >    - *"Specify sequence type ..."*: `DNA`
 >
-> 4. {% icon galaxy-eye %} Explore each of the final trees produced by IQTree for the different segments
+> 3. {% icon galaxy-eye %} Explore each of the final trees produced by IQTree for the different segments
 >
 > >
 > >
diff --git a/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md b/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md
index 5caf852e2ad16d..e043cee27d1645 100644
--- a/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md
+++ b/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md
@@ -805,19 +805,16 @@ This leaves us with the tasks of obtaining the sequence for parent P1 (accession
 >    3. When the Replace Text tool run is finished, **rename** the output dataset
 >
 >       {% snippet faqs/galaxy/datasets_rename.md name="Herbivac sequence" format="fasta" %}
-> 2. {% tool [Concatenate datasets tail-to-head (cat)](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cat/0.1.1) %}
->    - {% icon param-files %} *"Datasets to concatenate"*: the `Herbivac sequence`; renamed output of **Replace**
->    - In *"Dataset"*:
->        - {% icon param-repeat %} *"Insert Dataset"*
->            - {% icon param-collection %} *"Select"*: collection of consensus sequences; output of **ivar consensus**
->        - {% icon param-repeat %} *"Insert Dataset"*
->            - {% icon param-files %} *"Select"*: the `LSDV reference`
->
->    {% snippet faqs/galaxy/analysis_concatenate.md toolspec="#2" %}
-> 3. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.508+galaxy0) %}
->    - {% icon param-file %} *"Sequences to align"*: Multi-fasta dataset with four sequences; output of **Concatenate**
->    - *"Data type"*: `Nucleic acids`
->    - *"Matrix selection"*: `No matrix`
+> 2. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.520+galaxy0) %}
+>    - *"For multiple inputs generate"*: `a single MSA of all sequences from all inputs`
+>    - In *"Input batch"*:
+>        - {% icon param-repeat %} *"1: Input batch"*
+>            - {% icon param-files %} *"Sequences to align"*: the `Herbivac sequence`; renamed output of **Replace**
+>        - {% icon param-repeat %} *"2: Input batch"*
+>            - {% icon param-collection %} *"Sequences to align"*: collection of consensus sequences; output of **ivar consensus**
+>        - {% icon param-repeat %} *"3: Input batch"*
+>            - {% icon param-files %} *"Sequences to align"*: the `LSDV reference`
+>    - *"Type of sequences"*: `Nucleic acids`
 >
 {: .hands_on}
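
Reviewer note: the two values of *"For multiple inputs generate"* used in this patch can be pictured with a small Python sketch. It is only an illustration of the grouping described in the tutorial text, not the MAFFT wrapper's implementation. The function names (`pair_collections`, `pool_all`), the dictionary model of a collection, and the toy sequences are all hypothetical, and matching elements by identical identifiers is an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical model: a collection maps element identifiers to FASTA records.
Collection = Dict[str, List[str]]


def pair_collections(batches: List[Collection]) -> Collection:
    """Element-wise merge: one alignment input per shared element identifier.

    Mirrors the aiv-analysis case, where two collections of equal size
    yield one MSA input per segment.
    """
    first_keys = list(batches[0])
    for batch in batches[1:]:
        if list(batch) != first_keys:
            raise ValueError("collections must share the same element identifiers")
    return {key: [rec for batch in batches for rec in batch[key]] for key in first_keys}


def pool_all(batches: List[Collection]) -> List[str]:
    """Flatten every element of every batch into a single alignment input.

    Mirrors the pox-tiled-amplicon case, where all sequences end up in one MSA.
    """
    return [rec for batch in batches for records in batch.values() for rec in records]


# Toy data with two segments instead of eight (sequences are made up).
references = {"HA": [">ref_H1\nATGAAG", ">ref_H3\nATGAAA"], "NA": [">ref_N1\nATGAAT"]}
consensus = {"HA": [">sample_HA\nATGAAG"], "NA": [">sample_NA\nATGAAC"]}

per_segment = pair_collections([references, consensus])
pooled = pool_all([references, consensus])
print(len(per_segment), len(pooled))
```

With the toy data, `per_segment` holds one group of records per segment, while `pooled` is a single list containing every record from both batches.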