@@ -1163,10 +1162,6 @@ To be able to identify differential gene expression induced by PS depletion, all
> {{ page.zenodo_link }}/files/GSM461182_untreat_single_featureCounts.counts
> ```
>
-> 3. Create a collection list with all these counts that you label `all counts`. Rename each item so it only has the GSM id, the treatment and the library, for example, `GSM461176_untreat_single`.
->
-> {% snippet faqs/galaxy/collections_build_list.md %}
->
{: .hands_on}
You might think we can just compare the count values in the files directly and calculate the extent of differential gene expression. However, it is not that simple.
@@ -1438,16 +1433,60 @@ Here, treatment is the primary factor that we are interested in. The sequencing
> We recommend that you add all factors you think may affect gene expression in your experiment. It can be the sequencing type like here, but it can also be the manipulation (if different persons are involved in the library preparation), other batch effects, etc...
{: .comment}
+If you have only one or two factors with a small number of biological replicates, the basic setup of **DESeq2** is enough. For a complex experimental setup with a large number of biological replicates, tag-based collections are more appropriate. Both approaches give the same results. The tag-based approach requires a few additional steps before running the **DESeq2** tool, but it will pay off when working with a complex experimental setup.
+
+{% include _includes/cyoa-choices.html option1="Basic" option2="Tag-based" default="Basic" text="Which approach would you prefer to use?" disambiguation="deseq"%}
+
+
+
+We can now run **DESeq2**:
+
+> Determine differentially expressed features
+>
+> 1. {% tool [DESeq2](toolshed.g2.bx.psu.edu/repos/iuc/deseq2/deseq2/2.11.40.7+galaxy2) %} with the following parameters:
+> - *"how"*: `Select datasets per level`
+> - In *"Factor"*:
+> - *"Specify a factor name, e.g. effects_drug_x or cancer_markers"*: `Treatment`
+> - In *"1: Factor level"*:
+> - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `treated`
+> - In *"Count file(s)"*: `Select all the treated count files (GSM461179, GSM461180, GSM461181)`
+> - In *"2: Factor level"*:
+> - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `untreated`
+> - In *"Count file(s)"*: `Select all the untreated count files (GSM461176, GSM461177, GSM461178, GSM461182)`
+> - {% icon param-repeat %} *"Insert Factor"*
+> - *"Specify a factor name, e.g. effects_drug_x or cancer_markers"*: `Sequencing`
+> - In *"Factor level"*:
+> - {% icon param-repeat %} *"Insert Factor level"*
+> - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `PE`
+> - In *"Count file(s)"*: `Select all the paired-end count files (GSM461177, GSM461178, GSM461180, GSM461181)`
+> - {% icon param-repeat %} *"Insert Factor level"*
+> - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `SE`
+> - In *"Count file(s)"*: `Select all the single-end count files (GSM461176, GSM461179, GSM461182)`
+> - *"Files have header?"*: `Yes`
+> - *"Choice of Input data"*: `Count data (e.g. from HTSeq-count, featureCounts or StringTie)`
+> - In *"Output options"*:
+> - *"Output selector"*: `Generate plots for visualizing the analysis results`, `Output normalised counts`
+>
+{: .hands_on}
+
+
+
+
+
DESeq2 requires to provide for each factor, counts of samples in each category. We will thus use tags on our collection of counts to easily select all samples belonging to the same category. For more information about alternative ways to set group tags, please see [this tutorial]({% link topics/galaxy-interface/tutorials/group-tags/tutorial.md %}).
> Add tags to your collection for each of these factors
>
-> 1. {% tool [Extract element identifiers](toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2) %} with the following parameters:
+> 1. Create a collection list with all these counts that you label `all counts`. Rename each item so it only has the GSM id, the treatment and the library, for example, `GSM461176_untreat_single`.
+>
+> {% snippet faqs/galaxy/collections_build_list.md %}
+>
+> 2. {% tool [Extract element identifiers](toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2) %} with the following parameters:
> - {% icon param-collection %} *"Dataset collection"*: `all counts`
>
> We will now extract from the names the factors:
>
-> 2. {% tool [Replace Text in entire line](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2) %}
+> 3. {% tool [Replace Text in entire line](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2) %}
> - {% icon param-file %} *"File to process"*: output of **Extract element identifiers** {% icon tool %}
> - In *"Replacement"*:
> - In *"1: Replacement"*
@@ -1456,25 +1495,25 @@ DESeq2 requires to provide for each factor, counts of samples in each category.
>
> This step creates 2 additional columns with the type of treatment and sequencing that can be used with the {% tool [Tag elements from file](__TAG_FROM_FILE__) %} tool
>
-> 3. Change the datatype to `tabular`
+> 4. Change the datatype to `tabular`
>
> {% snippet faqs/galaxy/datasets_change_datatype.md datatype="tabular" %}
>
-> 4. {% tool [Tag elements](__TAG_FROM_FILE__) %}
+> 5. {% tool [Tag elements](__TAG_FROM_FILE__) %}
> - {% icon param-collection %} *"Input Collection"*: `all counts`
> - {% icon param-file %} *"Tag collection elements according to this file"*: output of **Replace Text** {% icon tool %}
>
-> 5. Inspect the new collection
+> 6. Inspect the new collection
>
> > You cannot see the changes?
> >
-> > You may not see it at first glance as the names are the same. However if you click on one and click on {% icon galaxy-tags %} **Edit dataset tags**, you should see 2 tags which start with 'group:'. This keyword will allow to use these tags in DESeq2.
+> > You may not see it at first glance as the names are the same. However, if you click on one element and then on {% icon galaxy-tags %} **Edit dataset tags**, you should see 2 tags which start with 'group:'. This keyword allows these tags to be used in **DESeq2**.
> >
> {: .tip}
>
{: .hands_on}
-We can now run DESeq2:
+We can now run **DESeq2**:
> Determine differentially expressed features
>
@@ -1507,6 +1546,8 @@ We can now run DESeq2:
>
{: .hands_on}
+
+
**DESeq2** generated 3 outputs:
- A table with the normalized counts for each gene (rows) in the samples (columns)
diff --git a/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md b/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md
index ffcc7acdd152a3..fed901c6f86f8f 100644
--- a/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md
+++ b/topics/variant-analysis/tutorials/aiv-analysis/tutorial.md
@@ -419,32 +419,26 @@ Now what if we cannot obtain a consensus base for a position with the above crit
# Placing segments on a phylogenetic tree
The next logical step after obtaining the consensus sequences of segments of our sample is to explore how those sequences are related to the sequences in our reference collection.
-To do so, we are going to combine the reference sequences of all segments with their corresponding consensus sequence into one multi-sequence fasta dataset per segment. Then we build a multiple sequence alignment (MSA) from the sequences of each segment, and use these alignments to generate phylogenetic trees, again one per segment. We are going to use two rather standard tools, **MAFFT** and **IQTree**, for generating MSAs and trees, respectively.
+To do so, we are going to combine the reference sequences of all segments with their corresponding consensus sequence into one multiple sequence alignment (MSA) per segment, and use these alignments to generate phylogenetic trees, again one per segment. We are going to use two rather standard tools, **MAFFT** and **IQ-TREE**, for generating MSAs and trees, respectively.
>
> Exploring phylogeny
>
-> 1. {% tool [Concatenate datasets](cat1) %}
-> - {% icon param-collection %} *"Concatenate Dataset"*: `References per segment (INSAFlu)`
-> - In *"Dataset"*:
-> - {% icon param-repeat %} *"Insert Dataset"*
-> - {% icon param-collection %} *"Select"*: collection of renamed consensus sequences; output of **Replace** on consensus sequences
+> 1. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.520+galaxy0) %}
+> - *"For multiple inputs generate"*: `one or several MSAs depending on input structure`
+> - In *"Input batch"*:
+> - {% icon param-repeat %} *"1: Input batch"*
+> - {% icon param-collection %} *"Sequences to align"*: collection of `References per segment (INSAFlu)`
+> - {% icon param-repeat %} *"2: Input batch"*
+> - {% icon param-collection %} *"Sequences to align"*: collection of renamed consensus sequences; output of **Replace** on consensus sequences
+> - *"Type of sequences"*: `Nucleic acids`
>
-> {% snippet faqs/galaxy/analysis_concatenate.md toolspec="#1" %}
+> Because both input batches are collections of eight elements each, the result is also a collection of eight MSAs, each aligning all reference sequences of one genome segment plus the consensus sequence we have obtained for that segment against each other.
>
-> The tool should produce a collection of eight multi-sequence fasta datasets, each of which has the generated consensus sequence for one segment concatenated to the INSAFlu reference sequences of that segment.
->
-> 2. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.508+galaxy0) %}
-> - {% icon param-collection %} *"Sequences to align"*: collection of concatenated sequences; output of **Concatenate datasets**
-> - *"Data type"*: `Nucleic Acids`
-> - *"Matrix selection"*: `No matrix`
->
-> The result is a collection of MSAs, each aligning all reference sequences of one genome segment plus the consensus sequence we have obtained for that segment against each other.
->
-> 3. {% tool [IQ-TREE](toolshed.g2.bx.psu.edu/repos/iuc/iqtree/iqtree/2.1.2+galaxy2) %}
+> 2. {% tool [IQ-TREE](toolshed.g2.bx.psu.edu/repos/iuc/iqtree/iqtree/2.1.2+galaxy2) %}
> - {% icon param-collection %} *"Specify input alignment file in PHYLIP, FASTA, NEXUS, CLUSTAL or MSF format."*: output of **MAFFT**
> - *"Specify sequence type ..."*: `DNA`
>
-> 4. {% icon galaxy-eye %} Explore each of the final trees produced by IQTree for the different segments
+> 3. {% icon galaxy-eye %} Explore each of the final trees produced by IQ-TREE for the different segments
>
> >
> >
diff --git a/topics/variant-analysis/tutorials/microbial-variants/tutorial.md b/topics/variant-analysis/tutorials/microbial-variants/tutorial.md
index 60647239bd4c20..096534c4e23d9e 100644
--- a/topics/variant-analysis/tutorials/microbial-variants/tutorial.md
+++ b/topics/variant-analysis/tutorials/microbial-variants/tutorial.md
@@ -25,6 +25,13 @@ contributors:
- annasyme
- slugger70
- tseemann
+edam_ontology:
+- topic_0622 # Genomics
+- topic_0196 # Sequence assembly
+- topic_2885 # DNA polymorphism
+- topic_3301 # Microbiology
+- topic_0080 # Sequence analysis
+- topic_0199 # Genetic variation
---
diff --git a/topics/variant-analysis/tutorials/non-dip/tutorial.md b/topics/variant-analysis/tutorials/non-dip/tutorial.md
index 49a3a07f29ff53..316c6eb9da3d63 100644
--- a/topics/variant-analysis/tutorials/non-dip/tutorial.md
+++ b/topics/variant-analysis/tutorials/non-dip/tutorial.md
@@ -20,6 +20,13 @@ key_points:
contributors:
- nekrut
- astrovsky01
+edam_ontology:
+- topic_0622 # Genomics
+- topic_0196 # Sequence assembly
+- topic_2885 # DNA polymorphism
+- topic_3301 # Microbiology
+- topic_0080 # Sequence analysis
+- topic_0199 # Genetic variation
---
diff --git a/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md b/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md
index 5caf852e2ad16d..e043cee27d1645 100644
--- a/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md
+++ b/topics/variant-analysis/tutorials/pox-tiled-amplicon/tutorial.md
@@ -805,19 +805,16 @@ This leaves us with the tasks of obtaining the sequence for parent P1 (accession
> 3. When the Replace Text tool run is finished, **rename** the output dataset
>
> {% snippet faqs/galaxy/datasets_rename.md name="Herbivac sequence" format="fasta" %}
-> 2. {% tool [Concatenate datasets tail-to-head (cat)](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cat/0.1.1) %}
-> - {% icon param-files %} *"Datasets to concatenate"*: the `Herbivac sequence`; renamed output of **Replace**
-> - In *"Dataset"*:
-> - {% icon param-repeat %} *"Insert Dataset"*
-> - {% icon param-collection %} *"Select"*: collection of consensus sequences; output of **ivar consensus**
-> - {% icon param-repeat %} *"Insert Dataset"*
-> - {% icon param-files %} *"Select"*: the `LSDV reference`
->
-> {% snippet faqs/galaxy/analysis_concatenate.md toolspec="#2" %}
-> 3. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.508+galaxy0) %}
-> - {% icon param-file %} *"Sequences to align"*: Multi-fasta dataset with four sequences; output of **Concatenate**
-> - *"Data type"*: `Nucleic acids`
-> - *"Matrix selection"*: `No matrix`
+> 2. {% tool [MAFFT](toolshed.g2.bx.psu.edu/repos/rnateam/mafft/rbc_mafft/7.520+galaxy0) %}
+> - *"For multiple inputs generate"*: `a single MSA of all sequences from all inputs`
+> - In *"Input batch"*:
+> - {% icon param-repeat %} *"1: Input batch"*
+> - {% icon param-files %} *"Sequences to align"*: the `Herbivac sequence`; renamed output of **Replace**
+> - {% icon param-repeat %} *"2: Input batch"*
+> - {% icon param-collection %} *"Sequences to align"*: collection of consensus sequences; output of **ivar consensus**
+> - {% icon param-repeat %} *"3: Input batch"*
+> - {% icon param-files %} *"Sequences to align"*: the `LSDV reference`
+> - *"Type of sequences"*: `Nucleic acids`
>
{: .hands_on}
diff --git a/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md b/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
index 2889ac58320c06..504b4f2198db10 100644
--- a/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
+++ b/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
@@ -24,6 +24,13 @@ tags:
- prokaryote
- one-health
- microgalaxy
+edam_ontology:
+- topic_0622 # Genomics
+- topic_3301 # Microbiology
+- topic_0196 # Sequence assembly
+- topic_0199 # Genetic variation
+- topic_3305 # Public health and epidemiology
+- topic_3324 # Infectious disease
---