From 9f06182e69e1e8cc36583c84124866bdfb985204 Mon Sep 17 00:00:00 2001
From: Pavankumar Videm
Date: Thu, 14 Mar 2024 19:37:23 +0100
Subject: [PATCH 1/3] Alternative DESeq2 hands-on for ref-based RNA-seq tutorial

---
 .../tutorials/ref-based/tutorial.md | 64 +++++++++++++++----
 1 file changed, 53 insertions(+), 11 deletions(-)

diff --git a/topics/transcriptomics/tutorials/ref-based/tutorial.md b/topics/transcriptomics/tutorials/ref-based/tutorial.md
index 83df46fb367919..762f97b26f0afa 100644
--- a/topics/transcriptomics/tutorials/ref-based/tutorial.md
+++ b/topics/transcriptomics/tutorials/ref-based/tutorial.md
@@ -1163,10 +1163,6 @@ To be able to identify differential gene expression induced by PS depletion, all
 >    {{ page.zenodo_link }}/files/GSM461182_untreat_single_featureCounts.counts
 >    ```
 >
-> 3. Create a collection list with all these counts that you label `all counts`. Rename each item so it only has the GSM id, the treatment and the library, for example, `GSM461176_untreat_single`.
->
->    {% snippet faqs/galaxy/collections_build_list.md %}
->
 {: .hands_on}
 
 You might think we can just compare the count values in the files directly and calculate the extent of differential gene expression. However, it is not that simple.
@@ -1438,16 +1434,60 @@ Here, treatment is the primary factor that we are interested in. The sequencing
 > We recommend that you add all factors you think may affect gene expression in your experiment. It can be the sequencing type like here, but it can also be the manipulation (if different persons are involved in the library preparation), other batch effects, etc...
 {: .comment}
 
+If you have only one or two factors with few number of biological replicates, basic setup of **DESeq2** is enough. In the case of complex experimental setup with large number of biological replicates, then tag-based collections are appropriate. Both the approaches give the same results. The tag-based approach requires a few additional steps before running the **DESeq2** tool but it will payoff when working with complex experimental setup.
+
+{% include _includes/cyoa-choices.html option1="Basic" option2="TagBased" default="Basic" %}
+
+
+
+We can now run **DESeq2**:
+
+> Determine differentially expressed features
+>
+> 1. {% tool [DESeq2](toolshed.g2.bx.psu.edu/repos/iuc/deseq2/deseq2/2.11.40.7+galaxy2) %} with the following parameters:
+>    - *"how"*: `Select datasets per level`
+>    - In *"Factor"*:
+>        - *"Specify a factor name, e.g. effects_drug_x or cancer_markers"*: `Treatment`
+>        - In *"1: Factor level"*:
+>            - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `treated`
+>            - In *"Count file(s)"*: `Select all the treated count files (GSM461179, GSM461180, GSM461181)`
+>        - In *"2: Factor level"*:
+>            - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `untreated`
+>            - In *"Count file(s)"*: `Select all the untreated count files (GSM461176, GSM461177, GSM461178, GSM461182)`
+>        - {% icon param-repeat %} *"Insert Factor"*
+>            - *"Specify a factor name, e.g. effects_drug_x or cancer_markers"*: `Sequencing`
+>            - In *"Factor level"*:
+>                - {% icon param-repeat %} *"Insert Factor level"*
+>                    - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `PE`
+>                    - In *"Count file(s)"*: `Select all the untreated count files (GSM461177, GSM461178, GSM461180, GSM461181)`
+>                - {% icon param-repeat %} *"Insert Factor level"*
+>                    - *"Specify a factor level, typical values could be 'tumor', 'normal', 'treated' or 'control'"*: `SE`
+>                    - In *"Count file(s)"*: `Select all the untreated count files (GSM461176, GSM461179, GSM461182)`
+>    - *"Files have header?"*: `Yes`
+>    - *"Choice of Input data"*: `Count data (e.g. from HTSeq-count, featureCounts or StringTie)`
+>    - In *"Output options"*:
+>        - *"Output selector"*: `Generate plots for visualizing the analysis results`, `Output normalised counts`
+>
+{: .hands_on}
+
+
+
+
+
 DESeq2 requires to provide for each factor, counts of samples in each category. We will thus use tags on our collection of counts to easily select all samples belonging to the same category. For more information about alternative ways to set group tags, please see [this tutorial]({% link topics/galaxy-interface/tutorials/group-tags/tutorial.md %}).
 
 > Add tags to your collection for each of these factors
 >
-> 1. {% tool [Extract element identifiers](toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2) %} with the following parameters:
+> 1. Create a collection list with all these counts that you label `all counts`. Rename each item so it only has the GSM id, the treatment and the library, for example, `GSM461176_untreat_single`.
+>
+>    {% snippet faqs/galaxy/collections_build_list.md %}
+>
+> 2. {% tool [Extract element identifiers](toolshed.g2.bx.psu.edu/repos/iuc/collection_element_identifiers/collection_element_identifiers/0.0.2) %} with the following parameters:
 >    - {% icon param-collection %} *"Dataset collection"*: `all counts`
 >
 >    We will now extract from the names the factors:
 >
-> 2. {% tool [Replace Text in entire line](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2) %}
+> 3. {% tool [Replace Text in entire line](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2) %}
 >    - {% icon param-file %} *"File to process"*: output of **Extract element identifiers** {% icon tool %}
 >    - In *"Replacement"*:
 >        - In *"1: Replacement"*
@@ -1456,25 +1496,25 @@ DESeq2 requires to provide for each factor, counts of samples in each category.
 >
 >    This step creates 2 additional columns with the type of treatment and sequencing that can be used with the {% tool [Tag elements from file](__TAG_FROM_FILE__) %} tool
 >
-> 3. Change the datatype to `tabular`
+> 4. Change the datatype to `tabular`
 >
 >    {% snippet faqs/galaxy/datasets_change_datatype.md datatype="tabular" %}
 >
-> 4. {% tool [Tag elements](__TAG_FROM_FILE__) %}
+> 5. {% tool [Tag elements](__TAG_FROM_FILE__) %}
 >    - {% icon param-collection %} *"Input Collection"*: `all counts`
 >    - {% icon param-file %} *"Tag collection elements according to this file"*: output of **Replace Text** {% icon tool %}
 >
-> 5. Inspect the new collection
+> 6. Inspect the new collection
 >
 >    > You cannot see the changes?
 >    >
->    > You may not see it at first glance as the names are the same. However if you click on one and click on {% icon galaxy-tags %} **Edit dataset tags**, you should see 2 tags which start with 'group:'. This keyword will allow to use these tags in DESeq2.
+>    > You may not see it at first glance as the names are the same. However if you click on one and click on {% icon galaxy-tags %} **Edit dataset tags**, you should see 2 tags which start with 'group:'. This keyword will allow to use these tags in **DESeq2**.
 >    >
 >    {: .tip}
 >
 {: .hands_on}
 
-We can now run DESeq2:
+We can now run **DESeq2**:
 
 > Determine differentially expressed features
 >
@@ -1507,6 +1547,8 @@ We can now run DESeq2:
 >
 {: .hands_on}
+
+
+
 **DESeq2** generated 3 outputs:
 
 - A table with the normalized counts for each gene (rows) in the samples (columns)

From 8be6da213b2f934be1978db0e34fdb9981c1c196 Mon Sep 17 00:00:00 2001
From: Pavankumar Videm
Date: Thu, 14 Mar 2024 19:57:42 +0100
Subject: [PATCH 2/3] typos

---
 topics/transcriptomics/tutorials/ref-based/tutorial.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/topics/transcriptomics/tutorials/ref-based/tutorial.md b/topics/transcriptomics/tutorials/ref-based/tutorial.md
index 762f97b26f0afa..04a5e6a53707e9 100644
--- a/topics/transcriptomics/tutorials/ref-based/tutorial.md
+++ b/topics/transcriptomics/tutorials/ref-based/tutorial.md
@@ -1434,9 +1434,9 @@ Here, treatment is the primary factor that we are interested in. The sequencing
 > We recommend that you add all factors you think may affect gene expression in your experiment. It can be the sequencing type like here, but it can also be the manipulation (if different persons are involved in the library preparation), other batch effects, etc...
 {: .comment}
 
-If you have only one or two factors with few number of biological replicates, basic setup of **DESeq2** is enough. In the case of complex experimental setup with large number of biological replicates, then tag-based collections are appropriate. Both the approaches give the same results. The tag-based approach requires a few additional steps before running the **DESeq2** tool but it will payoff when working with complex experimental setup.
+If you have only one or two factors with few number of biological replicates, the basic setup of **DESeq2** is enough. In the case of a complex experimental setup with a large number of biological replicates, tag-based collections are appropriate. Both approaches give the same results. The Tag-based approach requires a few additional steps before running the **DESeq2** tool but it will payoff when working with a complex experimental setup.
 
-{% include _includes/cyoa-choices.html option1="Basic" option2="TagBased" default="Basic" %}
+{% include _includes/cyoa-choices.html option1="Basic" option2="Tag-based" default="Basic" %}
@@ -1472,7 +1472,7 @@ We can now run **DESeq2**:
-
+
 DESeq2 requires to provide for each factor, counts of samples in each category. We will thus use tags on our collection of counts to easily select all samples belonging to the same category. For more information about alternative ways to set group tags, please see [this tutorial]({% link topics/galaxy-interface/tutorials/group-tags/tutorial.md %}).

From 418b5176b2a8473f3b406926f09c6d8119d1699a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?B=C3=A9r=C3=A9nice=20Batut?=
Date: Fri, 15 Mar 2024 14:03:52 +0100
Subject: [PATCH 3/3] Fix 2nd cyoa-choices

---
 topics/transcriptomics/tutorials/ref-based/tutorial.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/topics/transcriptomics/tutorials/ref-based/tutorial.md b/topics/transcriptomics/tutorials/ref-based/tutorial.md
index 04a5e6a53707e9..6c282c540a528c 100644
--- a/topics/transcriptomics/tutorials/ref-based/tutorial.md
+++ b/topics/transcriptomics/tutorials/ref-based/tutorial.md
@@ -934,9 +934,8 @@ There are 4 ways to estimate strandness from **STAR** results (choose the one yo
 
 ## Counting reads per genes
 
-In order to count the number of reads per gene, we offer a parallel tutorial for the 2 methods (STAR and featureCounts) which give very similar results.
 
-{% include _includes/cyoa-choices.html option1="featureCounts" option2="STAR" default="featureCounts" %}
+{% include _includes/cyoa-choices.html option1="featureCounts" option2="STAR" default="featureCounts" text="In order to count the number of reads per gene, we offer a parallel tutorial for the 2 methods (STAR and featureCounts) which give very similar results. Which methods would you prefer to use?" disambiguation="tool"%}
 
@@ -1436,7 +1435,7 @@ Here, treatment is the primary factor that we are interested in. The sequencing
 
 If you have only one or two factors with few number of biological replicates, the basic setup of **DESeq2** is enough. In the case of a complex experimental setup with a large number of biological replicates, tag-based collections are appropriate. Both approaches give the same results. The Tag-based approach requires a few additional steps before running the **DESeq2** tool but it will payoff when working with a complex experimental setup.
 
-{% include _includes/cyoa-choices.html option1="Basic" option2="Tag-based" default="Basic" %}
+{% include _includes/cyoa-choices.html option1="Basic" option2="Tag-based" default="Basic" text="Which approach would you prefer to use?" disambiguation="deseq"%}
 