Merge pull request #1325 from nf-core/strand_param_fixes
Minor fixes to strandedness settings and messaging
drpatelh authored Jun 20, 2024
2 parents 83e18b6 + b98095c commit 6fc20bb
Showing 5 changed files with 47 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -79,6 +79,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
- [PR #1312](https://github.com/nf-core/rnaseq/pull/1312) - Fix issues with unzipping of GTF/ GFF files without absolute paths
- [PR #1322](https://github.com/nf-core/rnaseq/pull/1322) - Use pre-built Github Action to detect nf-test changes
- [PR #1306](https://github.com/nf-core/rnaseq/pull/1306) - Overhaul strandedness detection / comparison
- [PR #1325](https://github.com/nf-core/rnaseq/pull/1325) - Minor fixes to strandedness settings and messaging
- [PR #1326](https://github.com/nf-core/rnaseq/pull/1326) - Move Conda dependencies for local modules to individual environment file

### Parameters
Binary file modified docs/images/mqc_strand_check.png
45 changes: 43 additions & 2 deletions docs/usage.md
@@ -27,9 +27,50 @@ CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,a
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
```

### Strandedness prediction
### Strandedness Prediction

If you set the strandedness value to `auto` the pipeline will sub-sample the input FastQ files to 1 million reads, use Salmon Quant to infer the strandedness automatically and then propagate this information to the remainder of the pipeline. If the strandedness has been inferred or provided incorrectly the sample will be flagged in the 'Strandedness Checks' section of the MultiQC report, so please be sure to check when looking at the QC for your samples.
If you set the strandedness value to `auto`, the pipeline will sub-sample the input FastQ files to 1 million reads, use Salmon Quant to automatically infer the strandedness, and then propagate this information through the rest of the pipeline. This behavior is controlled by the `--stranded_threshold` and `--unstranded_threshold` parameters, which are set to 0.8 and 0.1 by default, respectively. This means:

- **Forward stranded:** At least 80% of the fragments are in the 'forward' orientation.
- **Unstranded:** The forward and reverse fractions differ by less than 10%.
- **Undetermined:** Samples that do not meet either criterion, possibly indicating issues such as genomic DNA contamination.

**Note:** These thresholds apply to both the strandedness inferred from Salmon outputs for input to the pipeline and how strandedness is inferred from RSeQC results using pipeline outputs.

#### Usage Examples

1. **Forward Stranded Sample:**

- Forward fraction: 0.85
- Reverse fraction: 0.15
- **Classification:** Forward stranded

2. **Reverse Stranded Sample:**

- Forward fraction: 0.1
- Reverse fraction: 0.9
- **Classification:** Reverse stranded

3. **Unstranded Sample:**

- Forward fraction: 0.45
- Reverse fraction: 0.55
- **Classification:** Unstranded

4. **Undetermined Sample:**
- Forward fraction: 0.6
- Reverse fraction: 0.4
- **Classification:** Undetermined

You can control the stringency of this behavior with `--stranded_threshold` and `--unstranded_threshold`.
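
The classification rules described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline's own code: the function name `classify_strandedness` and its argument names are hypothetical, and the order in which the rules are checked is an assumption. The default values and the allowed ranges are taken from `nextflow_schema.json` in this commit. The unstranded check uses a strict "less than" comparison, as the text states, so a sample whose fractions differ by exactly the threshold (such as 0.45 vs 0.55) sits on the boundary; the sketch uses 0.48 vs 0.52 for the unstranded case to avoid it.

```python
def classify_strandedness(forward_fraction, reverse_fraction,
                          stranded_threshold=0.8, unstranded_threshold=0.1):
    """Classify a library from its forward/reverse fragment fractions.

    Hypothetical helper for illustration; not part of nf-core/rnaseq.
    """
    # Ranges mirror the 'minimum'/'maximum' values in nextflow_schema.json.
    if not 0.5 <= stranded_threshold <= 1.0:
        raise ValueError("stranded_threshold must be between 0.5 and 1.0")
    if not 0.0 <= unstranded_threshold <= 1.0:
        raise ValueError("unstranded_threshold must be between 0.0 and 1.0")

    # Assumed rule order: stranded checks first, then unstranded.
    if forward_fraction >= stranded_threshold:
        return "forward"
    if reverse_fraction >= stranded_threshold:
        return "reverse"
    if abs(forward_fraction - reverse_fraction) < unstranded_threshold:
        return "unstranded"
    return "undetermined"


examples = [(0.85, 0.15), (0.10, 0.90), (0.48, 0.52), (0.60, 0.40)]
for forward, reverse in examples:
    print(forward, reverse, classify_strandedness(forward, reverse))
# 0.85 0.15 forward
# 0.1 0.9 reverse
# 0.48 0.52 unstranded
# 0.6 0.4 undetermined
```

Raising `stranded_threshold` makes the stranded calls stricter: with `stranded_threshold=0.9`, the 0.85 / 0.15 sample in this sketch would no longer be called forward stranded and would fall through to undetermined.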

#### Errors and Reporting

The results of strandedness inference are displayed in the MultiQC report under 'Strandedness Checks'. This shows any provided strandedness and the results inferred by both Salmon (when strandedness is set to 'auto') and RSeQC. Mismatches between input strandedness (explicitly provided by the user or inferred by Salmon) and output strandedness from RSeQC are marked as fails. For example, if a user specifies 'forward' as strandedness for a library that is actually reverse stranded, this is marked as a fail.

![MultiQC - Strand check table](images/mqc_strand_check.png)

Be sure to check the strandedness report when reviewing the QC for your samples.
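
The pass/fail logic can be pictured with a minimal Python sketch. It assumes only what is stated in this commit: a mismatch between the input strandedness and the RSeQC result is a fail, and the MultiQC section description also treats 'undetermined' strandedness as a fail. The function name `strand_check` and its arguments are hypothetical and do not correspond to anything in the pipeline.

```python
def strand_check(input_strandedness, rseqc_strandedness):
    """Return 'pass' or 'fail' for one sample.

    input_strandedness: value provided by the user or inferred by Salmon.
    rseqc_strandedness: value inferred by RSeQC from the alignments.
    Hypothetical helper for illustration; not part of nf-core/rnaseq.
    """
    # Assumption: an undetermined result on either side counts as a fail.
    if "undetermined" in (input_strandedness, rseqc_strandedness):
        return "fail"
    # Otherwise the check passes only when both values agree.
    return "pass" if input_strandedness == rseqc_strandedness else "fail"


# A library declared 'forward' that RSeQC finds to be reverse stranded.
print(strand_check("forward", "reverse"))  # fail
print(strand_check("reverse", "reverse"))  # pass
```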

### Full samplesheet

5 changes: 2 additions & 3 deletions nextflow_schema.json
@@ -458,15 +458,14 @@
"minimum": 0.5,
"maximum": 1.0,
"default": 0.8,
"description": "The fraction of stranded reads that must be assigned to a strandedness for confident assignment. Must be at least 0.5.",
"help_text": "To be assigned as unstranded, read fractions must be different by less than 1- this value. Samples not convincingly assigned as stranded or unstranded will be assigned as 'undetermined'. These will be treated as unstranded but flagged as potential problems in the final MultiQC report."
"description": "The fraction of stranded reads that must be assigned to a strandedness for confident assignment. Must be at least 0.5."
},
"unstranded_threshold": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"default": 0.1,
"description": "The difference in fraction of stranded reads assigned to 'forward' and 'reverse' below which a sample is classified as 'unstranded'"
"description": "The difference in fraction of stranded reads assigned to 'forward' and 'reverse' below which a sample is classified as 'unstranded'. By default the forward and reverse fractions must differ by less than 0.1 for the sample to be called as unstranded."
}
}
},
2 changes: 1 addition & 1 deletion workflows/rnaseq/assets/multiqc/multiqc_config.yml
@@ -169,7 +169,7 @@ custom_data:
format: "{:.2f}"
fail_strand_check:
section_name: "Strandedness Checks"
description: "<p>The strandedness used for analysis in this workflow can be provided by a user, or automatically inferred by <b>Salmon</b> using a sample of reads. In either case, strandedness is verified at the end of the workflow, using the <b>RSeQC</b> <a href='http://rseqc.sourceforge.net/#infer-experiment-py'>RSeQC infer_experiment.py</a> operating on genomic alignments. In this table a pass indicates a match between suplied (or inferred by Salmon) and RSeQC, fail indicates a mismatch or 'undetermined' strandedness. Undetermined strandedness can indicate QC problems, including possible genomic DNA contamination.</p><p><b>Note:</b>Rows are duplicated for an 'auto' setting to allow comparison of statistics between inference methods.</p>"
description: "<p>The strandedness used for analysis in this workflow can either be provided by the user or automatically inferred by <b>Salmon</b> using a sample of reads. In both cases, strandedness is verified at the end of the workflow using <b>RSeQC</b>'s <a href='http://rseqc.sourceforge.net/#infer-experiment-py'>infer_experiment.py</a> on genomic alignments. In this table, a pass indicates a match between the supplied strandedness (or that inferred by Salmon) and RSeQC results. A fail indicates a mismatch or 'undetermined' strandedness. 'Undetermined' strandedness can signal QC issues, including potential genomic DNA contamination.</p><p><b>Note:</b> Rows are duplicated for an 'auto' setting to allow comparison of statistics between inference methods.</p>"
plot_type: "table"
pconfig:
id: "fail_strand_check_table"