Merge pull request #164 from drpatelh/master

TLC for MultiQC and add a couple of requested params
nf-core · Jul 1, 2020 · 611a3ee
2 parents 7836e02 + 2aa8847

Showing 15 changed files with 145 additions and 73 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,16 +9,23 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 * [#138](https://github.com/nf-core/chipseq/issues/138) - Add social preview image
 * [#153](https://github.com/nf-core/chipseq/issues/153) - Add plotHeatmap
+* [#159](https://github.com/nf-core/chipseq/issues/159) - expose bwa mem -T parameter
 * [nf-core/atacseq#63](https://github.com/nf-core/atacseq/issues/63) - Added multicore support for Trim Galore!
 * [nf-core/atacseq#71](https://github.com/nf-core/atacseq/issues/71) - consensus_peaks.mLb.clN.boolean.intersect.plot.pdf not generated
 * [nf-core/atacseq#75](https://github.com/nf-core/atacseq/issues/75) - Include gene annotation versions in multiqc report
 * [nf-core/atacseq#76](https://github.com/nf-core/atacseq/issues/76) - featureCounts coupled to DESeq2
 * [nf-core/atacseq#79](https://github.com/nf-core/atacseq/issues/79) - Parallelize DESeq2
 * [nf-core/atacseq#97](https://github.com/nf-core/atacseq/issues/97) - PBC1, PBC2 from pipeline?
+* [nf-core/atacseq#107](https://github.com/nf-core/atacseq/issues/107) - Add options to change MACS2 parameters
+* [nf-core/atacseq#109](https://github.com/nf-core/atacseq/issues/109) - Specify custom gtf but gene bed is not generated from that gtf?
+* Regenerated screenshots and added collapsible sections for output files in `docs/output.md`
 * Update template to tools `1.9`
 * Replace `set` with `tuple` and `file()` with `path()` in all processes
 * Capitalise process names
 * Parameters:
+    * `--bwa_min_score` to set minimum alignment score for BWA MEM
+    * `--macs_fdr` to provide FDR threshold for MACS2 peak calling
+    * `--macs_pvalue` to provide p-value threshold for MACS2 peak calling
     * `--skip_peak_qc` to skip MACS2 peak QC plot generation
     * `--skip_peak_annotation` to skip annotation of MACS2 and consensus peaks with HOMER
     * `--skip_consensus_peaks` to skip consensus peak generation

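Review note: the three new parameters in the changelog above each pass a threshold through to an underlying tool. The sketch below shows one way such a mapping could look. It is not the pipeline's own code: the helper names are hypothetical, `None` stands in for the documented "false" default, and the choice of `-q` and `-p` for MACS2 is an assumption, since this part of the diff does not show how the pipeline wires them. The `-T` flag for `bwa mem` comes from the changelog entry for #159.

```python
from typing import List, Optional


def bwa_mem_score_args(bwa_min_score: Optional[int] = None) -> List[str]:
    """Hypothetical helper: extra `bwa mem` arguments for --bwa_min_score."""
    # Unset parameter: add nothing, so BWA MEM keeps its own default.
    if bwa_min_score is None:
        return []
    return ["-T", str(bwa_min_score)]


def macs2_cutoff_args(
    macs_fdr: Optional[float] = None,
    macs_pvalue: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: extra `macs2 callpeak` arguments for the cutoffs."""
    # The usage docs in this PR describe the two options as mutually exclusive.
    if macs_fdr is not None and macs_pvalue is not None:
        raise ValueError("--macs_fdr and --macs_pvalue are mutually exclusive")
    if macs_fdr is not None:
        return ["-q", str(macs_fdr)]  # assumed mapping to the q-value cutoff
    if macs_pvalue is not None:
        return ["-p", str(macs_pvalue)]  # assumed mapping to the p-value cutoff
    return []
```

Returning an empty list when nothing is set keeps the tool defaults untouched, which matches the "Default: false" wording used for these parameters in `docs/usage.md`.
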
diff --git a/assets/multiqc/deseq2_clustering_header.txt b/assets/multiqc/deseq2_clustering_header.txt
@@ -1,11 +1,11 @@
 #id: 'deseq2_clustering'
-#section_name: 'DESeq2: Sample similarity'
+#section_name: 'MERGED LIB: DESeq2 sample similarity'
 #description: " matrix is generated from clustering by Euclidean distances between
 #	       <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
 #              rlog values for each sample
 #              (see <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script)."
 #plot_type: 'heatmap'
-#anchor: 'nfcore_chipseq-deseq2_clustering'
+#anchor: 'deseq2_clustering'
 #pconfig:
 #    title: 'DESeq2: Heatmap of the sample-to-sample distances'
 #    xlab: True

diff --git a/assets/multiqc/deseq2_pca_header.txt b/assets/multiqc/deseq2_pca_header.txt
@@ -1,10 +1,10 @@
 #id: 'deseq2_pca'
-#section_name: 'DESeq2: PCA plot'
+#section_name: 'MERGED LIB: DESeq2 PCA plot'
 #description: "between samples in the experiment.
 #              These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
 #              in the <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
 #plot_type: 'scatter'
-#anchor: 'nfcore_chipseq-deseq2_pca'
+#anchor: 'deseq2_pca'
 #pconfig:
 #    title: 'DESeq2: Principal component plot'
 #    xlab: PC1

diff --git a/assets/multiqc/frip_score_header.txt b/assets/multiqc/frip_score_header.txt
@@ -1,10 +1,10 @@
 #id: 'frip_score'
-#section_name: 'MACS2: Peak FRiP score'
+#section_name: 'MERGED LIB: MACS2 FRiP score'
 #description: "is generated by calculating the fraction of all mapped reads that fall
 #              into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted.
 #              See <a href='https://www.encodeproject.org/data-standards/terms/' target='_blank'>FRiP score</a>."
 #plot_type: 'bargraph'
-#anchor: 'nfcore_chipseq-frip_score'
+#anchor: 'frip_score'
 #pconfig:
 #    title: 'FRiP score'
 #    ylab: 'FRiP score'

diff --git a/assets/multiqc/peak_annotation_header.txt b/assets/multiqc/peak_annotation_header.txt
@@ -1,9 +1,9 @@
 #id: 'peak_annotation'
-#section_name: 'HOMER: Peak annotation'
+#section_name: 'MERGED LIB: HOMER peak annotation'
 #description: "is generated by calculating the proportion of peaks assigned to genomic features by
 #              <a href='http://homer.ucsd.edu/homer/ngs/annotation.html' target='_blank'>HOMER annotatePeaks.pl</a>."
 #plot_type: 'bargraph'
-#anchor: 'nfcore_chipseq-peak_annotation'
+#anchor: 'peak_annotation'
 #pconfig:
 #    title: 'Peak to feature proportion'
 #    ylab: 'Peak count'
diff --git a/assets/multiqc/peak_count_header.txt b/assets/multiqc/peak_count_header.txt
@@ -1,9 +1,9 @@
 #id: 'peak_count'
-#section_name: 'MACS2: Peak count'
+#section_name: 'MERGED LIB: MACS2 peak count'
 #description: "is calculated from total number of peaks called by
 #	       <a href='https://github.com/taoliu/MACS' target='_blank'>MACS2</a>"
 #plot_type: 'bargraph'
-#anchor: 'nfcore_chipseq-peak_count'
+#anchor: 'peak_count'
 #pconfig:
 #    title: 'Total peak count'
 #    ylab: 'Peak count'
diff --git a/assets/multiqc/spp_correlation_header.txt b/assets/multiqc/spp_correlation_header.txt
@@ -1,9 +1,9 @@
 #id: 'strand_shift_correlation'
-#section_name: 'spp: Strand-shift correlation plot'
+#section_name: 'MERGED LIB: spp strand-shift correlation'
 #description: "generated using run_spp.R script from
 #              <a href='https://github.com/kundajelab/phantompeakqualtools' target='_blank'>phantompeakqualtools</a>."
 #plot_type: 'linegraph'
-#anchor: 'nfcore_chipseq-strand_shift_correlation'
+#anchor: 'strand_shift_correlation'
 #pconfig:
 #    title: 'Strand-shift correlation plot'
 #    ylab: 'Cross-correlation'

diff --git a/assets/multiqc/spp_nsc_header.txt b/assets/multiqc/spp_nsc_header.txt
@@ -1,9 +1,9 @@
 #id: 'nsc_coefficient'
-#section_name: 'spp: NSC coefficient'
+#section_name: 'MERGED LIB: spp NSC coefficient'
 #description: "generated using run_spp.R script from
 #              <a href='https://github.com/kundajelab/phantompeakqualtools' target='_blank'>phantompeakqualtools</a>."
 #plot_type: 'bargraph'
-#anchor: 'nfcore_chipseq-nsc_coefficient'
+#anchor: 'nsc_coefficient'
 #pconfig:
 #    title: 'Normalized strand cross-correlation coefficient'
 #    ylab: 'NSC coefficient'

diff --git a/assets/multiqc/spp_rsc_header.txt b/assets/multiqc/spp_rsc_header.txt
@@ -1,9 +1,9 @@
 #id: 'rsc_coefficient'
-#section_name: 'spp: RSC coefficient'
+#section_name: 'MERGED LIB: spp RSC coefficient'
 #description: "generated using run_spp.R script from
 #              <a href='https://github.com/kundajelab/phantompeakqualtools' target='_blank'>phantompeakqualtools</a>."
 #plot_type: 'bargraph'
-#anchor: 'nfcore_chipseq-rsc_coefficient'
+#anchor: 'rsc_coefficient'
 #pconfig:
 #    title: 'Relative strand cross-correlation coefficient'
 #    ylab: 'RSC coefficient'

diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml
@@ -21,73 +21,67 @@ exclude_modules:
 
 module_order:
     - fastqc:
-        name: 'FastQC (library; raw)'
-        info: 'This section of the report shows FastQC results before adapter trimming.'
+        name: 'LIB: FastQC (raw)'
+        info: 'This section of the report shows FastQC results before adapter trimming for individual libraries.'
         path_filters:
-            - '*_fastqc.zip'
-        path_filters_exclude:
-            - '*val*_fastqc.zip'
-            - '*trimmed_fastqc.zip'
+            - './fastqc/*.zip'
     - cutadapt:
-        name: 'cutadapt (library; trimmed)'
-        info: 'This section of the report shows the length of trimmed reads by cutadapt.'
-        path_filters:
-            - '*_trimming_report.txt'
+        name: 'LIB: cutadapt (trimmed)'
+        info: 'This section of the report shows the length of trimmed reads by cutadapt for individual libraries.'
     - fastqc:
-        name: 'FastQC (library; trimmed)'
-        info: 'This section of the report shows FastQC results after adapter trimming.'
+        name: 'LIB: FastQC (trimmed)'
+        info: 'This section of the report shows FastQC results after adapter trimming for individual libraries.'
         path_filters:
-            - '*val*_fastqc.zip'
-            - '*trimmed_fastqc.zip'
+            - './trimgalore/fastqc/*.zip'
     - samtools:
-        name: 'SAMTools (library)'
+        name: 'LIB: SAMTools'
         info: 'This section of the report shows SAMTools results for individual libraries.'
         path_filters:
-            - '*.Lb.sorted.bam*'
+            - './alignment/library/*'
     - samtools:
-        name: 'SAMTools (merged library; unfiltered)'
+        name: 'MERGED LIB: SAMTools (unfiltered)'
         info: 'This section of the report shows SAMTools results after merging libraries and before filtering.'
         path_filters:
-            - '*mLb.mkD.sorted.bam*'
+            - './alignment/mergedLibrary/*.mLb.mkD.sorted.bam*'
     - preseq:
-        name: 'Preseq (merged library; unfiltered)'
+        name: 'MERGED LIB: Preseq (unfiltered)'
         info: 'This section of the report shows Preseq results after merging libraries and before filtering.'
-        path_filters:
-            - '*ccurve.txt'
     - samtools:
-        name: 'SAMTools (merged library; filtered)'
+        name: 'MERGED LIB: SAMTools (filtered)'
         info: 'This section of the report shows SAMTools results after merging libraries and after filtering.'
         path_filters:
-            - '*mLb.clN.sorted.bam*'
+            - './alignment/mergedLibrary/*.mLb.clN.sorted.bam*'
     - picard:
-        name: 'Picard (merged library; filtered)'
+        name: 'MERGED LIB: Picard'
         info: 'This section of the report shows picard results after merging libraries and after filtering.'
         path_filters:
-            - '*mLb*'
+            - './alignment/mergedLibrary/picard_metrics/*'
     - deeptools:
-        name: 'deepTools'
+        name: 'MERGED LIB: deepTools'
+        anchor: 'mlib_deeptools'
         info: 'This section of the report shows ChIP-seq QC plots generated by deepTools.'
-        path_filters:
-            - '*.plot*'
     - featureCounts:
-        name: 'featureCounts'
+        name: 'MERGED LIB: featureCounts'
+        anchor: 'mlib_featurecounts'
         info: 'This section of the report shows featureCounts results for the number of reads assigned to merged library consensus peaks.'
         path_filters:
-            - '*featureCounts*'
+            - './macs/consensus/*.summary'
 
 report_section_order:
     peak_count:
-        order: -1000
+        before: mlib_deeptools
     frip_score:
-        order: -1100
+        before: peak_count
+    peak_annotation:
+        before: frip_score
     strand_shift_correlation:
-        order: -1200
+        before: peak_annotation
     nsc_coefficient:
-        order: -1300
+        before: strand_shift_correlation
     rsc_coefficient:
-        order: -1400
-    peak_annotation:
-        order: -1500
+        before: nsc_coefficient
+    mlib_featurecounts:
+        before: rsc_coefficient
     deseq2_pca_1:
         order: -1600
     deseq2_pca_2:
@@ -153,3 +147,18 @@ extra_fn_clean_exts:
     - '_spp'
     - '.spp'
     - 'ccurve'
+
+# # Customise the module search patterns to speed up execution time
+# #  - Skip module sub-tools that we are not interested in
+# #  - Replace file-content searching with filename pattern searching
+# #  - Don't add anything that is the same as the MultiQC default
+# # See https://multiqc.info/docs/#optimise-file-search-patterns for details
+sp:
+    cutadapt:
+        fn: '*trimming_report.txt'
+    preseq:
+        fn: '*.ccurve.txt'
+    deeptools/plotFingerprintOutRawCounts:
+        fn: '*plotFingerprint*'
+    deeptools/plotProfile:
+        fn: '*plotProfile*'
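
Review note: the `report_section_order` hunk above swaps absolute `order:` values for a chain of `before:` references anchored on `mlib_deeptools`. The sketch below resolves that chain into the sequence it implies. It is a simplified model for reading the config, assuming `before: X` means "place directly ahead of X"; it is not MultiQC's own implementation, and the function name is hypothetical.

```python
from typing import Dict, List

# The `before:` entries as added in this diff (section -> section it precedes).
SECTION_BEFORE: Dict[str, str] = {
    "peak_count": "mlib_deeptools",
    "frip_score": "peak_count",
    "peak_annotation": "frip_score",
    "strand_shift_correlation": "peak_annotation",
    "nsc_coefficient": "strand_shift_correlation",
    "rsc_coefficient": "nsc_coefficient",
    "mlib_featurecounts": "rsc_coefficient",
}


def resolve_before_chain(before: Dict[str, str], start: str) -> List[str]:
    """Hypothetical helper: expand `before` references into a single ordering."""
    order = [start]
    pending = dict(before)
    while pending:
        # Sections whose target is already placed can be placed now.
        ready = [name for name, target in pending.items() if target in order]
        if not ready:
            raise ValueError(f"unresolvable 'before' targets: {sorted(pending)}")
        for name in ready:
            target = pending.pop(name)
            order.insert(order.index(target), name)
    return order


if __name__ == "__main__":
    for section in resolve_before_chain(SECTION_BEFORE, "mlib_deeptools"):
        print(section)
```

Under that assumption the chain reads bottom-up: `mlib_featurecounts` ends up first and `mlib_deeptools` last, with the custom-content sections in between. The `deseq2_pca_*` entries keep their numeric `order:` values and are left out of this model.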
diff --git a/conf/test_full.config b/conf/test_full.config
@@ -13,7 +13,8 @@ params {
 
   // Input data
   input = 'https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/design_full.csv'
+  single_end = true
 
   // Genome references
-  genome = 'hg38'
+  genome = 'hg19'
 }
diff --git a/docs/usage.md b/docs/usage.md
@@ -29,12 +29,15 @@
     * [`--skip_trimming`](#--skip_trimming)
     * [`--save_trimmed`](#--save_trimmed)
 * [Alignments](#alignments)
+    * [`--bwa_min_score`](#--bwa_min_score)
     * [`--keep_dups`](#--keep_dups)
     * [`--keep_multi_map`](#--keep_multi_map)
     * [`--save_align_intermeds`](#--save_align_intermeds)
 * [Peaks](#peaks)
     * [`--narrow_peak`](#--narrow_peak)
     * [`--broad_cutoff`](#--broad_cutoff)
+    * [`--macs_fdr`](#--macs_fdr)
+    * [`--macs_pvalue`](#--macs_pvalue)
     * [`--min_reps_consensus`](#--min_reps_consensus)
     * [`--save_macs_pileup`](#--save_macs_pileup)
     * [`--skip_peak_qc`](#--skip_peak_qc)
@@ -335,11 +338,11 @@ If provided, alignments that overlap with the regions in this file will be filte
 
 ### `--save_reference`
 
-If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times.
+If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times (Default: false).
 
 ### `--igenomes_ignore`
 
-Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.
+Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config` (Default: false).
 
 ## Adapter trimming
 
@@ -359,69 +362,81 @@ You can specify custom trimming parameters as follows:
 
 ### `--skip_trimming`
 
-Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data.
+Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data (Default: false).
 
 ### `--save_trimmed`
 
-By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
+By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete (Default: false).
 
 ## Alignments
 
+### `--bwa_min_score`
+
+Do not output BWA MEM alignments with a score lower than this value (Default: false).
+
 ### `--keep_dups`
 
-Duplicate reads are not filtered from alignments.
+Duplicate reads are not filtered from alignments (Default: false).
 
 ### `--keep_multi_map`
 
-Reads mapping to multiple locations in the genome are not filtered from alignments.
+Reads mapping to multiple locations in the genome are not filtered from alignments (Default: false).
 
 ### `--save_align_intermeds`
 
-By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files.
+By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files (Default: false).
 
 ## Peaks
 
 ### `--narrow_peak`
 
-MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.
+MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode (Default: false).
 
 ### `--broad_cutoff`
 
 Specifies broad cut-off value for MACS2. Only used when `--narrow_peak` isn't specified (Default: `0.1`).
 
+### `--macs_fdr`
+
+Minimum FDR (q-value) cutoff for peak detection. `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false).
+
+### `--macs_pvalue`
+
+p-value cutoff for peak detection. `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false). If `--macs_pvalue` is set, q-values will not be calculated and will be reported as -1 in the final .xls file.
+
 ### `--min_reps_consensus`
 
-Number of biological replicates required from a given condition for a peak to contribute to a consensus peak . If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" set of consensus of peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded.
+Number of biological replicates required from a given condition for a peak to contribute to a consensus peak. If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" set of consensus peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded (Default: 1).
 
 ```bash
 --min_reps_consensus 1
 ```
 
 ### `--save_macs_pileup`
 
-Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters.
+Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters (Default: false).
 
 ### `--skip_peak_qc`
 
-Skip MACS2 peak QC plot generation.
+Skip MACS2 peak QC plot generation (Default: false).
 
 ### `--skip_peak_annotation`
 
-Skip annotation of MACS2 and consensus peaks with HOMER.
+Skip annotation of MACS2 and consensus peaks with HOMER (Default: false).
 
 ### `--skip_consensus_peaks`
 
-Skip consensus peak generation, annotation and counting.
+Skip consensus peak generation, annotation and counting (Default: false).
 
 ## Differential analysis
 
 ### `--deseq2_vst`
 
-Use `vst` transformation instead of `rlog` with DESeq2. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization).
+Use `vst` transformation instead of `rlog` with DESeq2. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) (Default: false).
 
 ### `--skip_diff_analysis`
 
-Skip differential binding analysis with DESeq2.
+Skip differential binding analysis with DESeq2 (Default: false).
 
 ## Skipping QC steps