Merge pull request #164 from drpatelh/master
TLC for MultiQC and add a couple of requested params
drpatelh authored Jul 1, 2020
2 parents 7836e02 + 2aa8847 commit 611a3ee
Showing 15 changed files with 145 additions and 73 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -9,16 +9,23 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

* [#138](https://github.com/nf-core/chipseq/issues/138) - Add social preview image
* [#153](https://github.com/nf-core/chipseq/issues/153) - Add plotHeatmap
* [#159](https://github.com/nf-core/chipseq/issues/159) - expose bwa mem -T parameter
* [nf-core/atacseq#63](https://github.com/nf-core/atacseq/issues/63) - Added multicore support for Trim Galore!
* [nf-core/atacseq#71](https://github.com/nf-core/atacseq/issues/71) - consensus_peaks.mLb.clN.boolean.intersect.plot.pdf not generated
* [nf-core/atacseq#75](https://github.com/nf-core/atacseq/issues/75) - Include gene annotation versions in multiqc report
* [nf-core/atacseq#76](https://github.com/nf-core/atacseq/issues/76) - featureCounts coupled to DESeq2
* [nf-core/atacseq#79](https://github.com/nf-core/atacseq/issues/79) - Parallelize DESeq2
* [nf-core/atacseq#97](https://github.com/nf-core/atacseq/issues/97) - PBC1, PBC2 from pipeline?
* [nf-core/atacseq#107](https://github.com/nf-core/atacseq/issues/107) - Add options to change MACS2 parameters
* [nf-core/atacseq#109](https://github.com/nf-core/atacseq/issues/109) - Specify custom gtf but gene bed is not generated from that gtf?
* Regenerated screenshots and added collapsible sections for output files in `docs/output.md`
* Update template to tools `1.9`
* Replace `set` with `tuple` and `file()` with `path()` in all processes
* Capitalise process names
* Parameters:
* `--bwa_min_score` to set minimum alignment score for BWA MEM
* `--macs_fdr` to provide FDR threshold for MACS2 peak calling
* `--macs_pvalue` to provide p-value threshold for MACS2 peak calling
* `--skip_peak_qc` to skip MACS2 peak QC plot generation
* `--skip_peak_annotation` to skip annotation of MACS2 and consensus peaks with HOMER
* `--skip_consensus_peaks` to skip consensus peak generation
4 changes: 2 additions & 2 deletions assets/multiqc/deseq2_clustering_header.txt
@@ -1,11 +1,11 @@
#id: 'deseq2_clustering'
#section_name: 'DESeq2: Sample similarity'
#section_name: 'MERGED LIB: DESeq2 sample similarity'
#description: " matrix is generated from clustering by Euclidean distances between
# <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
# rlog values for each sample
# (see <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script)."
#plot_type: 'heatmap'
#anchor: 'nfcore_chipseq-deseq2_clustering'
#anchor: 'deseq2_clustering'
#pconfig:
# title: 'DESeq2: Heatmap of the sample-to-sample distances'
# xlab: True
4 changes: 2 additions & 2 deletions assets/multiqc/deseq2_pca_header.txt
@@ -1,10 +1,10 @@
#id: 'deseq2_pca'
#section_name: 'DESeq2: PCA plot'
#section_name: 'MERGED LIB: DESeq2 PCA plot'
#description: "between samples in the experiment.
# These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
# in the <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
#plot_type: 'scatter'
#anchor: 'nfcore_chipseq-deseq2_pca'
#anchor: 'deseq2_pca'
#pconfig:
# title: 'DESeq2: Principal component plot'
# xlab: PC1
4 changes: 2 additions & 2 deletions assets/multiqc/frip_score_header.txt
@@ -1,10 +1,10 @@
#id: 'frip_score'
#section_name: 'MACS2: Peak FRiP score'
#section_name: 'MERGED LIB: MACS2 FRiP score'
#description: "is generated by calculating the fraction of all mapped reads that fall
# into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted.
# See <a href='https://www.encodeproject.org/data-standards/terms/' target='_blank'>FRiP score</a>."
#plot_type: 'bargraph'
#anchor: 'nfcore_chipseq-frip_score'
#anchor: 'frip_score'
#pconfig:
# title: 'FRiP score'
# ylab: 'FRiP score'
4 changes: 2 additions & 2 deletions assets/multiqc/peak_annotation_header.txt
@@ -1,9 +1,9 @@
#id: 'peak_annotation'
#section_name: 'HOMER: Peak annotation'
#section_name: 'MERGED LIB: HOMER peak annotation'
#description: "is generated by calculating the proportion of peaks assigned to genomic features by
# <a href='http://homer.ucsd.edu/homer/ngs/annotation.html' target='_blank'>HOMER annotatePeaks.pl</a>."
#plot_type: 'bargraph'
#anchor: 'nfcore_chipseq-peak_annotation'
#anchor: 'peak_annotation'
#pconfig:
# title: 'Peak to feature proportion'
# ylab: 'Peak count'
4 changes: 2 additions & 2 deletions assets/multiqc/peak_count_header.txt
@@ -1,9 +1,9 @@
#id: 'peak_count'
#section_name: 'MACS2: Peak count'
#section_name: 'MERGED LIB: MACS2 peak count'
#description: "is calculated from total number of peaks called by
# <a href='https://github.com/taoliu/MACS' target='_blank'>MACS2</a>"
#plot_type: 'bargraph'
#anchor: 'nfcore_chipseq-peak_count'
#anchor: 'peak_count'
#pconfig:
# title: 'Total peak count'
# ylab: 'Peak count'
4 changes: 2 additions & 2 deletions assets/multiqc/spp_correlation_header.txt
@@ -1,9 +1,9 @@
#id: 'strand_shift_correlation'
#section_name: 'spp: Strand-shift correlation plot'
#section_name: 'MERGED LIB: spp strand-shift correlation'
#description: "generated using run_spp.R script from
# <a href='https://github.com/kundajelab/phantompeakqualtools' target='_blank'>phantompeakqualtools</a>."
#plot_type: 'linegraph'
#anchor: 'nfcore_chipseq-strand_shift_correlation'
#anchor: 'strand_shift_correlation'
#pconfig:
# title: 'Strand-shift correlation plot'
# ylab: 'Cross-correlation'
4 changes: 2 additions & 2 deletions assets/multiqc/spp_nsc_header.txt
@@ -1,9 +1,9 @@
#id: 'nsc_coefficient'
#section_name: 'spp: NSC coefficient'
#section_name: 'MERGED LIB: spp NSC coefficient'
#description: "generated using run_spp.R script from
# <a href='https://github.com/kundajelab/phantompeakqualtools' target='_blank'>phantompeakqualtools</a>."
#plot_type: 'bargraph'
#anchor: 'nfcore_chipseq-nsc_coefficient'
#anchor: 'nsc_coefficient'
#pconfig:
# title: 'Normalized strand cross-correlation coefficient'
# ylab: 'NSC coefficient'
4 changes: 2 additions & 2 deletions assets/multiqc/spp_rsc_header.txt
@@ -1,9 +1,9 @@
#id: 'rsc_coefficient'
#section_name: 'spp: RSC coefficient'
#section_name: 'MERGED LIB: spp RSC coefficient'
#description: "generated using run_spp.R script from
# <a href='https://github.com/kundajelab/phantompeakqualtools' target='_blank'>phantompeakqualtools</a>."
#plot_type: 'bargraph'
#anchor: 'nfcore_chipseq-rsc_coefficient'
#anchor: 'rsc_coefficient'
#pconfig:
# title: 'Relative strand cross-correlation coefficient'
# ylab: 'RSC coefficient'
83 changes: 46 additions & 37 deletions assets/multiqc_config.yaml
@@ -21,73 +21,67 @@ exclude_modules:

module_order:
- fastqc:
name: 'FastQC (library; raw)'
info: 'This section of the report shows FastQC results before adapter trimming.'
name: 'LIB: FastQC (raw)'
info: 'This section of the report shows FastQC results before adapter trimming for individual libraries.'
path_filters:
- '*_fastqc.zip'
path_filters_exclude:
- '*val*_fastqc.zip'
- '*trimmed_fastqc.zip'
- './fastqc/*.zip'
- cutadapt:
name: 'cutadapt (library; trimmed)'
info: 'This section of the report shows the length of trimmed reads by cutadapt.'
path_filters:
- '*_trimming_report.txt'
name: 'LIB: cutadapt (trimmed)'
info: 'This section of the report shows the length of trimmed reads by cutadapt for individual libraries.'
- fastqc:
name: 'FastQC (library; trimmed)'
info: 'This section of the report shows FastQC results after adapter trimming.'
name: 'LIB: FastQC (trimmed)'
info: 'This section of the report shows FastQC results after adapter trimming for individual libraries.'
path_filters:
- '*val*_fastqc.zip'
- '*trimmed_fastqc.zip'
- './trimgalore/fastqc/*.zip'
- samtools:
name: 'SAMTools (library)'
name: 'LIB: SAMTools'
info: 'This section of the report shows SAMTools results for individual libraries.'
path_filters:
- '*.Lb.sorted.bam*'
- './alignment/library/*'
- samtools:
name: 'SAMTools (merged library; unfiltered)'
name: 'MERGED LIB: SAMTools (unfiltered)'
info: 'This section of the report shows SAMTools results after merging libraries and before filtering.'
path_filters:
- '*mLb.mkD.sorted.bam*'
- './alignment/mergedLibrary/*.mLb.mkD.sorted.bam*'
- preseq:
name: 'Preseq (merged library; unfiltered)'
name: 'MERGED LIB: Preseq (unfiltered)'
info: 'This section of the report shows Preseq results after merging libraries and before filtering.'
path_filters:
- '*ccurve.txt'
- samtools:
name: 'SAMTools (merged library; filtered)'
name: 'MERGED LIB: SAMTools (filtered)'
info: 'This section of the report shows SAMTools results after merging libraries and after filtering.'
path_filters:
- '*mLb.clN.sorted.bam*'
- './alignment/mergedLibrary/*.mLb.clN.sorted.bam*'
- picard:
name: 'Picard (merged library; filtered)'
name: 'MERGED LIB: Picard'
info: 'This section of the report shows picard results after merging libraries and after filtering.'
path_filters:
- '*mLb*'
- './alignment/mergedLibrary/picard_metrics/*'
- deeptools:
name: 'deepTools'
name: 'MERGED LIB: deepTools'
anchor: 'mlib_deeptools'
info: 'This section of the report shows ChIP-seq QC plots generated by deepTools.'
path_filters:
- '*.plot*'
- featureCounts:
name: 'featureCounts'
name: 'MERGED LIB: featureCounts'
anchor: 'mlib_featurecounts'
info: 'This section of the report shows featureCounts results for the number of reads assigned to merged library consensus peaks.'
path_filters:
- '*featureCounts*'
- './macs/consensus/*.summary'

report_section_order:
peak_count:
order: -1000
before: mlib_deeptools
frip_score:
order: -1100
before: peak_count
peak_annotation:
before: frip_score
strand_shift_correlation:
order: -1200
before: peak_annotation
nsc_coefficient:
order: -1300
before: strand_shift_correlation
rsc_coefficient:
order: -1400
peak_annotation:
order: -1500
before: nsc_coefficient
mlib_featurecounts:
before: rsc_coefficient
deseq2_pca_1:
order: -1600
deseq2_pca_2:
@@ -153,3 +147,18 @@ extra_fn_clean_exts:
- '_spp'
- '.spp'
- 'ccurve'

# # Customise the module search patterns to speed up execution time
# # - Skip module sub-tools that we are not interested in
# # - Replace file-content searching with filename pattern searching
# # - Don't add anything that is the same as the MultiQC default
# # See https://multiqc.info/docs/#optimise-file-search-patterns for details
sp:
cutadapt:
fn: '*trimming_report.txt'
preseq:
fn: '*.ccurve.txt'
deeptools/plotFingerprintOutRawCounts:
fn: '*plotFingerprint*'
deeptools/plotProfile:
fn: '*plotProfile*'
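
The filename-based search patterns in the `sp` block can be illustrated with a small shell sketch. It is an approximation under stated assumptions: the helper name `sp_key_for` and the sample file names are hypothetical, and shell glob matching is used only as a stand-in for how MultiQC matches `fn` patterns against file names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the `sp` key whose `fn` pattern a file name would match.
# Shell globs stand in for MultiQC's own file-name matching (an assumption).
sp_key_for() {
    case "$1" in
        *trimming_report.txt) echo "cutadapt" ;;
        *.ccurve.txt)         echo "preseq" ;;
        *plotFingerprint*)    echo "deeptools/plotFingerprintOutRawCounts" ;;
        *plotProfile*)        echo "deeptools/plotProfile" ;;
        *)                    echo "none" ;;
    esac
}

# Hypothetical file names, for illustration only
sp_key_for "sample1_R1.fastq.gz_trimming_report.txt"
sp_key_for "sample1.mLb.clN.ccurve.txt"
sp_key_for "sample1.mLb.clN.sorted.bam"
```

Matching on the file name alone avoids opening each file to inspect its contents, which is the speed-up the comment block above the `sp` section describes.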
3 changes: 2 additions & 1 deletion conf/test_full.config
@@ -13,7 +13,8 @@ params {

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/design_full.csv'
single_end = true

// Genome references
genome = 'hg38'
genome = 'hg19'
}
45 changes: 30 additions & 15 deletions docs/usage.md
@@ -29,12 +29,15 @@
* [`--skip_trimming`](#--skip_trimming)
* [`--save_trimmed`](#--save_trimmed)
* [Alignments](#alignments)
* [`--bwa_min_score`](#--bwa_min_score)
* [`--keep_dups`](#--keep_dups)
* [`--keep_multi_map`](#--keep_multi_map)
* [`--save_align_intermeds`](#--save_align_intermeds)
* [Peaks](#peaks)
* [`--narrow_peak`](#--narrow_peak)
* [`--broad_cutoff`](#--broad_cutoff)
* [`--macs_fdr`](#--macs_fdr)
* [`--macs_pvalue`](#--macs_pvalue)
* [`--min_reps_consensus`](#--min_reps_consensus)
* [`--save_macs_pileup`](#--save_macs_pileup)
* [`--skip_peak_qc`](#--skip_peak_qc)
@@ -335,11 +338,11 @@ If provided, alignments that overlap with the regions in this file will be filte

### `--save_reference`

If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times.
If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times (Default: false).

### `--igenomes_ignore`

Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.
Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config` (Default: false).

## Adapter trimming

@@ -359,69 +362,81 @@ You can specify custom trimming parameters as follows:

### `--skip_trimming`

Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data.
Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data (Default: false).

### `--save_trimmed`

By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete (Default: false).

## Alignments

### `--bwa_min_score`

Don’t output BWA MEM alignments with score lower than this parameter (Default: false).

### `--keep_dups`

Duplicate reads are not filtered from alignments.
Duplicate reads are not filtered from alignments (Default: false).

### `--keep_multi_map`

Reads mapping to multiple locations in the genome are not filtered from alignments.
Reads mapping to multiple locations in the genome are not filtered from alignments (Default: false).

### `--save_align_intermeds`

By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files.
By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files (Default: false).

## Peaks

### `--narrow_peak`

MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.
MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode (Default: false).

### `--broad_cutoff`

Specifies broad cut-off value for MACS2. Only used when `--narrow_peak` isn't specified (Default: `0.1`).

### `--macs_fdr`

Minimum FDR (q-value) cutoff for peak detection, `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false).

### `--macs_pvalue`

p-value cutoff for peak detection, `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false). If `--macs_pvalue` cutoff is set, q-value will not be calculated and reported as -1 in the final .xls file.
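
The mutual exclusivity described for `--macs_fdr` and `--macs_pvalue` can be sketched as a small shell helper. This is an illustration under assumptions, not the pipeline's own code: the function name `macs_threshold_args` is hypothetical, `false` is treated as "not set" to mirror the documented defaults, and the mapping onto the MACS2 `callpeak` options `-q` and `-p` is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the two pipeline parameters into a single MACS2 threshold option.
# "false" means "not set", mirroring the documented defaults.
macs_threshold_args() {
    local fdr="${1:-false}" pvalue="${2:-false}"
    # Refuse to continue when both thresholds are supplied
    if [ "$fdr" != "false" ] && [ "$pvalue" != "false" ]; then
        echo "ERROR: --macs_fdr and --macs_pvalue are mutually exclusive" >&2
        return 1
    fi
    if [ "$fdr" != "false" ]; then
        echo "-q $fdr"        # q-value (FDR) cutoff
    elif [ "$pvalue" != "false" ]; then
        echo "-p $pvalue"     # p-value cutoff
    fi
    # Neither set: print nothing so MACS2 falls back to its own default
}

macs_threshold_args 0.05 false   # prints: -q 0.05
macs_threshold_args false 1e-5   # prints: -p 1e-5
```

Supplying both values makes the helper print an error and return a non-zero status, while supplying neither prints nothing.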

### `--min_reps_consensus`

Number of biological replicates required from a given condition for a peak to contribute to a consensus peak. If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" set of consensus peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded.
Number of biological replicates required from a given condition for a peak to contribute to a consensus peak. If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" set of consensus peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded (Default: 1).

```bash
--min_reps_consensus 1
```

### `--save_macs_pileup`

Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters.
Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters (Default: false).

### `--skip_peak_qc`

Skip MACS2 peak QC plot generation.
Skip MACS2 peak QC plot generation (Default: false).

### `--skip_peak_annotation`

Skip annotation of MACS2 and consensus peaks with HOMER.
Skip annotation of MACS2 and consensus peaks with HOMER (Default: false).

### `--skip_consensus_peaks`

Skip consensus peak generation, annotation and counting.
Skip consensus peak generation, annotation and counting (Default: false).

## Differential analysis

### `--deseq2_vst`

Use `vst` transformation instead of `rlog` with DESeq2. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization).
Use `vst` transformation instead of `rlog` with DESeq2. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) (Default: false).

### `--skip_diff_analysis`

Skip differential binding analysis with DESeq2.
Skip differential binding analysis with DESeq2 (Default: false).

## Skipping QC steps
