diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 47dc0f76..0d1890fa 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,47 +1,57 @@ # nf-core/atacseq: Contributing Guidelines -Hi there! Many thanks for taking an interest in improving nf-core/atacseq. +Hi there! +Many thanks for taking an interest in improving nf-core/atacseq. -We try to manage the required tasks for nf-core/atacseq using GitHub issues, you probably came to this page when creating one. Please use the pre-filled template to save time. - -However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;) - -> If you need help using or modifying nf-core/atacseq then the best place to ask is on the pipeline channel on [Slack](https://nf-co.re/join/slack/). +We try to manage the required tasks for nf-core/atacseq using GitHub issues; you probably came to this page when creating one. +Please use the pre-filled template to save time. +However, don't be put off by this template - other more general issues and suggestions are welcome! +Contributions to the code are even more welcome ;) +> If you need help using or modifying nf-core/atacseq then the best place to ask is on the nf-core Slack [#atacseq](https://nfcore.slack.com/channels/atacseq) channel ([join our Slack here](https://nf-co.re/join/slack)). ## Contribution workflow -If you'd like to write some code for nf-core/atacseq, the standard workflow -is as follows: -1. Check that there isn't already an issue about your idea in the - [nf-core/atacseq issues](https://github.com/nf-core/atacseq/issues) to avoid - duplicating work. +If you'd like to write some code for nf-core/atacseq, the standard workflow is as follows: + +1. Check that there isn't already an issue about your idea in the [nf-core/atacseq issues](https://github.com/nf-core/atacseq/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this -2. Fork the [nf-core/atacseq repository](https://github.com/nf-core/atacseq) to your GitHub account +2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/atacseq repository](https://github.com/nf-core/atacseq) to your GitHub account 3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged. - -If you're not used to this workflow with git, you can start with some [basic docs from GitHub](https://help.github.com/articles/fork-a-repo/) or even their [excellent interactive tutorial](https://try.github.io/). +4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). ## Tests -When you create a pull request with changes, [Travis CI](https://travis-ci.com/) will run automatic tests. + +When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests. Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
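Both kinds of checks described below can be reproduced locally before pushing. A rough sketch, assuming Nextflow, Docker and `nf-core/tools` are installed; the commands mirror the PR checklist and the linting workflow that appear later in this diff:

```bash
# Lint the pipeline code against the nf-core guidelines
nf-core lint .

# Lint the markdown documentation with the same config CI uses
markdownlint . -c .github/markdownlint.yml

# Run the pipeline on the minimal test dataset
nextflow run . -profile test,docker
```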
There are typically two types of tests that run: ### Lint Tests -The nf-core has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. + +`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. ### Pipeline Tests -Each nf-core pipeline should be set up with a minimal set of test-data. -Travis CI then runs the pipeline on this data to ensure that it exists successfully. + +Each `nf-core` pipeline should be set up with a minimal set of test-data. +`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully. If there are any failures then the automated tests fail. -These tests are run both with the latest available version of Nextflow and also the minimum required version that is stated in the pipeline code. +These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code. + +## Patch + +:warning: Only in the unlikely and regretful event of a release happening with a bug. + +* On your own fork, make a new branch `patch` based on `upstream/master`. +* Fix the bug, and bump version (X.Y.Z+1). +* A PR should be made on `master` from `patch` to directly address this particular bug. ## Getting help -For further information/help, please consult the [nf-core/atacseq documentation](https://github.com/nf-core/atacseq#documentation) and don't hesitate to get in touch on the [nf-core/atacseq pipeline channel](https://nfcore.slack.com/channels/atacseq) on [Slack](https://nf-co.re/join/slack/). + +For further information/help, please consult the [nf-core/atacseq documentation](https://nf-co.re/nf-core/atacseq/docs) and don't hesitate to get in touch on the nf-core Slack [#atacseq](https://nfcore.slack.com/channels/atacseq) channel ([join our Slack here](https://nf-co.re/join/slack)). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index b287bce7..010263cf 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -1,31 +1,42 @@ +# nf-core/atacseq bug report + Hi there! -Thanks for telling us about a problem with the pipeline. Please delete this text and anything that's not relevant from the template below: +Thanks for telling us about a problem with the pipeline. +Please delete this text and anything that's not relevant from the template below: + +## Describe the bug -#### Describe the bug A clear and concise description of what the bug is. -#### Steps to reproduce +## Steps to reproduce + Steps to reproduce the behaviour: + 1. Command line: `nextflow run ...` 2. See error: _Please provide your error message_ -#### Expected behaviour +## Expected behaviour + A clear and concise description of what you expected to happen. -#### System: - - Hardware: [e.g. HPC, Desktop, Cloud...] - - Executor: [e.g. slurm, local, awsbatch...] - - OS: [e.g. CentOS Linux, macOS, Linux Mint...] - - Version [e.g. 7, 10.13.6, 18.3...]
+## System + +- Hardware: +- Executor: +- OS: +- Version + +## Nextflow Installation + +- Version: + +## Container engine -#### Nextflow Installation: - - Version: [e.g. 0.31.0] +- Engine: +- version: +- Image tag: -#### Container engine: - - Engine: [e.g. Conda, Docker or Singularity] - - version: [e.g. 1.0.0] - - Image tag: [e.g. nfcore/atacseq:1.0.0] +## Additional context -#### Additional context Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 1f025b77..222fe51a 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,16 +1,24 @@ +# nf-core/atacseq feature request + Hi there! -Thanks for suggesting a new feature for the pipeline! Please delete this text and anything that's not relevant from the template below: +Thanks for suggesting a new feature for the pipeline! +Please delete this text and anything that's not relevant from the template below: + +## Is your feature request related to a problem? Please describe -#### Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. + Ex. I'm always frustrated when [...] -#### Describe the solution you'd like +## Describe the solution you'd like + A clear and concise description of what you want to happen. -#### Describe alternatives you've considered +## Describe alternatives you've considered + A clear and concise description of any alternative solutions or features you've considered. -#### Additional context +## Additional context + Add any other context about the feature request here. diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 2784b969..67681d0c 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,15 +1,19 @@ -Many thanks to contributing to nf-core/atacseq! +# nf-core/atacseq pull request -Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs). +Many thanks for contributing to nf-core/atacseq! + +Please fill in the appropriate checklist below (delete whatever is not relevant). +These are the most common things requested on pull requests (PRs). ## PR checklist - - [ ] This comment contains a description of changes (with reason) - - [ ] If you've fixed a bug or added code that should be tested, add tests! - - [ ] If necessary, also make a PR on the [nf-core/atacseq branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/atacseq) - - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). - - [ ] Make sure your code lints (`nf-core lint .`). - - [ ] Documentation in `docs` is updated - - [ ] `CHANGELOG.md` is updated - - [ ] `README.md` is updated - -**Learn more about contributing:** https://github.com/nf-core/atacseq/tree/master/.github/CONTRIBUTING.md + +- [ ] This comment contains a description of changes (with reason) +- [ ] If you've fixed a bug or added code that should be tested, add tests! +- [ ] If necessary, also make a PR on the [nf-core/atacseq branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/atacseq) +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Make sure your code lints (`nf-core lint .`). 
+- [ ] Documentation in `docs` is updated +- [ ] `CHANGELOG.md` is updated +- [ ] `README.md` is updated + +**Learn more about contributing:** [CONTRIBUTING.md](https://github.com/nf-core/atacseq/tree/master/.github/CONTRIBUTING.md) \ No newline at end of file diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml index 96b12a70..5056c7fa 100644 --- a/.github/markdownlint.yml +++ b/.github/markdownlint.yml @@ -1,5 +1,9 @@ -# Markdownlint configuration file -default: true, -line-length: false -no-duplicate-header: - siblings_only: true +# Markdownlint configuration file +default: true, +line-length: false +no-duplicate-header: + siblings_only: true +MD033: + allowed_elements: [details, summary, p, img] +MD007: + indent: 4 diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml new file mode 100644 index 00000000..15f940dd --- /dev/null +++ b/.github/workflows/awsfulltest.yml @@ -0,0 +1,36 @@ +name: nf-core AWS full size tests +# This workflow is triggered on releases. +# It runs the -profile 'test_full' on AWS batch + +on: + release: + types: [published] + +jobs: + run-awstest: + if: github.repository == 'nf-core/atacseq' + name: Run AWS test + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + env: + AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}} + TOWER_ACCESS_TOKEN: ${{secrets.AWS_TOWER_TOKEN}} + #AWS_JOB_DEFINITION: ${{secrets.AWS_JOB_DEFINITION}} + AWS_JOB_QUEUE: ${{secrets.AWS_JOB_QUEUE}} + AWS_S3_BUCKET: ${{secrets.AWS_S3_BUCKET}} + run: | # Submits job to AWS batch using a 'nextflow-4GiB' job definition. Setting JVM options to "-XX:+UseG1GC" for more efficient garbage collection when staging remote files. + aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-atacseq \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition nextflow-4GiB \ + --container-overrides '{"command": ["nf-core/atacseq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/atacseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/atacseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}, {"name": "NXF_OPTS", "value": "-XX:+UseG1GC"}]}' diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml new file mode 100644 index 00000000..fae13576 --- /dev/null +++ b/.github/workflows/awstest.yml @@ -0,0 +1,38 @@ +name: nf-core AWS test +# This workflow is triggered on push to the master branch. 
+# It runs the -profile 'test' on AWS batch + +on: + push: + branches: + - master + - dev # just for testing purposes, to be removed + +jobs: + run-awstest: + if: github.repository == 'nf-core/atacseq' + name: Run AWS test + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + env: + AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}} + TOWER_ACCESS_TOKEN: ${{secrets.AWS_TOWER_TOKEN}} + #AWS_JOB_DEFINITION: ${{secrets.AWS_JOB_DEFINITION}} + AWS_JOB_QUEUE: ${{secrets.AWS_JOB_QUEUE}} + AWS_S3_BUCKET: ${{secrets.AWS_S3_BUCKET}} + run: | # Submits job to AWS batch using a 'nextflow-4GiB' job definition. Setting JVM options to "-XX:+UseG1GC" for more efficient garbage collection when staging remote files. + aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-atacseq \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition nextflow-4GiB \ + --container-overrides '{"command": ["nf-core/atacseq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/atacseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/atacseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}, {"name": "NXF_OPTS", "value": "-XX:+UseG1GC"}]}' diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml new file mode 100644 index 00000000..9ba92076 --- /dev/null +++ b/.github/workflows/branch.yml @@ -0,0 +1,16 @@ +name: nf-core branch protection +# This workflow is triggered on PRs to master branch on the repository +# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` +on: + pull_request: + branches: + - master + +jobs: + test: + runs-on: ubuntu-18.04 + steps: + # PRs are only ok if coming from an nf-core `dev` branch or a fork `patch` branch + - name: Check PRs + run: | + { [[ $(git remote get-url origin) == *nf-core/atacseq ]] && [[ ${GITHUB_HEAD_REF} = "dev" ]]; } || [[ ${GITHUB_HEAD_REF} == "patch" ]] diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100755 index 00000000..8a37ba68 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,125 @@ +name: nf-core CI +# This workflow is triggered on releases and pull-requests. +# It runs the pipeline with the minimal test dataset to check that it completes without any syntax errors +on: + push: + branches: + - dev + pull_request: + release: + types: [published] + +jobs: + test: + name: Run workflow tests + # Only run on push if this is the nf-core dev branch (merged PRs) + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/atacseq') }} + runs-on: ubuntu-latest + env: + NXF_VER: ${{ matrix.nxf_ver }} + NXF_ANSI_LOG: false + strategy: + matrix: + # Nextflow versions: check pipeline minimum and current latest + nxf_ver: ['19.10.0', ''] + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . 
-t nfcore/atacseq:1.2.0 + + - name: Pull docker image + if: ${{ !env.GIT_DIFF }} + run: | + docker pull nfcore/atacseq:dev + docker tag nfcore/atacseq:dev nfcore/atacseq:1.2.0 + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - name: Run pipeline with test data + run: | + nextflow run ${GITHUB_WORKSPACE} -profile test,docker + + parameters: + name: Test workflow parameters + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/atacseq') }} + runs-on: ubuntu-latest + env: + NXF_VER: '19.10.0' + NXF_ANSI_LOG: false + strategy: + matrix: + parameters: [--single_end, --skip_trimming, --skip_merge_replicates, --skip_consensus_peaks] + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . -t nfcore/atacseq:1.2.0 + + - name: Pull docker image + if: ${{ !env.GIT_DIFF }} + run: | + docker pull nfcore/atacseq:dev + docker tag nfcore/atacseq:dev nfcore/atacseq:1.2.0 + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - name: Run pipeline with various parameters + run: | + nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }} + + push_dockerhub: + name: Push new Docker image to Docker Hub + runs-on: ubuntu-latest + # Only run if the tests passed + needs: test + # Only run for the nf-core repo, for releases and merged PRs + if: ${{ github.repository == 'nf-core/atacseq' && (github.event_name == 'release' || github.event_name == 'push') }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} + DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Build new docker image + run: docker build --no-cache . -t nfcore/atacseq:latest + + - name: Push Docker image to DockerHub (dev) + if: ${{ github.event_name == 'push' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker tag nfcore/atacseq:latest nfcore/atacseq:dev + docker push nfcore/atacseq:dev + + - name: Push Docker image to DockerHub (release) + if: ${{ github.event_name == 'release' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/atacseq:latest + docker tag nfcore/atacseq:latest nfcore/atacseq:${{ github.event.release.tag_name }} + docker push nfcore/atacseq:${{ github.event.release.tag_name }} diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml new file mode 100755 index 00000000..eb66c144 --- /dev/null +++ b/.github/workflows/linting.yml @@ -0,0 +1,61 @@ +name: nf-core linting +# This workflow is triggered on pushes and PRs to the repository. 
+# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines +on: + push: + pull_request: + release: + types: [published] + +jobs: + Markdown: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: actions/setup-node@v1 + with: + node-version: '10' + - name: Install markdownlint + run: npm install -g markdownlint-cli + - name: Run Markdownlint + run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + YAML: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v1 + - uses: actions/setup-node@v1 + with: + node-version: '10' + - name: Install yaml-lint + run: npm install -g yaml-lint + - name: Run yaml-lint + run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + nf-core: + runs-on: ubuntu-latest + steps: + + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - uses: actions/setup-python@v1 + with: + python-version: '3.6' + architecture: 'x64' + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install nf-core + + - name: Run nf-core lint + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core lint ${GITHUB_WORKSPACE} + diff --git a/.gitignore b/.gitignore index 5b54e3e6..6354f370 100755 --- a/.gitignore +++ b/.gitignore @@ -3,5 +3,6 @@ work/ data/ results/ .DS_Store -tests/test_data +tests/ +testing/ *.pyc diff --git a/.travis.yml b/.travis.yml deleted file mode 100755 index 0d477d1b..00000000 --- a/.travis.yml +++ /dev/null @@ -1,42 +0,0 @@ -sudo: required -language: python -jdk: openjdk8 -services: docker -python: '3.6' -cache: pip -matrix: - fast_finish: true - -before_install: - # PRs to master are only ok if coming from dev branch - - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && ([ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ] || [ $TRAVIS_PULL_REQUEST_BRANCH = "patch" ]))' - # Pull the docker image first so the test doesn't wait for this - - docker pull nfcore/atacseq:dev - # Fake the tag locally so that the pipeline runs properly - # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) - - docker tag nfcore/atacseq:dev nfcore/atacseq:1.1.0 - -install: - # Install Nextflow - - mkdir /tmp/nextflow && cd /tmp/nextflow - - wget -qO- get.nextflow.io | bash - - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow - # Install nf-core/tools - - pip install --upgrade pip - - pip install nf-core - # Reset - - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests - # Install markdownlint-cli - - sudo apt-get install npm && npm install -g markdownlint-cli - -env: - - NXF_VER='19.10.0' # Specify a minimum NF version that should be tested and work - - NXF_VER='' # Plus: get the latest NF version and check that it works - -script: - # Lint the pipeline code - - nf-core lint ${TRAVIS_BUILD_DIR} - # Lint the documentation - - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml - # Run the pipeline with the test profile - - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker -ansi-log false diff --git a/CHANGELOG.md b/CHANGELOG.md index 9abafb44..bf724afa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,10 +1,83 @@ # 
nf-core/atacseq: Changelog -All notable changes to this project will be documented in this file. - The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). +## [1.2.0] - 2020-07-02 + +### `Added` + +* [#63](https://github.com/nf-core/atacseq/issues/63) - Added multicore support for Trim Galore! +* [#75](https://github.com/nf-core/atacseq/issues/75) - Include gene annotation versions in multiqc report +* [#76](https://github.com/nf-core/atacseq/issues/76) - featureCounts coupled to DESeq2 +* [#79](https://github.com/nf-core/atacseq/issues/79) - Parallelize DESeq2 +* [#80](https://github.com/nf-core/atacseq/pull/80) - Added social preview image +* [#97](https://github.com/nf-core/atacseq/issues/97) - PBC1, PBC2 from pipeline? +* [#107](https://github.com/nf-core/atacseq/issues/107) - Add options to change MACS2 parameters +* [nf-core/chipseq#153](https://github.com/nf-core/chipseq/issues/153) - Add plotHeatmap +* [nf-core/chipseq#159](https://github.com/nf-core/chipseq/issues/159) - expose bwa mem -T parameter +* Regenerated screenshots and added collapsible sections for output files in `docs/output.md` +* Update template to tools `1.9` +* Replace `set` with `tuple` and `file()` with `path()` in all processes +* Capitalise process names +* Parameters: + * `--bwa_min_score` to set minimum alignment score for BWA MEM + * `--macs_fdr` to provide FDR threshold for MACS2 peak calling + * `--macs_pvalue` to provide p-value threshold for MACS2 peak calling + * `--skip_peak_qc` to skip MACS2 peak QC plot generation + * `--skip_peak_annotation` to skip annotation of MACS2 and consensus peaks with HOMER + * `--skip_consensus_peaks` to skip consensus peak generation + * `--deseq2_vst` to use variance stabilizing transformation (VST) instead of regularized log transformation (rlog) with DESeq2 + * `--publish_dir_mode` to customise method of publishing results to output directory [nf-core/tools#585](https://github.com/nf-core/tools/issues/585) + +### `Fixed` + +* [#71](https://github.com/nf-core/atacseq/issues/71) - consensus_peaks.mLb.clN.boolean.intersect.plot.pdf not generated +* [#73](https://github.com/nf-core/atacseq/issues/73) - macs_annotatePeaks.mLb.clN.summary.txt file is not created +* [#86](https://github.com/nf-core/atacseq/issues/86) - bug in the plot_homer_annotatepeaks.r script +* [#102](https://github.com/nf-core/atacseq/issues/102) - Incorrect Group ID assigned by featurecounts_deseq2.r +* [#110](https://github.com/nf-core/atacseq/pull/110) - updated AWS test GitHub actions +* [#109](https://github.com/nf-core/atacseq/issues/109) - Specify custom gtf but gene bed is not generated from that gtf? 
+* [nf-core/chipseq#118](https://github.com/nf-core/chipseq/issues/118) - Running on with SGE +* [nf-core/chipseq#132](https://github.com/nf-core/chipseq/issues/132) - BigWig Error: sort: cannot create temporary file in '': Read-only file system +* [nf-core/chipseq#154](https://github.com/nf-core/chipseq/issues/154) - computeMatrix.val.mat.gz files not zipped +* Make executables in `bin/` compatible with Python 3 + +### `Dependencies` + +* Add bioconductor-biocparallel `1.20.0` +* Add markdown `3.2.2` +* Add pigz `2.3.4` +* Add pygments `2.6.1` +* Add pymdown-extensions `7.1` +* Add python `3.7.6` +* Add r-reshape2 `1.4.4` +* Add r-tidyr `1.1.0` +* Update ataqv `1.0.0` -> `1.1.1` +* Update bedtools `2.27.1` -> `2.29.2` +* Update bioconductor-deseq2 `1.20.0` -> `1.26.0` +* Update bioconductor-vsn `3.46.0` -> `3.54.0` +* Update deeptools `3.2.1` -> `3.4.3` +* Update fastqc `0.11.8` -> `0.11.9` +* Update gawk `4.2.1` -> `5.1.0` +* Update homer `4.9.1` -> `4.11` +* Update macs2 `2.1.2` -> `2.2.7.1` +* Update multiqc `1.7` -> `1.8` +* Update picard `2.19.0` -> `2.23.1` +* Update pysam `0.15.2` -> `0.15.3` +* Update r-base `3.4.1` -> `3.6.2` +* Update r-ggplot2 `3.1.0` -> `3.3.2` +* Update r-lattice `0.20_35` -> `0.20_41` +* Update r-optparse `1.6.0` -> `1.6.6` +* Update r-pheatmap `1.0.10` -> `1.0.12` +* Update r-scales `1.0.0` -> `1.1.1` +* Update r-upsetr `1.3.3` -> `1.4.0` +* Update r-xfun `0.3` -> `0.15` +* Update samtools `1.9` -> `1.10` +* Update subread `1.6.4` -> `2.0.1` +* Update trim-galore `0.5.0` -> `0.6.5` +* Update ucsc-bedgraphtobigwig `377` -> `357` + ## [1.1.0] - 2019-11-05 ### `Added` @@ -16,14 +89,14 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. * Add `CITATIONS.md` file * Capitalised process names * Add parameters: - * `--seq_center` - * `--trim_nextseq` - * `--fingerprint_bins` - * `--broad_cutoff` - * `--min_reps_consensus` - * `--save_macs_pileup` - * `--skip_diff_analysis` - * `--skip_*` for skipping QC steps + * `--seq_center` + * `--trim_nextseq` + * `--fingerprint_bins` + * `--broad_cutoff` + * `--min_reps_consensus` + * `--save_macs_pileup` + * `--skip_diff_analysis` + * `--skip_*` for skipping QC steps ### `Fixed` diff --git a/CITATIONS.md b/CITATIONS.md index 68e539f1..9af4aba2 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -1,9 +1,14 @@ # nf-core/atacseq: Citations -## Pipeline tools +## [nf-core](https://www.ncbi.nlm.nih.gov/pubmed/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. ReadCube: [Full Access Link](https://rdcu.be/b1GjZ). + +## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/) -* [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/) - > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +## Pipeline tools * [BWA](https://www.ncbi.nlm.nih.gov/pubmed/19451168/) > Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. 
doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18. PubMed PMID: 19451168; PubMed Central PMCID: PMC2705234. @@ -97,4 +102,4 @@ * [Singularity](https://www.ncbi.nlm.nih.gov/pubmed/28494014/) > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. -* [Docker](https://www.docker.com/) +* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 1cda7600..cf930c8a 100755 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -34,7 +34,7 @@ This Code of Conduct applies both within project spaces and in public spaces whe ## Enforcement -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack/). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 
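Before the Dockerfile changes below, note how the `ci.yml` workflow above consumes the image: it rebuilds only when the container recipe changed, and otherwise pulls `nfcore/atacseq:dev` and retags it so the version pinned in the pipeline still resolves. A minimal local sketch of the same build-or-retag logic, assuming Docker and a repository checkout; the `HEAD~1` diff check is a stand-in for what `get-diff-action` computes in CI:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Rebuild the image only if the container recipe changed in the last
# commit; otherwise reuse the published :dev image under the release tag.
if git diff --name-only HEAD~1 | grep -qE '^(Dockerfile|environment\.yml)$'; then
    docker build --no-cache . -t nfcore/atacseq:1.2.0
else
    docker pull nfcore/atacseq:dev
    docker tag nfcore/atacseq:dev nfcore/atacseq:1.2.0
fi

# Same smoke test the CI jobs run afterwards
nextflow run . -profile test,docker
```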
diff --git a/Dockerfile b/Dockerfile old mode 100755 new mode 100644 index 7c8ffa13..715e6892 --- a/Dockerfile +++ b/Dockerfile @@ -1,13 +1,17 @@ -FROM nfcore/base:1.7 +FROM nfcore/base:1.9 LABEL authors="Harshil Patel" \ - description="Docker image containing all requirements for nf-core/atacseq pipeline" + description="Docker image containing all software requirements for the nf-core/atacseq pipeline" # Install the conda environment COPY environment.yml / RUN conda env create -f /environment.yml && conda clean -a # Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-atacseq-1.1.0/bin:$PATH +ENV PATH /opt/conda/envs/nf-core-atacseq-1.2.0/bin:$PATH # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-atacseq-1.1.0 > nf-core-atacseq-1.1.0.yml +RUN conda env export --name nf-core-atacseq-1.2.0 > nf-core-atacseq-1.2.0.yml + +# Instruct R processes to use these empty files instead of clashing with a local version +RUN touch .Rprofile +RUN touch .Renviron diff --git a/README.md b/README.md index 32f5bb93..28121d11 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,13 @@ -# ![nfcore/atacseq](docs/images/nf-core-atacseq_logo.png) +# ![nf-core/atacseq](docs/images/nf-core-atacseq_logo.png) -[![Build Status](https://travis-ci.com/nf-core/atacseq.svg?branch=master)](https://travis-ci.com/nf-core/atacseq) +[![GitHub Actions CI Status](https://github.com/nf-core/atacseq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/atacseq/actions) +[![GitHub Actions Linting Status](https://github.com/nf-core/atacseq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/atacseq/actions) [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2634132.svg)](https://doi.org/10.5281/zenodo.2634132) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/) [![Docker](https://img.shields.io/docker/automated/nfcore/atacseq.svg)](https://hub.docker.com/r/nfcore/atacseq) -[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2634132.svg)](https://doi.org/10.5281/zenodo.2634132) +[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23atacseq-4A154B?logo=slack)](https://nfcore.slack.com/channels/atacseq) ## Introduction @@ -35,7 +37,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool * reads that aren't in FR orientation ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*) * reads where only one read of the pair fails the above criteria ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*) 3. Alignment-level QC and estimation of library complexity ([`picard`](https://broadinstitute.github.io/picard/), [`Preseq`](http://smithlabresearch.org/software/preseq/)) - 4. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`wigToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) + 4. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) 5. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html)) 6.
Calculate genome-wide enrichment ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html)) 7. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS)) @@ -47,7 +49,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool 6. Merge filtered alignments across replicates ([`picard`](https://broadinstitute.github.io/picard/)) 1. Re-mark duplicates ([`picard`](https://broadinstitute.github.io/picard/)) 2. Remove duplicate reads ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/)) - 3. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`wigToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) + 3. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) 4. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS)) 5. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html)) 6. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/)) @@ -58,23 +60,23 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool ## Quick Start -i. Install [`nextflow`](https://nf-co.re/usage/installation) +1. Install [`nextflow`](https://nf-co.re/usage/installation) -ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html) +2. Install either [`Docker`](https://docs.docker.com/engine/installation/) or [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ -iii. Download the pipeline and test it on a minimal dataset with a single command +3. Download the pipeline and test it on a minimal dataset with a single command: -```bash -nextflow run nf-core/atacseq -profile test,<docker/singularity/conda> -``` + ```bash + nextflow run nf-core/atacseq -profile test,<docker/singularity/conda/institute> + ``` -> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile institute` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. -iv. Start running your own analysis! +4. Start running your own analysis! -```bash -nextflow run nf-core/atacseq -profile <docker/singularity/conda> --input design.csv --genome GRCh37 -``` + ```bash + nextflow run nf-core/atacseq -profile <docker/singularity/conda/institute> --input design.csv --genome GRCh37 + ``` See [usage docs](docs/usage.md) for all of the available options when running the pipeline.
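The `--input design.csv` argument in step 4 points at a sample sheet whose layout is validated by `bin/check_design.py` further down this diff: a header row, a group column without spaces, integer replicate IDs starting at 1, and one or two gzipped FastQ columns for single-end or paired-end data. The exact column names below follow the pipeline's usage docs and should be treated as an assumption; the file names are purely illustrative:

```bash
# Hypothetical two-group, two-replicate paired-end design
cat > design.csv << 'EOF'
group,replicate,fastq_1,fastq_2
control,1,control_R1_1.fastq.gz,control_R1_2.fastq.gz
control,2,control_R2_1.fastq.gz,control_R2_2.fastq.gz
treatment,1,treatment_R1_1.fastq.gz,treatment_R1_2.fastq.gz
treatment,2,treatment_R2_1.fastq.gz,treatment_R2_2.fastq.gz
EOF

nextflow run nf-core/atacseq -profile docker --input design.csv --genome GRCh37
```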
@@ -97,21 +99,25 @@ The pipeline was originally written by [The Bioinformatics & Biostatistics Group The pipeline was developed by [Harshil Patel](mailto:harshil.patel@crick.ac.uk). -The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) and [nf-core/chipseq](https://github.com/nf-core/chipseq) pipelines developed by Phil Ewels were initially used as a template for this pipeline. Many thanks to Phil for all of his help and advice, and the team at SciLifeLab. - -Many thanks to others who have helped out along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@sven1103](https://github.com/sven1103), [@MaxUlysse](https://github.com/MaxUlysse), [@micans](https://github.com/micans), [@jinmingda](https://github.com/jinmingda), [@ktrns](https://github.com/ktrns), [@crickbabs](https://github.com/crickbabs), [@pditommaso](https://github.com/pditommaso). +Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@ewels](https://github.com/ewels), [@apeltzer](https://github.com/apeltzer), [@crickbabs](https://github.com/crickbabs), [@drewjbeh](https://github.com/drewjbeh), [@houghtos](https://github.com/houghtos), [@jinmingda](https://github.com/jinmingda), [@ktrns](https://github.com/ktrns), [@MaxUlysse](https://github.com/MaxUlysse), [@mashehu](https://github.com/mashehu), [@micans](https://github.com/micans), [@pditommaso](https://github.com/pditommaso) and [@sven1103](https://github.com/sven1103). ## Contributions and Support If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). -For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/atacseq) (you can join with [this invite](https://nf-co.re/join/slack)). +For further information or help, don't hesitate to get in touch on the [Slack `#atacseq` channel](https://nfcore.slack.com/channels/atacseq) (you can join with [this invite](https://nf-co.re/join/slack)). ## Citation If you use nf-core/atacseq for your analysis, please cite it using the following doi: [10.5281/zenodo.2634132](https://doi.org/10.5281/zenodo.2634132) -You can cite the `nf-core` pre-print as follows: -> Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1). - An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. + +You can cite the `nf-core` publication as follows: + +> **The nf-core framework for community-curated bioinformatics pipelines.** +> +> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. +> +> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
+> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) diff --git a/assets/email_template.html b/assets/email_template.html index 61a98cb6..098cd4bd 100755 --- a/assets/email_template.html +++ b/assets/email_template.html @@ -5,7 +5,7 @@ - + nf-core/atacseq Pipeline Report diff --git a/assets/email_template.txt b/assets/email_template.txt index 5f89db73..40533389 100755 --- a/assets/email_template.txt +++ b/assets/email_template.txt @@ -28,6 +28,7 @@ The command used to launch the workflow was as follows: $commandLine + Pipeline Configuration: ----------------------- <% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %> diff --git a/assets/multiqc/mlib_deseq2_clustering_header.txt b/assets/multiqc/mlib_deseq2_clustering_header.txt index 60237214..be3b8085 100644 --- a/assets/multiqc/mlib_deseq2_clustering_header.txt +++ b/assets/multiqc/mlib_deseq2_clustering_header.txt @@ -1,11 +1,11 @@ #id: 'mlib_deseq2_clustering' -#section_name: 'DESeq2: Sample similarity (merged library)' +#section_name: 'MERGED LIB: DESeq2 sample similarity' #description: "is generated from clustering by Euclidean distances between # DESeq2 # rlog values for each sample # in the featurecounts_deseq2.r script." #plot_type: 'heatmap' -#anchor: 'nfcore_atacseq-mlib_deseq2_clustering' +#anchor: 'mlib_deseq2_clustering' #pconfig: # title: 'DESeq2: Heatmap of the sample-to-sample distances' # xlab: True diff --git a/assets/multiqc/mlib_deseq2_pca_header.txt b/assets/multiqc/mlib_deseq2_pca_header.txt index 434c305d..ba6be8e2 100644 --- a/assets/multiqc/mlib_deseq2_pca_header.txt +++ b/assets/multiqc/mlib_deseq2_pca_header.txt @@ -1,10 +1,10 @@ #id: 'mlib_deseq2_pca' -#section_name: 'DESeq2: PCA plot (merged library)' +#section_name: 'MERGED LIB: DESeq2 PCA plot' #description: "PCA plot between samples in the experiment. # These values are calculated using DESeq2 # in the featurecounts_deseq2.r script." #plot_type: 'scatter' -#anchor: 'nfcore_atacseq-mlib_deseq2_pca' +#anchor: 'mlib_deseq2_pca' #pconfig: # title: 'DESeq2: Principal component plot' # xlab: PC1 diff --git a/assets/multiqc/mlib_frip_score_header.txt b/assets/multiqc/mlib_frip_score_header.txt index 36fb7e8e..d0edf803 100644 --- a/assets/multiqc/mlib_frip_score_header.txt +++ b/assets/multiqc/mlib_frip_score_header.txt @@ -1,10 +1,10 @@ #id: 'mlib_frip_score' -#section_name: 'MACS2: Peak FRiP score (merged library)' +#section_name: 'MERGED LIB: MACS2 peak FRiP score' #description: "is generated by calculating the fraction of all mapped reads that fall # into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted. # See FRiP score." #plot_type: 'bargraph' -#anchor: 'nfcore_atacseq-mlib_frip_score' +#anchor: 'mlib_frip_score' #pconfig: # title: 'FRiP score' # ylab: 'FRiP score' diff --git a/assets/multiqc/mlib_peak_annotation_header.txt b/assets/multiqc/mlib_peak_annotation_header.txt index 3abac2f0..6f668c31 100644 --- a/assets/multiqc/mlib_peak_annotation_header.txt +++ b/assets/multiqc/mlib_peak_annotation_header.txt @@ -1,9 +1,9 @@ #id: 'mlib_peak_annotation' -#section_name: 'HOMER: Peak annotation (merged library)' +#section_name: 'MERGED LIB: HOMER peak annotation' #description: "is generated by calculating the proportion of peaks assigned to genomic features by # HOMER annotatePeaks.pl." 
#plot_type: 'bargraph' -#anchor: 'nfcore_atacseq-mlib_peak_annotation' +#anchor: 'mlib_peak_annotation' #pconfig: # title: 'Peak to feature proportion' # ylab: 'Peak count' diff --git a/assets/multiqc/mlib_peak_count_header.txt b/assets/multiqc/mlib_peak_count_header.txt index 31f03b72..9fd8c830 100644 --- a/assets/multiqc/mlib_peak_count_header.txt +++ b/assets/multiqc/mlib_peak_count_header.txt @@ -1,9 +1,9 @@ #id: 'mlib_peak_count' -#section_name: 'MACS2: Peak count (merged library)' +#section_name: 'MERGED LIB: MACS2 peak count' #description: "is calculated from total number of peaks called by # MACS2" #plot_type: 'bargraph' -#anchor: 'nfcore_atacseq-mlib_peak_count' +#anchor: 'mlib_peak_count' #pconfig: # title: 'Total peak count' # ylab: 'Peak count' diff --git a/assets/multiqc/mrep_deseq2_clustering_header.txt b/assets/multiqc/mrep_deseq2_clustering_header.txt index b1fdafe9..d36e8eab 100644 --- a/assets/multiqc/mrep_deseq2_clustering_header.txt +++ b/assets/multiqc/mrep_deseq2_clustering_header.txt @@ -1,11 +1,11 @@ #id: 'mrep_deseq2_clustering' -#section_name: 'DESeq2: Sample similarity (merged replicate)' +#section_name: 'MERGED REP: DESeq2 sample similarity' #description: "is generated from clustering by Euclidean distances between # DESeq2 # rlog values for each sample # in the featurecounts_deseq2.r script." #plot_type: 'heatmap' -#anchor: 'nfcore_atacseq-mrep_deseq2_clustering' +#anchor: 'mrep_deseq2_clustering' #pconfig: # title: 'DESeq2: Heatmap of the sample-to-sample distances' # xlab: True diff --git a/assets/multiqc/mrep_deseq2_pca_header.txt b/assets/multiqc/mrep_deseq2_pca_header.txt index 1ba35a93..2f58a377 100644 --- a/assets/multiqc/mrep_deseq2_pca_header.txt +++ b/assets/multiqc/mrep_deseq2_pca_header.txt @@ -1,10 +1,10 @@ #id: 'mrep_deseq2_pca' -#section_name: 'DESeq2: PCA plot (merged replicate)' +#section_name: 'MERGED REP: DESeq2 PCA plot' #description: "PCA plot between samples in the experiment. # These values are calculated using DESeq2 # in the featurecounts_deseq2.r script." #plot_type: 'scatter' -#anchor: 'nfcore_atacseq-mrep_deseq2_pca' +#anchor: 'mrep_deseq2_pca' #pconfig: # title: 'DESeq2: Principal component plot' # xlab: PC1 diff --git a/assets/multiqc/mrep_frip_score_header.txt b/assets/multiqc/mrep_frip_score_header.txt index 92265b06..8a2c5ad1 100644 --- a/assets/multiqc/mrep_frip_score_header.txt +++ b/assets/multiqc/mrep_frip_score_header.txt @@ -1,10 +1,10 @@ #id: 'mrep_frip_score' -#section_name: 'MACS2: Peak FRiP score (merged replicate)' +#section_name: 'MERGED REP: MACS2 peak FRiP score' #description: "is generated by calculating the fraction of all mapped reads that fall # into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted. # See FRiP score." #plot_type: 'bargraph' -#anchor: 'nfcore_atacseq-mrep_frip_score' +#anchor: 'mrep_frip_score' #pconfig: # title: 'FRiP score' # ylab: 'FRiP score' diff --git a/assets/multiqc/mrep_peak_annotation_header.txt b/assets/multiqc/mrep_peak_annotation_header.txt index 4e605e68..d9d0eb03 100644 --- a/assets/multiqc/mrep_peak_annotation_header.txt +++ b/assets/multiqc/mrep_peak_annotation_header.txt @@ -1,9 +1,9 @@ #id: 'mrep_peak_annotation' -#section_name: 'HOMER: Peak annotation (merged replicate)' +#section_name: 'MERGED REP: HOMER peak annotation' #description: "is generated by calculating the proportion of peaks assigned to genomic features by # HOMER annotatePeaks.pl." 
#plot_type: 'bargraph' -#anchor: 'nfcore_atacseq-mrep_peak_annotation' +#anchor: 'mrep_peak_annotation' #pconfig: # title: 'Peak to feature proportion' # ylab: 'Peak count' diff --git a/assets/multiqc/mrep_peak_count_header.txt b/assets/multiqc/mrep_peak_count_header.txt index 2d69a0aa..66796ffe 100644 --- a/assets/multiqc/mrep_peak_count_header.txt +++ b/assets/multiqc/mrep_peak_count_header.txt @@ -1,9 +1,9 @@ #id: 'mrep_peak_count' -#section_name: 'MACS2: Peak count (merged replicate)' +#section_name: 'MERGED REP: MACS2 Peak count' #description: "is calculated from total number of peaks called by # MACS2" #plot_type: 'bargraph' -#anchor: 'nfcore_atacseq-mrep_peak_count' +#anchor: 'mrep_peak_count' #pconfig: # title: 'Total peak count' # ylab: 'Peak count' diff --git a/assets/multiqc/multiqc_config.yaml b/assets/multiqc_config.yaml similarity index 50% rename from assets/multiqc/multiqc_config.yaml rename to assets/multiqc_config.yaml index e76c08b7..9416fbf1 100755 --- a/assets/multiqc/multiqc_config.yaml +++ b/assets/multiqc_config.yaml @@ -3,124 +3,148 @@ report_comment: > analysis pipeline. For information about how to interpret these results, please see the documentation. -export_plots: true +data_format: 'yaml' -fn_clean_exts: - - 'fastq.gz' - - '_trimmed' - - '_val' - - 'sorted.bam' - - '.Lb' - - 'mkD' - - 'clN' - - 'mLb' - - 'mRp' - - '_peaks' +run_modules: + - custom_content + - fastqc + - cutadapt + - samtools + - picard + - preseq + - featureCounts + - deeptools + +exclude_modules: + - 'general_stats' module_order: - fastqc: - name: 'FastQC (library; raw)' - info: 'This section of the report shows FastQC results before adapter trimming.' + name: 'LIB: FastQC (raw)' + info: 'This section of the report shows FastQC results before adapter trimming for individual libraries.' path_filters: - - '*_fastqc.zip' - path_filters_exclude: - - '*val*_fastqc.zip' - - '*trimmed_fastqc.zip' + - './fastqc/*.zip' - cutadapt: - name: 'cutadapt (library; trimmed)' - info: 'This section of the report shows the length of trimmed reads by cutadapt.' - path_filters: - - '*_trimming_report.txt' + name: 'LIB: Cutadapt (trimmed)' + info: 'This section of the report shows the length of trimmed reads by Cutadapt for individual libraries.' - fastqc: - name: 'FastQC (library; trimmed)' - info: 'This section of the report shows FastQC results after adapter trimming.' + name: 'LIB: FastQC (trimmed)' + info: 'This section of the report shows FastQC results after adapter trimming for individual libraries.' path_filters: - - '*val*_fastqc.zip' - - '*trimmed_fastqc.zip' + - './trimgalore/fastqc/*.zip' - samtools: - name: 'SAMTools (library)' + name: 'LIB: SAMTools' info: 'This section of the report shows SAMTools results for individual libraries.' path_filters: - - '*.Lb.sorted.bam*' + - './alignment/library/*' - samtools: - name: 'SAMTools (merged library; unfiltered)' + name: 'MERGED LIB: SAMTools (unfiltered)' info: 'This section of the report shows SAMTools results after merging libraries and before filtering.' path_filters: - - '*mLb.mkD.sorted.bam*' + - './alignment/mergedLibrary/*.mLb.mkD.sorted.bam*' - preseq: - name: 'Preseq (merged library; unfiltered)' + name: 'MERGED LIB: Preseq (unfiltered)' info: 'This section of the report shows Preseq results after merging libraries and before filtering.' 
- path_filters: - - '*mLb*' - samtools: - name: 'SAMTools (merged library; filtered)' + name: 'MERGED LIB: SAMTools (filtered)' info: 'This section of the report shows SAMTools results after merging libraries and after filtering.' path_filters: - - '*mLb.clN.sorted.bam*' + - './alignment/mergedLibrary/*.mLb.clN.sorted.bam*' - picard: - name: 'Picard (merged library; filtered)' + name: 'MERGED LIB: Picard' info: 'This section of the report shows picard results after merging libraries and after filtering.' path_filters: - - '*mLb*' + - './alignment/mergedLibrary/picard_metrics/*' - deeptools: - name: 'deepTools' + name: 'MERGED LIB: deepTools' + anchor: 'mlib_deeptools' info: 'This section of the report shows QC plots generated by deepTools.' + - featureCounts: + name: 'MERGED LIB: featureCounts' + anchor: 'mlib_featurecounts' + info: 'This section of the report shows featureCounts results for the number of reads assigned to merged library consensus peaks.' path_filters: - - '*.plot*' + - './macs/mergedLibrary/consensus/*.summary' - samtools: - name: 'SAMTools (merged replicate; filtered)' - info: 'This section of the report shows SAMTools results after merging replicates and before filtering.' + name: 'MERGED REP: SAMTools' + info: 'This section of the report shows SAMTools results after merging replicates and filtering.' path_filters: - - '*mRp.clN*' + - './alignment/mergedReplicate/*' - picard: - name: 'Picard (merged replicate; filtered)' + name: 'MERGED REP: Picard' + anchor: 'mrep_picard' info: 'This section of the report shows picard results after merging libraries and before filtering.' path_filters: - - '*mRp.clN*' + - './alignment/mergedReplicate/*' - featureCounts: - name: 'featureCounts (merged library)' - info: 'This section of the report shows featureCounts results for the number of reads assigned to merged library consensus peaks.' - path_filters: - - '*mLb.clN.featureCounts*' - - featureCounts: - name: 'featureCounts (merged replicate)' + name: 'MERGED REP: featureCounts' + anchor: 'mrep_featurecounts' info: 'This section of the report shows featureCounts results for the number of reads assigned to merged replicate consensus peaks.' 
path_filters: - - '*mRp.clN.featureCounts*' + - './macs/mergedReplicate/consensus/*.summary' report_section_order: mlib_peak_count: - order: -1000 - mrep_peak_count: - order: -1100 + before: mlib_deeptools mlib_frip_score: - order: -1200 - mrep_frip_score: - order: -1300 + before: mlib_peak_count mlib_peak_annotation: - order: -1400 - mrep_peak_annotation: - order: -1500 + before: mlib_frip_score + mlib_featurecounts: + before: mlib_peak_annotation mlib_deseq2_pca: - order: -1600 - mrep_deseq2_pca: - order: -1700 + before: mlib_featurecounts mlib_deseq2_clustering: - order: -1800 + before: mlib_deseq2_pca + mrep_peak_count: + before: mrep_picard + mrep_frip_score: + before: mrep_peak_count + mrep_peak_annotation: + before: mrep_frip_score + mrep_featurecounts: + before: mrep_peak_annotation + mrep_deseq2_pca: + before: mrep_featurecounts mrep_deseq2_clustering: - order: -1900 + before: mrep_deseq2_pca software_versions: - order: -2000 + order: -1001 nf-core-atacseq-summary: - order: -2100 + order: -1002 custom_plot_config: - picard-insertsize: + picard_insert_size: cpswitch_c_active: False smooth_points: 1000 - featurecounts: cpswitch_c_active: False - featurecounts-1: cpswitch_c_active: False + +extra_fn_clean_exts: + - 'fastq.gz' + - '_trimmed' + - '_val' + - 'sorted.bam' + - '.Lb' + - 'mkD' + - 'clN' + - 'mLb' + - 'mRp' + - '_peaks' + +# # Customise the module search patterns to speed up execution time + # # - Skip module sub-tools that we are not interested in + # # - Replace file-content searching with filename pattern searching + # # - Don't add anything that is the same as the MultiQC default + # # See https://multiqc.info/docs/#optimise-file-search-patterns for details sp: + cutadapt: + fn: '*trimming_report.txt' + preseq: + fn: '*.ccurve.txt' + deeptools/plotFingerprintOutRawCounts: + fn: '*plotFingerprint*' + deeptools/plotProfile: + fn: '*plotProfile*' diff --git a/assets/nf-core-atacseq_logo.png b/assets/nf-core-atacseq_logo.png old mode 100755 new mode 100644 diff --git a/assets/nf-core-atacseq_social_preview.png b/assets/nf-core-atacseq_social_preview.png new file mode 100644 index 00000000..38755a05 Binary files /dev/null and b/assets/nf-core-atacseq_social_preview.png differ diff --git a/assets/nf-core-atacseq_social_preview.svg b/assets/nf-core-atacseq_social_preview.svg new file mode 100644 index 00000000..74a26f2d --- /dev/null +++ b/assets/nf-core-atacseq_social_preview.svg @@ -0,0 +1,448 @@ +[448 lines of SVG markup; only the embedded text survives extraction: "ATAC-seq peak-calling, QC and differential analysis pipeline" / "atacseq"] diff --git a/bin/bampe_rm_orphan.py b/bin/bampe_rm_orphan.py index c11d4e67..5b0a6f72 100755 --- a/bin/bampe_rm_orphan.py +++ b/bin/bampe_rm_orphan.py @@ -8,6 +8,7 @@ import os import pysam +import errno import argparse ############################################ @@ -61,10 +62,11 @@ def bampe_rm_orphan(BAMIn,BAMOut,onlyFRPairs=False): ## ITERATE THROUGH BAM FILE EOF = 0 - SAMFin = pysam.Samfile(BAMIn,"rb") - SAMFout = pysam.Samfile(BAMOut, "wb",header=SAMFin.header) - currRead = SAMFin.next() - for read in SAMFin: + SAMFin = pysam.AlignmentFile(BAMIn, "rb") + SAMFout = pysam.AlignmentFile(BAMOut, "wb", header=SAMFin.header) + iter = SAMFin.fetch(until_eof=True) + currRead = next(iter) + for read in iter: totalReads += 1 if currRead.qname == read.qname:
pair1 = currRead; pair2 = read @@ -103,7 +105,7 @@ def bampe_rm_orphan(BAMIn,BAMOut,onlyFRPairs=False): ## RESET COUNTER try: totalReads += 1 - currRead = SAMFin.next() + currRead = next(iter) except: StopIteration EOF = 1 diff --git a/bin/check_design.py b/bin/check_design.py index 4c1436f0..d55e2cec 100755 --- a/bin/check_design.py +++ b/bin/check_design.py @@ -8,7 +8,6 @@ import os import sys -import requests import argparse ############################################ @@ -42,7 +41,7 @@ def check_design(DesignFileIn,DesignFileOut): fin = open(DesignFileIn,'r') header = fin.readline().strip().split(',') if header != HEADER: - print "{} header: {} != {}".format(ERROR_STR,','.join(header),','.join(HEADER)) + print("{} header: {} != {}".format(ERROR_STR,','.join(header),','.join(HEADER))) sys.exit(1) numColList = [] @@ -55,32 +54,32 @@ def check_design(DesignFileIn,DesignFileOut): ## CHECK VALID NUMBER OF COLUMNS PER SAMPLE numCols = len(lspl) if numCols not in [3,4]: - print "{}: Invalid number of columns (3 for single-end or 4 for paired-end)!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Invalid number of columns (3 for single-end or 4 for paired-end)!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) numColList.append(numCols) ## CHECK GROUP COLUMN HAS NO SPACES group,replicate,fastQFiles = lspl[0],lspl[1],lspl[2:] if group.find(' ') != -1: - print "{}: Group id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Group id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) ## CHECK REPLICATE COLUMN IS INTEGER if not replicate.isdigit(): - print "{}: Replicate id not an integer!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Replicate id not an integer!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) for fastq in fastQFiles: ## CHECK FASTQ FILE EXTENSION if fastq[-9:] != '.fastq.gz' and fastq[-6:] != '.fq.gz': - print "{}: FastQ file has incorrect extension (has to be '.fastq.gz' or 'fq.gz') - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip()) + print("{}: FastQ file has incorrect extension (has to be '.fastq.gz' or 'fq.gz') - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip())) sys.exit(1) ## CREATE GROUP MAPPING DICT = {GROUP_ID: {REPLICATE_ID:[[FASTQ_FILES]]} replicate = int(replicate) - if not groupRepDict.has_key(group): + if group not in groupRepDict: groupRepDict[group] = {} - if not groupRepDict[group].has_key(replicate): + if replicate not in groupRepDict[group]: groupRepDict[group][replicate] = [] groupRepDict[group][replicate].append(fastQFiles) @@ -90,7 +89,7 @@ def check_design(DesignFileIn,DesignFileOut): ## CHECK IF DATA IS PAIRED-END OR SINGLE-END AND NOT A MIXTURE if min(numColList) != max(numColList): - print "{}: Mixture of paired-end and single-end reads!".format(ERROR_STR) + print("{}: Mixture of paired-end and single-end reads!".format(ERROR_STR)) sys.exit(1) ## CHECK IF MULTIPLE GROUPS EXIST @@ -107,7 +106,7 @@ def check_design(DesignFileIn,DesignFileOut): ## CHECK THAT REPLICATE IDS ARE IN FORMAT 1.. 
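## The check below relies on valid replicate IDs forming the contiguous series
## 1..N, for which len(set(ids)) == max(ids): e.g. {1,2,3} passes (3 == 3),
## whereas a gap or a 0-based series fails (e.g. {1,3} gives 2 != 3).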
uniq_rep_ids = set(groupRepDict[group].keys()) if len(uniq_rep_ids) != max(uniq_rep_ids): - print "{}: Replicate IDs must start with 1..\nGroup: {}, Replicate IDs: {}".format(ERROR_STR,group,list(uniq_rep_ids)) + print("{}: Replicate IDs must start with 1..\nGroup: {}, Replicate IDs: {}".format(ERROR_STR,group,list(uniq_rep_ids))) sys.exit(1) numRepList.append(max(uniq_rep_ids)) diff --git a/bin/featurecounts_deseq2.r b/bin/featurecounts_deseq2.r index c6683cbb..b75335a5 100755 --- a/bin/featurecounts_deseq2.r +++ b/bin/featurecounts_deseq2.r @@ -25,6 +25,7 @@ library(ggplot2) library(RColorBrewer) library(pheatmap) library(lattice) +library(BiocParallel) ################################################ ################################################ @@ -32,11 +33,13 @@ library(lattice) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--featurecount_file"), type="character", default=NULL, help="Feature count file generated by the SubRead featureCounts command.", metavar="featurecount_file"), - make_option(c("-b", "--bam_suffix"), type="character", default=NULL, help="Portion of filename after sample name in featurecount file header e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'", metavar="bam_suffix"), +option_list <- list(make_option(c("-i", "--featurecount_file"), type="character", default=NULL, help="Feature count file generated by the SubRead featureCounts command.", metavar="path"), + make_option(c("-b", "--bam_suffix"), type="character", default=NULL, help="Portion of filename after sample name in featurecount file header e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'", metavar="string"), make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"), - make_option(c("-p", "--outprefix"), type="character", default='differential', help="Output prefix", metavar="character"), - make_option(c("-s", "--outsuffix"), type="character", default='', help="Output suffix for comparison-level results", metavar="character")) + make_option(c("-p", "--outprefix"), type="character", default='differential', help="Output prefix", metavar="string"), + make_option(c("-s", "--outsuffix"), type="character", default='', help="Output suffix for comparison-level results", metavar="string"), + make_option(c("-v", "--vst"), type="logical", default=FALSE, help="Run vst transform instead of rlog", metavar="boolean"), + make_option(c("-c", "--cores"), type="integer", default=1, help="Number of cores", metavar="integer")) opt_parser <- OptionParser(option_list=option_list) opt <- parse_args(opt_parser) @@ -75,7 +78,7 @@ if (file.exists(opt$outdir) == FALSE) { setwd(opt$outdir) samples.vec <- sort(colnames(count.table)) -groups <- substr(samples.vec, 1, nchar(samples.vec)-3) +groups <- sub("_[^_]+$", "", samples.vec) print(unique(groups)) if (length(unique(groups)) == 1) { quit(save = "no", status = 0, runLast = FALSE) @@ -86,8 +89,12 @@ if (file.exists(DDSFile) == FALSE) { counts <- count.table[,samples.vec,drop=FALSE] coldata <- data.frame(row.names=colnames(counts),condition=groups) dds <- DESeqDataSetFromMatrix(countData = round(counts), colData = coldata, design = ~ condition) - dds <- DESeq(dds) - rld <- rlog(dds) + dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(opt$cores)) + if (!opt$vst) { + rld <- rlog(dds) + } else { + rld <- vst(dds) + } save(dds,rld,file=DDSFile) } diff --git a/bin/get_autosomes.py b/bin/get_autosomes.py index a8603b6b..2d530561 100755 --- 
a/bin/get_autosomes.py +++ b/bin/get_autosomes.py @@ -7,6 +7,7 @@ ####################################################################### import os +import errno import argparse ############################################ diff --git a/bin/igv_files_to_session.py b/bin/igv_files_to_session.py index 6dc6aabe..48e749c8 100755 --- a/bin/igv_files_to_session.py +++ b/bin/igv_files_to_session.py @@ -7,6 +7,7 @@ ####################################################################### import os +import errno import argparse ############################################ diff --git a/bin/macs2_merged_expand.py b/bin/macs2_merged_expand.py index 10889947..f4e84a14 100755 --- a/bin/macs2_merged_expand.py +++ b/bin/macs2_merged_expand.py @@ -7,6 +7,7 @@ ####################################################################### import os +import errno import argparse ############################################ @@ -87,9 +88,9 @@ def macs2_merged_expand(MergedIntervalTxtFile,SampleNameList,OutFile,isNarrow=Fa groupDict = {} for sID in ['_'.join(x.split('_')[:-2]) for x in names]: gID = '_'.join(sID.split('_')[:-1]) - if not groupDict.has_key(gID): + if gID not in groupDict: groupDict[gID] = [] - if not sID in groupDict[gID]: + if sID not in groupDict[gID]: groupDict[gID].append(sID) ## GET SAMPLES THAT PASS REPLICATE THRESHOLD @@ -103,23 +104,23 @@ def macs2_merged_expand(MergedIntervalTxtFile,SampleNameList,OutFile,isNarrow=Fa for idx in range(len(names)): sample = '_'.join(names[idx].split('_')[:-2]) if sample in passRepThreshList: - if not fcDict.has_key(sample): + if sample not in fcDict: fcDict[sample] = [] fcDict[sample].append(str(fcs[idx])) - if not qvalDict.has_key(sample): + if sample not in qvalDict: qvalDict[sample] = [] qvalDict[sample].append(str(qvals[idx])) - if not pvalDict.has_key(sample): + if sample not in pvalDict: pvalDict[sample] = [] pvalDict[sample].append(str(pvals[idx])) - if not startDict.has_key(sample): + if sample not in startDict: startDict[sample] = [] startDict[sample].append(str(starts[idx])) - if not endDict.has_key(sample): + if sample not in endDict: endDict[sample] = [] endDict[sample].append(str(ends[idx])) if isNarrow: - if not summitDict.has_key(sample): + if sample not in summitDict: summitDict[sample] = [] summitDict[sample].append(str(summits[idx])) @@ -138,7 +139,7 @@ def macs2_merged_expand(MergedIntervalTxtFile,SampleNameList,OutFile,isNarrow=Fa fout.write('\t'.join(oList) + '\n') tsamples = tuple(sorted(samples)) - if not combFreqDict.has_key(tsamples): + if tsamples not in combFreqDict: combFreqDict[tsamples] = 0 combFreqDict[tsamples] += 1 totalOutIntervals += 1 diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py new file mode 100755 index 00000000..57cc4263 --- /dev/null +++ b/bin/markdown_to_html.py @@ -0,0 +1,100 @@ +#!/usr/bin/env python +from __future__ import print_function +import argparse +import markdown +import os +import sys + +def convert_markdown(in_fn): + input_md = open(in_fn, mode="r", encoding="utf-8").read() + html = markdown.markdown( + "[TOC]\n" + input_md, + extensions = [ + 'pymdownx.extra', + 'pymdownx.b64', + 'pymdownx.highlight', + 'pymdownx.emoji', + 'pymdownx.tilde', + 'toc' + ], + extension_configs = { + 'pymdownx.b64': { + 'base_path': os.path.dirname(in_fn) + }, + 'pymdownx.highlight': { + 'noclasses': True + }, + 'toc': { + 'title': 'Table of Contents' + } + } + ) + return html + +def wrap_html(contents): + header = """ + + + + + +
+ """ + footer = """ +
+ + + """ + return header + contents + footer + + +def parse_args(args=None): + parser = argparse.ArgumentParser() + parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?', + help='File to convert. Defaults to stdin.') + parser.add_argument('-o', '--out', type=argparse.FileType('w'), + default=sys.stdout, + help='Output file name. Defaults to stdout.') + return parser.parse_args(args) + +def main(args=None): + args = parse_args(args) + converted_md = convert_markdown(args.mdfile.name) + html = wrap_html(converted_md) + args.out.write(html) + +if __name__ == '__main__': + sys.exit(main()) diff --git a/bin/markdown_to_html.r b/bin/markdown_to_html.r deleted file mode 100755 index abe13350..00000000 --- a/bin/markdown_to_html.r +++ /dev/null @@ -1,51 +0,0 @@ -#!/usr/bin/env Rscript - -# Command line argument processing -args = commandArgs(trailingOnly=TRUE) -if (length(args) < 2) { - stop("Usage: markdown_to_html.r ", call.=FALSE) -} -markdown_fn <- args[1] -output_fn <- args[2] - -# Load / install packages -if (!require("markdown")) { - install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/') - library("markdown") -} - -base_css_fn <- getOption("markdown.HTML.stylesheet") -base_css <- readChar(base_css_fn, file.info(base_css_fn)$size) -custom_css <- paste(base_css, " -body { - padding: 3em; - margin-right: 350px; - max-width: 100%; -} -#toc { - position: fixed; - right: 20px; - width: 300px; - padding-top: 20px; - overflow: scroll; - height: calc(100% - 3em - 20px); -} -#toc_header { - font-size: 1.8em; - font-weight: bold; -} -#toc > ul { - padding-left: 0; - list-style-type: none; -} -#toc > ul ul { padding-left: 20px; } -#toc > ul > li > a { display: none; } -img { max-width: 800px; } -") - -markdownToHTML( - file = markdown_fn, - output = output_fn, - stylesheet = custom_css, - options = c('toc', 'base64_images', 'highlight_code') -) diff --git a/bin/plot_homer_annotatepeaks.r b/bin/plot_homer_annotatepeaks.r index 9865357c..4a867d8f 100755 --- a/bin/plot_homer_annotatepeaks.r +++ b/bin/plot_homer_annotatepeaks.r @@ -17,10 +17,10 @@ library(scales) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--homer_files"), type="character", default=NULL, help="Comma-separated list of homer annotated text files.", metavar="anno_files"), - make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with homer annotated text files. Must be unique and in same order as homer files input.", metavar="sampleids"), +option_list <- list(make_option(c("-i", "--homer_files"), type="character", default=NULL, help="Comma-separated list of homer annotated text files.", metavar="path"), + make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with homer annotated text files. 
Must be unique and in same order as homer files input.", metavar="string"), make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"), - make_option(c("-p", "--outprefix"), type="character", default='homer_annotation', help="Output prefix", metavar="character")) + make_option(c("-p", "--outprefix"), type="character", default='homer_annotation', help="Output prefix", metavar="string")) opt_parser <- OptionParser(option_list=option_list) opt <- parse_args(opt_parser) @@ -57,12 +57,21 @@ plot.feature.dat <- data.frame() for (idx in 1:length(HomerFiles)) { sampleid = SampleIDs[idx] - anno.dat <- read.table(HomerFiles[idx], sep="\t", header=TRUE,quote="") + anno.dat <- read.csv(HomerFiles[idx], sep="\t", header=TRUE) anno.dat <- anno.dat[,c("Annotation","Distance.to.TSS","Nearest.PromoterID")] - anno.dat <- anno.dat[which(!is.na(anno.dat$Distance.to.TSS)),] - if (nrow(anno.dat) == 0) { - quit(save = "no", status = 0, runLast = FALSE) - } + + ## REPLACE UNASSIGNED FEATURE ENTRIES WITH SENSIBLE VALUES + unassigned <- which(is.na(as.character(anno.dat$Distance.to.TSS))) + anno.dat$Distance.to.TSS[unassigned] <- 1000000 + + anno.dat$Annotation <- as.character(anno.dat$Annotation) + anno.dat$Annotation[unassigned] <- "Unassigned" + anno.dat$Annotation <- as.factor(anno.dat$Annotation) + + anno.dat$Nearest.PromoterID <- as.character(anno.dat$Nearest.PromoterID) + anno.dat$Nearest.PromoterID[unassigned] <- "Unassigned" + anno.dat$Nearest.PromoterID <- as.factor(anno.dat$Nearest.PromoterID) + anno.dat$name <- rep(sampleid,nrow(anno.dat)) anno.dat$Distance.to.TSS <- abs(anno.dat$Distance.to.TSS) + 1 plot.dat <- rbind(plot.dat,anno.dat) diff --git a/bin/plot_macs_qc.r b/bin/plot_macs_qc.r index 2360505f..b8e25d56 100755 --- a/bin/plot_macs_qc.r +++ b/bin/plot_macs_qc.r @@ -17,10 +17,10 @@ library(scales) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--peak_files"), type="character", default=NULL, help="Comma-separated list of peak files.", metavar="peak_files"), - make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with peak files. Must be unique and in same order as peaks files input.", metavar="sampleids"), +option_list <- list(make_option(c("-i", "--peak_files"), type="character", default=NULL, help="Comma-separated list of peak files.", metavar="path"), + make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with peak files. 
Must be unique and in same order as peaks files input.", metavar="string"), make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"), - make_option(c("-p", "--outprefix"), type="character", default='macs2_peakqc', help="Output prefix", metavar="character")) + make_option(c("-p", "--outprefix"), type="character", default='macs2_peakqc', help="Output prefix", metavar="string")) opt_parser <- OptionParser(option_list=option_list) opt <- parse_args(opt_parser) diff --git a/bin/plot_peak_intersect.r b/bin/plot_peak_intersect.r index 2404b8b9..513e44b3 100755 --- a/bin/plot_peak_intersect.r +++ b/bin/plot_peak_intersect.r @@ -15,8 +15,8 @@ library(UpSetR) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--input_file"), type="character", default=NULL, help="Path to tab-delimited file containing two columns i.e sample1&sample2&sample3 indicating intersect between samples set size.", metavar="input_file"), - make_option(c("-o", "--output_file"), type="character", default=NULL, help="Path to output file with '.pdf' extension.", metavar="output_file")) +option_list <- list(make_option(c("-i", "--input_file"), type="character", default=NULL, help="Path to tab-delimited file containing two columns i.e sample1&sample2&sample3 indicating intersect between samples set size.", metavar="path"), + make_option(c("-o", "--output_file"), type="character", default=NULL, help="Path to output file with '.pdf' extension.", metavar="path")) opt_parser <- OptionParser(option_list=option_list) opt <- parse_args(opt_parser) @@ -44,11 +44,13 @@ if (file.exists(OutDir) == FALSE) { comb.dat <- read.table(opt$input_file,sep="\t",header=FALSE) comb.vec <- comb.dat[,2] comb.vec <- setNames(comb.vec,comb.dat[,1]) - sets <- sort(unique(unlist(strsplit(names(comb.vec),split='&'))), decreasing = TRUE) + nintersects = length(names(comb.vec)) if (nintersects > 70) { nintersects <- 70 + comb.vec <- sort(comb.vec, decreasing = TRUE)[1:70] + sets <- sort(unique(unlist(strsplit(names(comb.vec),split='&'))), decreasing = TRUE) } pdf(opt$output_file,onefile=F,height=10,width=20) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 0b76c7a6..585d1805 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -13,7 +13,7 @@ 'BEDTools': ['v_bedtools.txt', r"bedtools v(\S+)"], 'BamTools': ['v_bamtools.txt', r"bamtools (\S+)"], 'deepTools': ['v_deeptools.txt', r"plotFingerprint (\S+)"], - 'Picard': ['v_picard.txt', r"([\d\.]+)-SNAPSHOT"], + 'Picard': ['v_picard.txt', r"\n(\S+)"], 'R': ['v_R.txt', r"R version (\S+)"], 'Pysam': ['v_pysam.txt', r"(\S+)"], 'MACS2': ['v_macs2.txt', r"macs2 (\S+)"], @@ -54,9 +54,9 @@ results[k] = "v{}".format(match.group(1)) except IOError: results[k] = False - + # Remove software set to false in results -for k in results: +for k in list(results): if not results[k]: del(results[k]) diff --git a/conf/awsbatch.config b/conf/awsbatch.config deleted file mode 100755 index 14af5866..00000000 --- a/conf/awsbatch.config +++ /dev/null @@ -1,18 +0,0 @@ -/* - * ------------------------------------------------- - * Nextflow config file for running on AWS batch - * ------------------------------------------------- - * Base config needed for running with -profile awsbatch - */ -params { - config_profile_name = 'AWSBATCH' - config_profile_description = 'AWSBATCH Cloud Profile' - config_profile_contact = 'Alexander Peltzer 
(@apeltzer)' - config_profile_url = 'https://aws.amazon.com/de/batch/' -} - -aws.region = params.awsregion -process.executor = 'awsbatch' -process.queue = params.awsqueue -executor.awscli = '/home/ec2-user/miniconda/bin/aws' -params.tracedir = './' diff --git a/conf/base.config b/conf/base.config index f1a6373c..0fab975e 100755 --- a/conf/base.config +++ b/conf/base.config @@ -12,7 +12,7 @@ process { cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } @@ -22,22 +22,25 @@ process { // Process-specific resource requirements withLabel:process_low { cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 14.GB * task.attempt, 'memory' ) } + memory = { check_max( 12.GB * task.attempt, 'memory' ) } time = { check_max( 6.h * task.attempt, 'time' ) } } withLabel:process_medium { cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 42.GB * task.attempt, 'memory' ) } + memory = { check_max( 36.GB * task.attempt, 'memory' ) } time = { check_max( 8.h * task.attempt, 'time' ) } } withLabel:process_high { cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 84.GB * task.attempt, 'memory' ) } + memory = { check_max( 72.GB * task.attempt, 'memory' ) } time = { check_max( 16.h * task.attempt, 'time' ) } } withLabel:process_long { time = { check_max( 20.h * task.attempt, 'time' ) } } + withLabel:error_ignore { + errorStrategy = 'ignore' + } withName:get_software_versions { cache = false } diff --git a/conf/igenomes.config b/conf/igenomes.config index 37217cf4..2de92422 100755 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -18,6 +18,7 @@ params { bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" mito_name = "MT" macs_gsize = "2.7e9" blacklist = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed" @@ -42,6 +43,7 @@ params { bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" mito_name = "MT" macs_gsize = "1.87e9" blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" @@ -54,6 +56,7 @@ params { bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" mito_name = "Mt" } 'EB2' { @@ -64,6 +67,7 @@ params { bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = 
"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" } 'UMD3.1' { fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" @@ -73,6 +77,7 @@ params { bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" mito_name = "MT" } 'WBcel235' { @@ -94,6 +99,7 @@ params { bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" mito_name = "MT" } 'GRCz10' { @@ -125,6 +131,7 @@ params { bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" mito_name = "MT" } 'EB1' { @@ -135,6 +142,7 @@ params { bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" } 'Galgal4' { fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" @@ -154,6 +162,7 @@ params { bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" } 'Mmul_1' { fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" @@ -163,6 +172,7 @@ params { bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" mito_name = "MT" } 'IRGSP-1.0' { @@ -183,6 +193,7 @@ params { bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" mito_name = "MT" } 'Rnor_6.0' { @@ -214,6 +225,7 @@ params { bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" bed12 
= "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" mito_name = "MT" macs_gsize = "1.21e7" } @@ -225,6 +237,7 @@ params { bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" } 'Sscrofa10.2' { fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" @@ -234,6 +247,7 @@ params { bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" mito_name = "MT" } 'AGPv3' { @@ -266,6 +280,7 @@ params { bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" mito_name = "chrM" macs_gsize = "2.7e9" blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" @@ -278,6 +293,7 @@ params { bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" mito_name = "chrM" macs_gsize = "1.87e9" blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" @@ -300,6 +316,7 @@ params { bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" mito_name = "chrM" macs_gsize = "9e7" } @@ -311,6 +328,7 @@ params { bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" mito_name = "chrM" } 'danRer10' { @@ -342,6 +360,7 @@ params { bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" mito_name = "chrM" } 'galGal4' { @@ -352,6 +371,7 @@ params { bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" bed12 
= "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" mito_name = "chrM" } 'panTro4' { @@ -362,6 +382,7 @@ params { bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" mito_name = "chrM" } 'rn6' { @@ -380,6 +401,7 @@ params { bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" mito_name = "chrM" macs_gsize = "1.2e7" } @@ -391,6 +413,7 @@ params { bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" mito_name = "chrM" } } diff --git a/conf/test.config b/conf/test.config index 0435d184..ea900aca 100755 --- a/conf/test.config +++ b/conf/test.config @@ -4,13 +4,14 @@ * ------------------------------------------------- * Defines bundled input files and everything required * to run a fast and simple test. Use as follows: - * nextflow run nf-core/atacseq -profile test + * nextflow run nf-core/atacseq -profile test, */ params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' + // Limit resources so that this can run on GitHub Actions max_cpus = 2 max_memory = 6.GB max_time = 12.h diff --git a/conf/test_full.config b/conf/test_full.config new file mode 100755 index 00000000..480e040f --- /dev/null +++ b/conf/test_full.config @@ -0,0 +1,19 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a full pipeline test. 
Use as follows: + * nextflow run nf-core/atacseq -profile test_full, + */ + +params { + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/design_full.csv' + + // Genome references + genome = 'hg19' +} diff --git a/docs/images/mqc_annotatePeaks_feature_percentage_plot.png b/docs/images/mqc_annotatePeaks_feature_percentage_plot.png index 6aa81d4b..a14ba0c5 100755 Binary files a/docs/images/mqc_annotatePeaks_feature_percentage_plot.png and b/docs/images/mqc_annotatePeaks_feature_percentage_plot.png differ diff --git a/docs/images/mqc_cutadapt_plot.png b/docs/images/mqc_cutadapt_plot.png index 0b22e878..087eac3a 100755 Binary files a/docs/images/mqc_cutadapt_plot.png and b/docs/images/mqc_cutadapt_plot.png differ diff --git a/docs/images/mqc_deeptools_plotFingerprint_plot.png b/docs/images/mqc_deeptools_plotFingerprint_plot.png index e8ac54d0..6bc3a31b 100755 Binary files a/docs/images/mqc_deeptools_plotFingerprint_plot.png and b/docs/images/mqc_deeptools_plotFingerprint_plot.png differ diff --git a/docs/images/mqc_deeptools_plotProfile_plot.png b/docs/images/mqc_deeptools_plotProfile_plot.png index efeae182..812dffc5 100755 Binary files a/docs/images/mqc_deeptools_plotProfile_plot.png and b/docs/images/mqc_deeptools_plotProfile_plot.png differ diff --git a/docs/images/mqc_deseq2_pca_plot.png b/docs/images/mqc_deseq2_pca_plot.png index bb4720e6..bc573878 100755 Binary files a/docs/images/mqc_deseq2_pca_plot.png and b/docs/images/mqc_deseq2_pca_plot.png differ diff --git a/docs/images/mqc_deseq2_sample_similarity_plot.png b/docs/images/mqc_deseq2_sample_similarity_plot.png index f8de20eb..53861f70 100755 Binary files a/docs/images/mqc_deseq2_sample_similarity_plot.png and b/docs/images/mqc_deseq2_sample_similarity_plot.png differ diff --git a/docs/images/mqc_featureCounts_assignment_plot.png b/docs/images/mqc_featureCounts_assignment_plot.png index 84ae60ac..edfb57e7 100755 Binary files a/docs/images/mqc_featureCounts_assignment_plot.png and b/docs/images/mqc_featureCounts_assignment_plot.png differ diff --git a/docs/images/mqc_frip_score_plot.png b/docs/images/mqc_frip_score_plot.png index c1f510e2..52ea8ad8 100755 Binary files a/docs/images/mqc_frip_score_plot.png and b/docs/images/mqc_frip_score_plot.png differ diff --git a/docs/images/mqc_macs2_peak_count_plot.png b/docs/images/mqc_macs2_peak_count_plot.png index 48f7b483..40c46999 100755 Binary files a/docs/images/mqc_macs2_peak_count_plot.png and b/docs/images/mqc_macs2_peak_count_plot.png differ diff --git a/docs/images/mqc_picard_deduplication_plot.png b/docs/images/mqc_picard_deduplication_plot.png index f98de429..da6bdf06 100755 Binary files a/docs/images/mqc_picard_deduplication_plot.png and b/docs/images/mqc_picard_deduplication_plot.png differ diff --git a/docs/images/mqc_picard_insert_size_plot.png b/docs/images/mqc_picard_insert_size_plot.png index 1ae24bf2..41ebd218 100755 Binary files a/docs/images/mqc_picard_insert_size_plot.png and b/docs/images/mqc_picard_insert_size_plot.png differ diff --git a/docs/images/mqc_preseq_plot.png b/docs/images/mqc_preseq_plot.png index f40d18b0..c4c98f17 100755 Binary files a/docs/images/mqc_preseq_plot.png and b/docs/images/mqc_preseq_plot.png differ diff --git a/docs/images/mqc_samtools_idxstats_plot.png b/docs/images/mqc_samtools_idxstats_plot.png index 4b7a1e08..687f426d 100755 Binary files 
a/docs/images/mqc_samtools_idxstats_plot.png and b/docs/images/mqc_samtools_idxstats_plot.png differ diff --git a/docs/images/mqc_samtools_stats_plot.png b/docs/images/mqc_samtools_stats_plot.png index f30aa5e7..a7cac282 100755 Binary files a/docs/images/mqc_samtools_stats_plot.png and b/docs/images/mqc_samtools_stats_plot.png differ diff --git a/docs/images/nf-core-atacseq_logo.png b/docs/images/nf-core-atacseq_logo.png old mode 100755 new mode 100644 diff --git a/docs/images/r_deseq2_ma_plot.png b/docs/images/r_deseq2_ma_plot.png index a284eb2b..67c598fc 100755 Binary files a/docs/images/r_deseq2_ma_plot.png and b/docs/images/r_deseq2_ma_plot.png differ diff --git a/docs/images/r_deseq2_volcano_plot.png b/docs/images/r_deseq2_volcano_plot.png index d64d0666..679746b4 100755 Binary files a/docs/images/r_deseq2_volcano_plot.png and b/docs/images/r_deseq2_volcano_plot.png differ diff --git a/docs/output.md b/docs/output.md index fe6e82a9..7e89a9df 100755 --- a/docs/output.md +++ b/docs/output.md @@ -1,4 +1,4 @@ -# nf-core/atacseq: Output +# ![nf-core/atacseq](images/nf-core-atacseq_logo.png) This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. @@ -14,326 +14,323 @@ The directories listed below will be created in the output directory after the p The initial QC and alignments are performed at the library-level e.g. if the same library has been sequenced more than once to increase sequencing depth. This has the advantage of being able to assess each library individually, and the ability to process multiple libraries from the same sample in parallel. -1. **Raw read QC** +### Raw read QC - *Documentation*: - [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) +
+ Output files - *Description*: - FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You get information about adapter contamination and other overrepresented sequences. +* `fastqc/` + * `*_fastqc.html`: FastQC report containing quality metrics for read 1 (*and read2 if paired-end*) **before** adapter trimming. +* `fastqc/zips/` + * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. - *Output directories*: - * `fastqc/` - FastQC `*.html` files for read 1 (*and read2 if paired-end*) **before** adapter trimming. - * `fastqc/zips/` - FastQC `*.zip` files for read 1 (*and read2 if paired-end*) **before** adapter trimming. +
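For intuition about the per-base metrics summarised in the FastQC description below, here is a purely illustrative Python sketch, not something the pipeline runs. It assumes Phred+33 quality encoding and a small uncompressed FASTQ at the hypothetical path `reads.fastq`:

```python
# Purely illustrative: crude per-read metrics in the spirit of FastQC.
# Assumes Phred+33 quality encoding and an uncompressed FASTQ at the
# hypothetical path 'reads.fastq'; FastQC itself handles far more edge cases.
def fastq_records(path):
    with open(path) as handle:
        while True:
            header = handle.readline().strip()
            if not header:
                return
            seq = handle.readline().strip()
            handle.readline()  # the '+' separator line
            qual = handle.readline().strip()
            yield seq, qual

for seq, qual in fastq_records('reads.fastq'):
    mean_qual = sum(ord(c) - 33 for c in qual) / float(len(qual))
    gc_frac = (seq.count('G') + seq.count('C')) / float(len(seq))
    print('length=%d mean_qual=%.1f gc=%.2f' % (len(seq), mean_qual, gc_frac))
```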
-2. **Adapter trimming** +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You get information about adapter contamination and other overrepresented sequences. - *Documentation*: - [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) +### Adapter trimming - *Description*: - Trim Galore! is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. For most ATAC-seq datasets this will be the Nextera adapter sequence 'CTGTCTCTTATA'. See [`usage.md`](usage.md) for more details about the trimming options. +
+ Output files - ![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png) +* `trim_galore/` + * `*fastq.gz`: If `--save_trimmed` is specified, FastQ files **after** adapter trimming will be placed in this directory. +* `trim_galore/logs/` + * `*.log`: Log file generated by Trim Galore!. +* `trim_galore/fastqc/` + * `*_fastqc.html`: FastQC report containing quality metrics for read 1 (*and read2 if paired-end*) **after** adapter trimming. +* `trim_galore/fastqc/zips/` + * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. - *Output directories*: - * `trim_galore/` - If `--save_trimmed` is specified FastQ files **after** adapter trimming will be placed in this directory. - * `trim_galore/logs/` - `*.log` files generated by Trim Galore!. - * `trim_galore/fastqc/` - FastQC `*.html` files for read 1 (*and read2 if paired-end*) **after** adapter trimming. - * `trim_galore/fastqc/zips/` - FastQC `*.zip` files for read 1 (*and read2 if paired-end*) **after** adapter trimming. +
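To make the trimming step concrete, a hypothetical sketch of the 3' adapter clipping that the Trim Galore! description below refers to, using the Nextera adapter sequence it mentions. Real trimmers also match partial adapters at the read end and perform quality trimming, which this deliberately ignores:

```python
# Naive 3' adapter clipping at the first exact occurrence of the Nextera
# adapter. Trim Galore!/Cutadapt additionally handle partial matches and
# quality trimming; this sketch is for intuition only.
NEXTERA = 'CTGTCTCTTATA'

def clip_adapter(read, adapter=NEXTERA):
    idx = read.find(adapter)
    return read[:idx] if idx != -1 else read

print(clip_adapter('ACGTACGTACGT' + NEXTERA + 'GGCC'))  # ACGTACGTACGT
```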
-3. **Alignment** +[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. For most ATAC-seq datasets this will be the Nextera adapter sequence 'CTGTCTCTTATA'. See [`usage.md`](usage.md) for more details about the trimming options. - *Documentation*: - [BWA](http://bio-bwa.sourceforge.net/bwa.shtml), [SAMtools](http://samtools.sourceforge.net/) +![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png) - *Description*: - Adapter-trimmed reads are mapped to the reference assembly using BWA. A genome index is required to run BWA so if this is not provided explicitly using the `--bwa_index` parameter then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times. +### Alignment - ![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png) +
+ Output files - ![MultiQC - SAMtools idxstats plot](images/mqc_samtools_idxstats_plot.png) +* `bwa/library/` + * `*.bam`: The files resulting from the alignment of individual libraries are not saved by default so this directory will not be present in your results. You can override this behaviour with the use of the `--save_align_intermeds` flag in which case it will contain the coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format. +* `bwa/library/samtools_stats/` + * SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. - File names in the resulting directory (i.e. `bwa/library/`) will have the '`.Lb.`' (**L**i**b**rary) suffix. +> **NB:** File names in the resulting directory (i.e. `bwa/library/`) will have the '`.Lb.`' suffix. - *Output directories*: - * `bwa/library/` - The files resulting from the alignment of individual libraries are not saved by default so this directory will not be present in your results. You can override this behaviour with the use of the `--saveAlignedIntermediates` flag in which case it will contain the coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format. - * `bwa/library/samtools_stats/` - SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. +
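If you want to inspect these library-level BAM files yourself, here is a minimal sketch using `pysam` (already a pipeline dependency). The BAM path is a placeholder, and the counts mirror the headline numbers found in the `*.flagstat` output above:

```python
# Illustrative only: tally a few headline alignment metrics, mirroring what
# the *.flagstat files report. Assumes pysam is installed; the BAM path below
# is a hypothetical placeholder.
import pysam

total = mapped = duplicate = proper_pair = 0
with pysam.AlignmentFile('sample.Lb.sorted.bam', 'rb') as bam:
    for read in bam.fetch(until_eof=True):
        total += 1
        if not read.is_unmapped:
            mapped += 1
        if read.is_duplicate:
            duplicate += 1
        if read.is_proper_pair:
            proper_pair += 1

print('total=%d mapped=%d duplicates=%d properly_paired=%d'
      % (total, mapped, duplicate, proper_pair))
```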
+ +Adapter-trimmed reads are mapped to the reference assembly using [BWA](http://bio-bwa.sourceforge.net/bwa.shtml). A genome index is required to run BWA so if this is not provided explicitly using the `--bwa_index` parameter then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times. + +![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png) + +![MultiQC - SAMtools idxstats plot](images/mqc_samtools_idxstats_plot.png) ## Merged library-level analysis The library-level alignments associated with the same sample are merged and subsequently used for the downstream analyses. -1. **Alignment merging, duplicate marking, filtering and QC** +### Alignment merging, duplicate marking, filtering and QC + +
+ Output files + +* `bwa/mergedLibrary/` + * `*.bam`: Merged library-level, coordinate sorted `*.bam` files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be `*.mLb.clN.*`. If you specify the `--save_align_intermeds` parameter then two additional sets of files will be present. These represent the unfiltered alignments with duplicates marked (`*.mLb.mkD.*`), and in the case of paired-end datasets the filtered alignments before the removal of orphan read pairs (`*.mLb.flT.*`). +* `bwa/mergedLibrary/samtools_stats/` + * SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. +* `bwa/mergedLibrary/picard_metrics/` + * `*_metrics`: Alignment QC files from picard CollectMultipleMetrics. + * `*.metrics.txt`: Metrics file from MarkDuplicates. +* `bwa/mergedLibrary/picard_metrics/pdf/` + * `*.pdf`: Alignment QC plot files from picard CollectMultipleMetrics. +* `bwa/mergedLibrary/preseq/` + * `*.ccurve.txt`: Preseq expected future yield file. + +> **NB:** File names in the resulting directory (i.e. `bwa/mergedLibrary/`) will have the '`.mLb.`' suffix. - *Documentation*: - [picard](https://broadinstitute.github.io/picard/command-line-overview.html), [SAMtools](http://samtools.sourceforge.net/), [BEDTools](https://bedtools.readthedocs.io/en/latest/), [BAMTools](https://github.com/pezmaster31/bamtools/wiki/Tutorial_Toolkit_BamTools-1.0.pdf), [Pysam](https://pysam.readthedocs.io/en/latest/api.html), [Preseq](http://smithlabresearch.org/software/preseq/) +
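The suffix tokens above also appear in the `extra_fn_clean_exts` list added to `assets/multiqc_config.yaml` earlier in this diff. A rough sketch of stripping them to recover the sample name; this is an approximation for illustration, not MultiQC's actual cleaning logic, and the file name is hypothetical:

```python
# Rough illustration: strip the dot-separated suffix tokens listed under
# extra_fn_clean_exts in assets/multiqc_config.yaml to recover a sample name
# from a pipeline file name. Not MultiQC's actual algorithm.
SUFFIX_TOKENS = {'fastq', 'gz', 'trimmed', 'val', 'sorted', 'bam',
                 'Lb', 'mkD', 'clN', 'mLb', 'mRp', 'peaks'}

def sample_name(filename):
    parts = filename.split('.')
    while len(parts) > 1 and parts[-1] in SUFFIX_TOKENS:
        parts.pop()
    return '.'.join(parts)

print(sample_name('SAMPLE1_R1.mLb.clN.sorted.bam'))  # SAMPLE1_R1
```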
- *Description*: - Picard MergeSamFiles and MarkDuplicates are used in combination to merge the alignments, and for the marking of duplicates, respectively. If you only have one library for any given replicate then the merging step isnt carried out because the library-level and merged library-level BAM files will be exactly the same. +[Picard MergeSamFiles and MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html) are used in combination to merge the alignments, and for the marking of duplicates, respectively. If you only have one library for any given replicate then the merging step isnt carried out because the library-level and merged library-level BAM files will be exactly the same. - Read duplicate marking is carried out using the Picard MarkDuplicates command. Duplicate reads are generally removed from the aligned reads to mitigate for fragments in the library that may have been sequenced more than once due to PCR biases. There is an option to keep duplicate reads with the `--keep_dups` parameter but its generally recommended to remove them to avoid the wrong interpretation of the results. A similar option has been provided to keep reads that are multi-mapped - `--keep_multi_map`. Other steps have been incorporated into the pipeline to filter the resulting alignments - see [`main README.md`](../README.md) for a more comprehensive listing, and the tools used at each step. +![MultiQC - Picard deduplication stats plot](images/mqc_picard_deduplication_plot.png) - Certain cell types and tissues yield an enormous fraction (typically 20–80%) of unusable sequences of mitochondrial origin. This is a known problem that is specific to ATAC-seq library preps - see [Montefiori et al. 2017](https://www.nature.com/articles/s41598-017-02547-w). There is an option to keep these reads using the `--keep_mito` parameter but its generally recommended to remove these in order to get a more reliable assessment of the duplication rate from the rest of the genome, and to avoid any biases in the downstream analyses. +Read duplicate marking is carried out using the Picard MarkDuplicates command. Duplicate reads are generally removed from the aligned reads to mitigate for fragments in the library that may have been sequenced more than once due to PCR biases. There is an option to keep duplicate reads with the `--keep_dups` parameter but its generally recommended to remove them to avoid the wrong interpretation of the results. A similar option has been provided to keep reads that are multi-mapped - `--keep_multi_map`. Other steps have been incorporated into the pipeline to filter the resulting alignments - see [`main README.md`](../README.md) for a more comprehensive listing, and the tools used at each step. A selection of alignment-based QC metrics generated by Picard CollectMultipleMetrics and MarkDuplicates will be included in the MultiQC report. - A selection of alignment-based QC metrics generated by Picard CollectMultipleMetrics and MarkDuplicates will be included in the MultiQC report. +![MultiQC - Picard insert size plot](images/mqc_picard_insert_size_plot.png) - ![MultiQC - Picard deduplication stats plot](images/mqc_picard_deduplication_plot.png) +Certain cell types and tissues yield an enormous fraction (typically 20–80%) of unusable sequences of mitochondrial origin. This is a known problem that is specific to ATAC-seq library preps - see [Montefiori et al. 2017](https://www.nature.com/articles/s41598-017-02547-w). 
There is an option to keep these reads using the `--keep_mito` parameter but it's generally recommended to remove these in order to get a more reliable assessment of the duplication rate from the rest of the genome, and to avoid any biases in the downstream analyses. - ![MultiQC - Picard insert size plot](images/mqc_picard_insert_size_plot.png) +The [Preseq](http://smithlabresearch.org/software/preseq/) package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or to screen multiple libraries to avoid low complexity samples. The dashed line shows a perfectly complex library where total reads = unique reads. Note that these are predictive numbers only, not absolute. The MultiQC plot can sometimes give extreme sequencing depth on the X axis - click and drag from the left side of the plot to zoom in on more realistic numbers. - The Preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or to screen multiple libraries to avoid low complexity samples. The dashed line shows a perfectly complex library where total reads = unique reads. +![MultiQC - Preseq library complexity plot](images/mqc_preseq_plot.png) - Note that these are predictive numbers only, not absolute. The MultiQC plot can sometimes give extreme sequencing depth on the X axis - click and drag from the left side of the plot to zoom in on more realistic numbers. +### Normalised bigWig files - ![MultiQC - Preseq library complexity plot](images/mqc_preseq_plot.png) +
+ Output files - File names in the resulting directory (i.e. `bwa/mergedLibrary/`) will have the '`.mLb.`' suffix to denote **m**erged **L**i**b**raries. +* `bwa/mergedLibrary/bigwig/` + * `*.bigWig`: Normalised bigWig files scaled to 1 million mapped reads. - *Output directories*: - * `bwa/mergedLibrary/` - Merged library-level, coordinate sorted `*.bam` files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be `*.mLb.clN.*`. If you specify the `--saveAlignedIntermediates` parameter then two additional sets of files will be present. These represent the unfiltered alignments with duplicates marked (`*.mLb.mkD.*`), and in the case of paired-end datasets the filtered alignments before the removal of orphan read pairs (`*.mLb.flT.*`). - * `bwa/mergedLibrary/samtools_stats/` - SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. - * `bwa/mergedLibrary/picard_metrics/` - Alignment QC files from picard CollectMultipleMetrics and the metrics file from MarkDuplicates: `*_metrics` and `*.metrics.txt`, respectively. - * `bwa/mergedLibrary/picard_metrics/pdf/` - Alignment QC plot files in `*.pdf` format from picard CollectMultipleMetrics. - * `bwa/mergedLibrary/preseq/` - Preseq expected future yield file (`*.ccurve.txt`). +
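The "scaled to 1 million mapped reads" normalisation listed above reduces to a single scale factor applied to the raw coverage; a toy example with invented numbers:

```python
# Toy numbers only: normalised coverage is raw coverage multiplied by
# 1e6 / (mapped read count), i.e. coverage "per 1 million mapped reads".
mapped_reads = 25000000       # hypothetical library depth
scale_factor = 1000000.0 / mapped_reads
raw_depth = 42                # hypothetical raw coverage at one position
print('scale=%.4f normalised=%.2f' % (scale_factor, raw_depth * scale_factor))
# scale=0.0400 normalised=1.68
```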
-2. **Normalised bigWig files** +The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM files for data visualisation purposes which will be slower and result in memory issues. The coverage values represented in the bigWig file can also be normalised in order to be able to compare the coverage across multiple samples - this is not possible with BAM files. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting. - *Documentation*: - [BEDTools](https://bedtools.readthedocs.io/en/latest/), [bedGraphToBigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html#Ex3) +### Coverage QC - *Description*: - The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM files for data visualisation purposes which will be slower and result in memory issues. The coverage values represented in the bigWig file can also be normalised in order to be able to compare the coverage across multiple samples - this is not possible with BAM files. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting. +
+ Output files - *Output directories*: - * `bwa/mergedLibrary/bigwig/` - Normalised `*.bigWig` files scaled to 1 million mapped reads. +* `bwa/mergedLibrary/deepTools/plotFingerprint/` + * `*.plotFingerprint.pdf`, `*.plotFingerprint.qcmetrics.txt`, `*.plotFingerprint.raw.txt`: plotFingerprint output files. +* `bwa/mergedLibrary/deepTools/plotProfile/` + * `*.computeMatrix.mat.gz`, `*.computeMatrix.vals.mat.tab`, `*.plotProfile.pdf`, `*.plotProfile.tab`, `*.plotHeatmap.pdf`, `*.plotHeatmap.mat.tab`: plotProfile output files. -3. **Coverage QC** +
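For intuition about the plotFingerprint QC described below, a toy sketch of how a fingerprint curve is built. The bin counts are invented for illustration:

```python
# Toy illustration of a fingerprint curve: rank genomic bins by read count,
# then report the cumulative fraction of reads against the fraction of bins.
# A curve hugging zero before rising steeply indicates strong enrichment.
bin_counts = sorted([0, 0, 1, 0, 2, 1, 0, 30, 55, 11])  # invented bin counts
total = float(sum(bin_counts))
running = 0
for i, count in enumerate(bin_counts, start=1):
    running += count
    print('bins<=%3.0f%%  reads=%6.2f%%' % (100.0 * i / len(bin_counts),
                                            100.0 * running / total))
```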
- *Documentation*: - [deepTools](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) +[deepTools](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) plotFingerprint is a useful QC for ATAC-seq data in order to see the relative enrichment of the samples in the experiment on a genome-wide basis (see [plotFingerprint docs](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html)). - *Description*: - deepTools plotFingerprint is a useful QC for ATAC-seq data in order to see the relative enrichment of the samples in the experiment on a genome-wide basis (see [plotFingerprint docs](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html)). +![MultiQC - deepTools plotFingerprint plot](images/mqc_deeptools_plotFingerprint_plot.png) - ![MultiQC - deepTools plotFingerprint plot](images/mqc_deeptools_plotFingerprint_plot.png) +The results from deepTools plotProfile gives you a quick visualisation for the genome-wide enrichment of your samples at the TSS, and across the gene body. During the downstream analysis, you may want to refine the features/genes used to generate these plots in order to see a more specific condition-related effect. - The results from deepTools plotProfile gives you a quick visualisation for the genome-wide enrichment of your samples at the TSS, and across the gene body. During the downstream analysis, you may want to refine the features/genes used to generate these plots in order to see a more specific condition-related effect. +![MultiQC - deepTools plotProfile plot](images/mqc_deeptools_plotProfile_plot.png) - ![MultiQC - deepTools plotProfile plot](images/mqc_deeptools_plotProfile_plot.png) +### Call peaks - *Output directories*: - * `bwa/mergedLibrary/deepTools/plotFingerprint/` - * Output files: `*.plotFingerprint.pdf`, `*.plotFingerprint.qcmetrics.txt`, `*.plotFingerprint.raw.txt` - * `bwa/mergedLibrary/deepTools/plotProfile/` - * Output files: `*.computeMatrix.mat.gz`, `*.computeMatrix.vals.mat.gz`, `*.plotProfile.pdf`, `*.plotProfile.tab`. +
+ Output files +* `bwa/mergedLibrary/macs/<PEAK_TYPE>/` + * `*.xls`, `*.broadPeak` or `*.narrowPeak`, `*.gappedPeak`, `*summits.bed`: MACS2 output files - the files generated will depend on whether MACS2 has been run in *narrowPeak* or *broadPeak* mode. + * `*.annotatePeaks.txt`: HOMER peak-to-gene annotation file. +* `bwa/mergedLibrary/macs/<PEAK_TYPE>/qc/` + * `macs_peak.plots.pdf`: QC plots for MACS2 peaks. + * `macs_annotatePeaks.plots.pdf`: QC plots for peak-to-gene feature annotation. + * `*.FRiP_mqc.tsv`, `*.count_mqc.tsv`, `macs_annotatePeaks.summary_mqc.tsv`: MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios. +> **NB:** `<PEAK_TYPE>` in the directory structure above corresponds to the type of peak that you have specified to call with MACS2 i.e. `broadPeak` or `narrowPeak`. If you so wish, you can call both narrow and broad peaks without redoing the preceding steps in the pipeline such as the alignment and filtering. For example, if you already have broad peaks then just add `--narrow_peak -resume` to the command you used to run the pipeline, and these will be called too! However, resuming the pipeline will only be possible if you have not deleted the `work/` directory generated by the pipeline. +
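The FRiP score reported in the `*.FRiP_mqc.tsv` files above is simply the fraction of mapped reads that fall within called peaks; a back-of-envelope example with invented counts:

```python
# Back-of-envelope FRiP (Fraction of Reads in Peaks), as reported in the
# *.FRiP_mqc.tsv files: reads overlapping peaks / all mapped reads.
# Both counts below are invented for illustration.
reads_in_peaks = 4200000
mapped_reads = 20000000
print('FRiP = %.3f' % (reads_in_peaks / float(mapped_reads)))  # FRiP = 0.210
```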
- ![MultiQC - MACS2 total peak count plot](images/mqc_macs2_peak_count_plot.png) +[MACS2](https://github.com/taoliu/MACS) is one of the most popular peak-calling algorithms for ChIP-seq data. For ATAC-seq data we are also looking for genome-wide regions of enrichment but in this case without comparison to a standard control sample (e.g. input DNA). By default, the peaks are called with the MACS2 `--broad` parameter. If, however, you would like to call narrow peaks then please provide the `--narrow_peak` parameter when running the pipeline. See [MACS2 outputs](https://github.com/taoliu/MACS#output-files) for a description of the output files generated by MACS2. - [HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. *Annotation*, *Distance to TSS* and *Nearest Promoter ID*. +![MultiQC - MACS2 total peak count plot](images/mqc_macs2_peak_count_plot.png) - ![MultiQC - HOMER annotatePeaks peak-to-gene feature ratio plot](images/mqc_annotatePeaks_feature_percentage_plot.png) +[HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. *Annotation*, *Distance to TSS* and *Nearest Promoter ID*. - Various QC plots per sample including number of peaks, fold-change distribution, [FRiP score](https://genome.cshlp.org/content/22/9/1813.full.pdf+html) and peak-to-gene feature annotation are also generated by the pipeline. Where possible these have been integrated into the MultiQC report. +![MultiQC - HOMER annotatePeaks peak-to-gene feature ratio plot](images/mqc_annotatePeaks_feature_percentage_plot.png) - ![MultiQC - MACS2 peaks FRiP score plot](images/mqc_frip_score_plot.png) +Various QC plots per sample including number of peaks, fold-change distribution, [FRiP score](https://genome.cshlp.org/content/22/9/1813.full.pdf+html) and peak-to-gene feature annotation are also generated by the pipeline. Where possible these have been integrated into the MultiQC report. - `` in the directory structure below corresponds to the type of peak that you have specified to call with MACS2 i.e. `broadPeak` or `narrowPeak`. If you so wish, you can call both narrow and broad peaks without redoing the preceding steps in the pipeline such as the alignment and filtering. For example, if you already have broad peaks then just add `--narrow_peak -resume` to the command you used to run the pipeline, and these will be called too! However, resuming the pipeline will only be possible if you have not deleted the `work/` directory generated by the pipeline. +![MultiQC - MACS2 peaks FRiP score plot](images/mqc_frip_score_plot.png) - *Output directories*: - * `bwa/mergedLibrary/macs//` - * MACS2 output files: `*.xls`, `*.broadPeak` or `*.narrowPeak`, `*.gappedPeak` and `*summits.bed`. 
- The files generated will depend on whether MACS2 has been run in *narrowPeak* or *broadPeak* mode. - * HOMER peak-to-gene annotation file: `*.annotatePeaks.txt`. - * `bwa/mergedLibrary/macs//qc/` - * QC plots for MACS2 peaks: `macs_peak.plots.pdf` - * QC plots for peak-to-gene feature annotation: `macs_annotatePeaks.plots.pdf` - * MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios: `*.FRiP_mqc.tsv`, `*.count_mqc.tsv` and `macs_annotatePeaks.summary_mqc.tsv` respectively. +### Create and quantify consensus set of peaks -5. **Create consensus set of peaks** +
+ Output files
-  *Documentation*:
-      [BEDTools](https://bedtools.readthedocs.io/en/latest/)
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/`
+  * `*.bed`: Consensus peak-set across all samples in BED format.
+  * `*.saf`: Consensus peak-set across all samples in SAF format. Required by featureCounts for read quantification.
+  * `*.featureCounts.txt`: Read counts across all samples relative to consensus peak-set.
+  * `*.annotatePeaks.txt`: HOMER peak-to-gene annotation file for consensus peaks.
+  * `*.boolean.annotatePeaks.txt`: Spreadsheet representation of consensus peak-set across samples **with** gene annotation columns. The columns from individual peak files are included in this file along with the ability to filter peaks based on their presence or absence in multiple replicates/conditions.
+  * `*.boolean.txt`: Spreadsheet representation of consensus peak-set across samples **without** gene annotation columns. Same as file above but without annotation columns.
+  * `*.boolean.intersect.plot.pdf`, `*.boolean.intersect.txt`: [UpSetR](https://cran.r-project.org/web/packages/UpSetR/README.html) files to illustrate peak intersection.
-  *Description*:
-      In order to perform the differential binding analysis we need to be able to carry out the read quantification for the same intervals across **all** of the samples in the experiment. To this end, the individual peak-sets called per sample have to be merged together in order to create a consensus set of peaks.
+
-      Using the consensus peaks it is possible to assess the degree of overlap between the peaks from a set of samples e.g. *Which consensus peaks contain peaks that are common/unique to a given set of samples?*. This may be useful for downstream filtering of peaks based on whether they are called in multiple replicates/conditions. Please note that it is possible for a consensus peak to contain multiple peaks from the same sample. Unfortunately, this is sample-dependent but the files generated by the pipeline do have columns that report such instances and allow you to factor them into any further analysis.
+To perform the differential accessibility analysis we need to quantify reads over the same intervals across **all** of the samples in the experiment. To this end, the individual peak-sets called per sample have to be merged together to create a consensus set of peaks.
-      By default, the peak-sets are not filtered, therefore, the consensus peaks will be generated across the union set of peaks from all samples. However, you can increment the `--min_reps_consensus` parameter appropriately if you are confident you have good reproducibility amongst your replicates to create a "reproducible" set of consensus of peaks. In future iterations of the pipeline more formal analyses such as [IDR](https://projecteuclid.org/euclid.aoas/1318514284) may be implemented to obtain reproducible and high confidence peak-sets with which to perform this sort of analysis.
+Using the consensus peaks it is possible to assess the degree of overlap between the peaks from a set of samples e.g. *Which consensus peaks contain peaks that are common/unique to a given set of samples?* This may be useful for downstream filtering of peaks based on whether they are called in multiple replicates/conditions. Please note that it is possible for a consensus peak to contain multiple peaks from the same sample. Unfortunately, this is sample-dependent but the files generated by the pipeline do have columns that report such instances and allow you to factor them into any further analysis.
-      ![R - UpSetR peak intersection plot](images/r_upsetr_intersect_plot.png)
+![R - UpSetR peak intersection plot](images/r_upsetr_intersect_plot.png)
-  *Output directories*:
-      * `bwa/mergedLibrary/macs//consensus/`
-        * Consensus peak-set across all samples in `*.bed` format.
-        * Consensus peak-set across all samples in `*.saf` format. Required by featureCounts for read quantification.
-        * HOMER `*.annotatePeaks.txt` peak-to-gene annotation file for consensus peaks.
-        * Spreadsheet representation of consensus peak-set across samples **with** gene annotation columns: `*.boolean.annotatePeaks.txt`.
-          The columns from individual peak files are included in this file along with the ability to filter peaks based on their presence or absence in multiple replicates/conditions.
-        * Spreadsheet representation of consensus peak-set across samples **without** gene annotation columns: `*.boolean.txt`.
-          Same as file above but without annotation columns.
-        * [UpSetR](https://cran.r-project.org/web/packages/UpSetR/README.html) files to illustrate peak intersection: `*.boolean.intersect.plot.pdf` and `*.boolean.intersect.txt`.
+By default, the peak-sets are not filtered; the consensus peaks will therefore be generated across the union set of peaks from all samples.
However, you can increment the `--min_reps_consensus` parameter appropriately if you are confident you have good reproducibility amongst your replicates to create a "reproducible" consensus set of peaks. In future iterations of the pipeline more formal analyses such as [IDR](https://projecteuclid.org/euclid.aoas/1318514284) may be implemented to obtain reproducible and high confidence peak-sets with which to perform this sort of analysis.
-6. **Read counting and differential binding analysis**
+The [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) tool is used to count the number of reads relative to the consensus peak-set across all of the samples. This essentially generates a file containing a matrix where the rows represent the consensus intervals, the columns represent all of the samples in the experiment, and the values represent the raw read counts. A conceptual sketch of this counting step is shown after the output listing below.
-  *Documentation*:
-      [featureCounts](http://bioinf.wehi.edu.au/featureCounts/), [DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html), [R](https://www.r-project.org/)
+![MultiQC - featureCounts consensus peak read assignment plot](images/mqc_featureCounts_assignment_plot.png)
-  *Description*:
-      The featureCounts tool is used to count the number of reads relative to the consensus peak-set across all of the samples. This essentially generates a file containing a matrix where the rows represent the consensus intervals, the columns represent all of the samples in the experiment, and the values represent the raw read counts.
+### Read counting and differential accessibility analysis
-      ![MultiQC - featureCounts consensus peak read assignment plot](images/mqc_featureCounts_assignment_plot.png)
+
+ Output files
-      DESeq2 is more commonly used to perform differential expression analysis for RNA-seq datasets. However, it can also be used for ATAC-seq differential accessibility analysis, in which case you can imagine that instead of counts per gene for RNA-seq data we now have counts per accessible region.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/deseq2/`
+  * `*.results.txt`: Spreadsheet containing differential accessibility results across all consensus peaks and all comparisons.
+  * `*.plots.pdf`: File containing PCA and hierarchical clustering plots.
+  * `*.log`: Log file with information for number of differentially accessible intervals at different FDR and fold-change thresholds for each comparison.
+  * `*.dds.rld.RData`: File containing R `dds` and `rld` objects generated by DESeq2.
+  * `R_sessionInfo.log`: File containing information about R, the OS and attached or loaded packages.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/<COMPARISON>/`
+  * `*.results.txt`: Spreadsheet containing comparison-specific DESeq2 output for differential accessibility results across all peaks.
+  * `*FDR0.01.results.txt`, `*FDR0.05.results.txt`: Subset of above file for peaks that pass FDR <= 0.01 and FDR <= 0.05.
+  * `*FDR0.01.results.bed`, `*FDR0.05.results.bed`: BED files for peaks that pass FDR <= 0.01 and FDR <= 0.05.
+  * `*deseq2.plots.pdf`: MA, Volcano, clustering and scatterplots at FDR <= 0.01 and FDR <= 0.05.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/sizeFactors/`
+  * `*.txt`, `*.RData`: Files containing DESeq2 sizeFactors per sample.
-      This pipeline uses a standardised DESeq2 analysis script to get an idea of the reproducibility within the experiment, and to assess the overall differential binding. Please note that this will not suit every experimental design, and if there are other problems with the experiment then it may not work as well as expected.
+
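+For orientation, the counting step described above is conceptually similar to running featureCounts directly against the consensus SAF file. The exact command executed by the pipeline differs; this is only an illustrative sketch with placeholder file names:
+
+```bash
+# Count fragments per consensus interval for every sample (paired-end data);
+# -F SAF tells featureCounts the annotation is in SAF format
+featureCounts -F SAF -a consensus_peaks.saf -o consensus_peaks.featureCounts.txt -p sample1.bam sample2.bam
+```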
-      ![MultiQC - DESeq2 PCA plot](images/mqc_deseq2_pca_plot.png)
+[DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) is more commonly used to perform differential expression analysis for RNA-seq datasets. However, it can also be used for ATAC-seq differential accessibility analysis, in which case, instead of counts per gene as for RNA-seq data, we now have counts per accessible region.
-      ![MultiQC - DESeq2 sample similarity plot](images/mqc_deseq2_sample_similarity_plot.png)
+This pipeline uses a standardised DESeq2 analysis script to get an idea of the reproducibility within the experiment, and to assess the overall differential accessibility. Please note that this will not suit every experimental design, and if there are other problems with the experiment then it may not work as well as expected. For larger experiments, you may want to use the `vst` transformation instead of the default `rlog` option. You can do this by providing the `--deseq2_vst` parameter to the pipeline (see the example at the end of this section). See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) for a more detailed explanation.
-      By default, all possible pairwise comparisons across the groups from a particular antibody (as defined in [`design.csv`](usage.md#--design)) are performed. The DESeq2 results are generated by the pipeline in various ways. You can load up the results across all of the comparisons in a single spreadsheet, or individual folders will also be created that contain the results specific to a particular comparison. For the latter, additional files will also be generated where the intervals have been pre-filtered based on a couple of standard FDR thresholds. Please see [DESeq2 output](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis) for a description of the columns generated by DESeq2.
+![MultiQC - DESeq2 PCA plot](images/mqc_deseq2_pca_plot.png)
-      ![R - DESeq2 MA plot](images/r_deseq2_ma_plot.png)
+

+![MultiQC - DESeq2 sample similarity plot](images/mqc_deseq2_sample_similarity_plot.png)

- ![R - DESeq2 Volcano plot](images/r_deseq2_volcano_plot.png) +By default, all possible pairwise comparisons across the groups are performed (as defined in [`design.csv`](usage.md#--design)). The DESeq2 results are generated by the pipeline in various ways. You can load up the results across all of the comparisons in a single spreadsheet, or individual folders will also be created that contain the results specific to a particular comparison. For the latter, additional files will also be generated where the intervals have been pre-filtered based on a couple of standard FDR thresholds. Please see [DESeq2 output](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis) for a description of the columns generated by DESeq2. - *Output directories*: - * `bwa/mergedLibrary/macs//consensus//deseq2/` - * `.featureCounts.txt` file for read counts across all samples relative to consensus peak-set. - * Differential binding `*.results.txt` spreadsheet containing results across all consensus peaks and all comparisons. - * `*.plots.pdf` file for PCA and hierarchical clustering. - * `*.log` file with information for number of differentially bound intervals at different FDR and fold-change thresholds for each comparison. - * `*.dds.rld.RData` file containing R `dds` and `rld` objects generated by DESeq2. - * `R_sessionInfo.log` file containing information about R, the OS and attached or loaded packages. - * `bwa/mergedLibrary/macs//consensus///` - * `*.results.txt` spreadsheet containing comparison-specific DESeq2 output for differential binding results across all peaks. - * Subset of above file for peaks that pass FDR <= 0.01 (`*FDR0.01.results.txt`) and FDR <= 0.05 (`*FDR0.05.results.txt`). - * BED files for peaks that pass FDR <= 0.01 (`*FDR0.01.results.bed`) and FDR <= 0.05 (`*FDR0.05.results.bed`). - * MA, Volcano, clustering and scatterplots at FDR <= 0.01 and FDR <= 0.05: `*deseq2.plots.pdf`. - * `bwa/mergedLibrary/macs//consensus//sizeFactors/` - Files containing DESeq2 sizeFactors per sample: `*.txt` and `*.RData`. +

+![R - DESeq2 MA plot](images/r_deseq2_ma_plot.png)

-7. **ataqv** +

+![R - DESeq2 Volcano plot](images/r_deseq2_volcano_plot.png)
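+If you have a large experiment, the `--deseq2_vst` option described above can simply be appended to your usual command (a sketch; the design file, genome and profile are placeholders):
+
+```bash
+nextflow run nf-core/atacseq --input design.csv --genome GRCh37 -profile docker --deseq2_vst
+```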

- *Software*: - [ataqv](https://parkerlab.github.io/ataqv/) +### ataqv - *Description*: - ataqv is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing. +
+ Output files
-      Please see [ataqv homepage](https://parkerlab.github.io/ataqv/) for documentation and an example report.
+* `bwa/mergedLibrary/ataqv/<PEAK_TYPE>/`
+  * `.json`: JSON files containing ATAC-seq specific metrics for each sample.
+* `bwa/mergedLibrary/ataqv/<PEAK_TYPE>/html/`
+  * Folder containing ataqv results aggregated across all samples for visualisation via an internet browser.
-  *Output directories*:
-      * `bwa/mergedLibrary/ataqv//`
-        `.json` files containing ATAC-seq specific metrics for each sample.
-      * `bwa/mergedLibrary/ataqv//html/`
-        Folder containing ataqv results aggregated across all samples for visualisation via an internet browser.
+
+ +[ataqv](https://parkerlab.github.io/ataqv/) is a toolkit for measuring and comparing ATAC-seq results. It was written to help understand how well ATAC-seq assays have worked, and to make it easier to spot differences that might be caused by library prep or sequencing. Please see [ataqv homepage](https://parkerlab.github.io/ataqv/) for documentation and an example report. ## Merged replicate-level analysis -The alignments associated with all of the replicates from the same experimental condition can also be merged. This can be useful to increase the coverage for peak-calling and for other analyses that require high sequencing depth such as [motif footprinting](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3959825/). The analysis steps and directory structure for `bwa/mergedLibrary/` and `bwa/mergedReplicate/` are almost identical. +The alignments associated with all of the replicates from the same experimental condition can also be merged. This can be useful to increase the coverage for peak-calling and for other analyses that require high sequencing depth such as [motif footprinting](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3959825/). The analysis steps and directory structure for `bwa/mergedLibrary/` and `bwa/mergedReplicate/` are almost identical. -File names in the resulting directory (i.e. `bwa/mergedReplicate/`) will have the '`.mRp.`' suffix to denote **m**erged **R**e**p**licates. +File names in the resulting directory (i.e. `bwa/mergedReplicate/`) will have the '`.mRp.`' suffix. -You can skip this portion of the analysis by specifying the `--skipMergeReplicate` parameter. +You can skip this portion of the analysis by specifying the `--skip_merge_replicates` parameter. ->NB: Merged library-level alignments will be used for read counting relative to the consensus merged replicate-level peakset. This is the only way in which differential analysis can be performed at the merged replicate-level. +> **NB:** Merged library-level alignments will be used for read counting relative to the consensus merged replicate-level peakset. This is the only way in which differential analysis can be performed at the merged replicate-level. ## Aggregate analysis -1. **Present QC for the raw read, alignment, peak and differential accessibility results** +### Present QC for the raw read, alignment, peak and differential accessibility results + +
+ Output files
+
+* `multiqc/<PEAK_TYPE>/`
+  * `multiqc_report.html`: A standalone HTML file that can be viewed in your web browser.
+  * `multiqc_data/`: Directory containing parsed statistics from the different tools used in the pipeline.
+  * `multiqc_plots/`: Directory containing static images from the report in various formats.
-  *Documentation*:
-      [MultiQC](https://multiqc.info/docs/)
+
-  *Description*:
-      MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.
+[MultiQC](https://multiqc.info/docs/) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.
-      Results generated by MultiQC collate pipeline QC from FastQC, TrimGalore, samtools flagstat, samtools idxstats, samtools stats, picard CollectMultipleMetrics, picard MarkDuplicates, Preseq, deepTools plotProfile, deepTools plotFingerprint and featureCounts. The default [`multiqc config file`](../assets/multiqc/multiqc_config.yaml) also contains the provision for loading custom-content to report peak counts, FRiP scores, peak-to-gene annnotation proportions, sample-similarity heatmaps and PCA plots.
+Results generated by MultiQC collate pipeline QC from FastQC, TrimGalore, samtools flagstat, samtools idxstats, samtools stats, picard CollectMultipleMetrics, picard MarkDuplicates, Preseq, deepTools plotProfile, deepTools plotFingerprint and featureCounts. The default [`multiqc config file`](../assets/multiqc_config.yaml) also contains the provision for loading custom-content to report peak counts, FRiP scores, peak-to-gene annotation proportions, sample-similarity heatmaps and PCA plots.
-      The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see .
+The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
-  *Output directories*:
-      * `multiqc//`
-        * `multiqc_report.html` - a standalone HTML file that can be viewed in your web browser.
-        * `multiqc_data/` - directory containing parsed statistics from the different tools used in the pipeline.
-        * `multiqc_plots/` - directory containing static images from the report in various formats.
+### Create IGV session file
-2. **Create IGV session file**
+
+ Output files
-  *Documentation*:
-      [IGV](https://software.broadinstitute.org/software/igv/UserGuide)
+* `igv/<PEAK_TYPE>/`
+  * `igv_session.xml`: Session file that can be directly loaded into IGV.
+  * `igv_files.txt`: File containing a listing of the files used to create the IGV session.
-  *Description*:
-      An IGV session file will be created at the end of the pipeline containing the normalised bigWig tracks, per-sample peaks, consensus peaks and differential sites. This avoids having to load all of the data individually into IGV for visualisation.
+
- The genome fasta file required for the IGV session will be the same as the one that was provided to the pipeline. This will be copied into `reference_genome/` to overcome any loading issues. If you prefer to use another path or an in-built genome provided by IGV just change the `genome` entry in the second-line of the session file. +An [IGV](https://software.broadinstitute.org/software/igv/UserGuide) session file will be created at the end of the pipeline containing the normalised bigWig tracks, per-sample peaks, consensus peaks and differential sites. This avoids having to load all of the data individually into IGV for visualisation. - The file paths in the IGV session file will only work if the results are kept in the same place on your storage. If the results are moved or for example, if you prefer to load the data over the web then just replace the file paths with others that are more appropriate. +The genome fasta file required for the IGV session will be the same as the one that was provided to the pipeline. This will be copied into `genome/` to overcome any loading issues. If you prefer to use another path or an in-built genome provided by IGV just change the `genome` entry in the second-line of the session file. - Once installed, open IGV, go to `File > Open Session` and select the `igv_session.xml` file for loading. +The file paths in the IGV session file will only work if the results are kept in the same place on your storage. If the results are moved or for example, if you prefer to load the data over the web then just replace the file paths with others that are more appropriate. - >NB: If you are not using an in-built genome provided by IGV you will need to load the annotation yourself e.g. in .gtf and/or .bed format. +Once installed, open IGV, go to `File > Open Session` and select the `igv_session.xml` file for loading. - ![IGV screenshot](images/igv_screenshot.png) +![IGV screenshot](images/igv_screenshot.png) - *Output directories*: - * `igv//` - * `igv_session.xml` file. - * `igv_files.txt` file containing a listing of the files used to create the IGV session, and their allocated colours. +> **NB:** If you are not using an in-built genome provided by IGV you will need to load the annotation yourself e.g. in .gtf and/or .bed format. ## Other results -1. **Reference genome files** +### Reference genome files + +
+ Output files
+
+* `genome/`
+  * A number of genome-specific files are generated by the pipeline in order to aid in the filtering of the data, and because they are required by standard tools such as BEDTools. These can be found in this directory along with the genome fasta file which is required by IGV. If using a genome from AWS iGenomes, a `README.txt` file containing information about the annotation version will also be saved in this directory, if one exists.
+* `genome/BWAIndex/`
+  * If the `--save_reference` parameter is provided then the alignment indices generated by the pipeline will be saved in this directory. Generating the index can be quite time-consuming, so saving it permits reuse in future runs of the pipeline or for other purposes (see the example below).
-  *Documentation*:
-      [BWA](https://sourceforge.net/projects/bio-bwa/files/), [BEDTools](https://bedtools.readthedocs.io/en/latest/), [SAMtools](http://samtools.sourceforge.net/)
+
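+For example, a first run might save the index, and later runs might point `--bwa_index` at it (a sketch only; all file paths are placeholders and assume the default `./results` output directory):
+
+```bash
+# First run: build the BWA index and keep it in the results directory
+nextflow run nf-core/atacseq --input design.csv --fasta genome.fa --gtf genes.gtf -profile docker --save_reference
+
+# Subsequent runs: reuse the saved index (full path including the base name)
+nextflow run nf-core/atacseq --input design.csv --fasta genome.fa --gtf genes.gtf -profile docker --bwa_index results/genome/BWAIndex/genome.fa
+```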
- *Description*: - Reference genome-specific files can be useful to keep for the downstream processing of the results. +Reference genome-specific files can be useful to keep for the downstream processing of the results. - *Output directories*: - * `reference_genome/` - A number of genome-specific files are generated by the pipeline in order to aid in the filtering of the data, and because they are required by standard tools such as BEDTools. These can be found in this directory along with the genome fasta file which is required by IGV. - * `reference_genome/BWAIndex/` - If the `--save_reference` parameter is provided then the alignment indices generated by the pipeline will be saved in this directory. This can be quite a time-consuming process so it permits their reuse for future runs of the pipeline or for other purposes. +### Pipeline information -2. **Pipeline information** +
+ Output files - *Documentation*: - [Nextflow!](https://www.nextflow.io/docs/latest/tracing.html) +* `pipeline_info/` + * `pipeline_report.html`, `pipeline_report.txt`, `software_versions.csv`: Reports generated by the pipeline. + * `execution_report.html`, `execution_timeline.html`, `execution_trace.txt`, `pipeline_dag.svg`: Reports generated by Nextflow. + * `design_reads.csv`: Reformatted design files used as input to the pipeline. + * `results_description.html`: Documentation for interpretation of results in HTML format. - *Description*: - Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to trouble-shoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. +
- *Output directories*: - * `pipeline_info/` - * Reports generated by the pipeline - `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`. - * Reports generated by Nextflow - `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.svg`. - * Reformatted design files used as input to the pipeline - `design_reads.csv` and `design_controls.csv`. - * `Documentation/` - Documentation for interpretation of results in HTML format - `results_description.html`. +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to trouble-shoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. diff --git a/docs/usage.md b/docs/usage.md index 704fa679..247b52a0 100755 --- a/docs/usage.md +++ b/docs/usage.md @@ -2,72 +2,79 @@ ## Table of contents - - * [Table of contents](#table-of-contents) * [Introduction](#introduction) * [Running the pipeline](#running-the-pipeline) - * [Updating the pipeline](#updating-the-pipeline) - * [Reproducibility](#reproducibility) + * [Updating the pipeline](#updating-the-pipeline) + * [Reproducibility](#reproducibility) * [Main arguments](#main-arguments) - * [`-profile`](#-profile) - * [`--input`](#--input) + * [`-profile`](#-profile) + * [`--input`](#--input) * [Generic arguments](#generic-arguments) - * [`--single_end`](#--single_end) - * [`--seq_center`](#--seq_center) - * [`--fragment_size`](#--fragment_size) - * [`--fingerprint_bins`](#--fingerprint_bins) + * [`--single_end`](#--single_end) + * [`--seq_center`](#--seq_center) + * [`--fragment_size`](#--fragment_size) + * [`--fingerprint_bins`](#--fingerprint_bins) * [Reference genomes](#reference-genomes) - * [`--genome` (using iGenomes)](#--genome-using-igenomes) - * [`--fasta`](#--fasta) - * [`--gtf`](#--gtf) - * [`--bwa_index`](#--bwa_index) - * [`--gene_bed`](#--gene_bed) - * [`--tss_bed`](#--tss_bed) - * [`--macs_gsize`](#--macs_gsize) - * [`--blacklist`](#--blacklist) - * [`--mito_name`](#--mito_name) - * [`--save_reference`](#--save_reference) - * [`--igenomes_ignore`](#--igenomes_ignore) + * [`--genome` (using iGenomes)](#--genome-using-igenomes) + * [`--fasta`](#--fasta) + * [`--gtf`](#--gtf) + * [`--bwa_index`](#--bwa_index) + * [`--gene_bed`](#--gene_bed) + * [`--tss_bed`](#--tss_bed) + * [`--macs_gsize`](#--macs_gsize) + * [`--blacklist`](#--blacklist) + * [`--mito_name`](#--mito_name) + * [`--save_reference`](#--save_reference) + * [`--igenomes_ignore`](#--igenomes_ignore) * [Adapter trimming](#adapter-trimming) - * [`--skip_trimming`](#--skip_trimming) - * [`--save_trimmed`](#--save_trimmed) + * [`--skip_trimming`](#--skip_trimming) + * [`--save_trimmed`](#--save_trimmed) * [Alignments](#alignments) - * [`--keep_mito`](#--keep_mito) - * [`--keep_dups`](#--keep_dups) - * [`--keep_multi_map`](#--keep_multi_map) - * [`--skip_merge_replicates`](#--skip_merge_replicates) - * [`--save_align_intermeds`](#--save_align_intermeds) + * [`--bwa_min_score`](#--bwa_min_score) + * [`--keep_mito`](#--keep_mito) + * [`--keep_dups`](#--keep_dups) + * [`--keep_multi_map`](#--keep_multi_map) + * [`--skip_merge_replicates`](#--skip_merge_replicates) + * [`--save_align_intermeds`](#--save_align_intermeds) * [Peaks](#peaks) - * [`--narrow_peak`](#--narrow_peak) - * [`--broad_cutoff`](#--broad_cutoff) - * 
[`--min_reps_consensus`](#--min_reps_consensus) - * [`--save_macs_pileup`](#--save_macs_pileup) - * [`--skip_diff_analysis`](#--skip_diff_analysis) + * [`--narrow_peak`](#--narrow_peak) + * [`--broad_cutoff`](#--broad_cutoff) + * [`--macs_fdr`](#--macs_fdr) + * [`--macs_pvalue`](#--macs_pvalue) + * [`--min_reps_consensus`](#--min_reps_consensus) + * [`--save_macs_pileup`](#--save_macs_pileup) + * [`--skip_peak_qc`](#--skip_peak_qc) + * [`--skip_peak_annotation`](#--skip_peak_annotation) + * [`--skip_consensus_peaks`](#--skip_consensus_peaks) +* [Differential analysis](#differential_analysis) + * [`--deseq2_vst`](#--deseq2_vst) + * [`--skip_diff_analysis`](#--skip_diff_analysis) * [Skipping QC steps](#skipping-qc-steps) * [Job resources](#job-resources) - * [Automatic resubmission](#automatic-resubmission) - * [Custom resource requests](#custom-resource-requests) + * [Automatic resubmission](#automatic-resubmission) + * [Custom resource requests](#custom-resource-requests) * [AWS Batch specific parameters](#aws-batch-specific-parameters) - * [`--awsqueue`](#--awsqueue) - * [`--awsregion`](#--awsregion) + * [`--awsqueue`](#--awsqueue) + * [`--awsregion`](#--awsregion) + * [`--awscli`](#--awscli) * [Other command line parameters](#other-command-line-parameters) - * [`--outdir`](#--outdir) - * [`--email`](#--email) - * [`--email_on_fail`](#--email_on_fail) - * [`--max_multiqc_email_size`](#--max_multiqc_email_size) - * [`-name`](#-name) - * [`-resume`](#-resume) - * [`-c`](#-c) - * [`--custom_config_version`](#--custom_config_version) - * [`--custom_config_base`](#--custom_config_base) - * [`--max_memory`](#--max_memory) - * [`--max_time`](#--max_time) - * [`--max_cpus`](#--max_cpus) - * [`--plaintext_email`](#--plaintext_email) - * [`--monochrome_logs`](#--monochrome_logs) - * [`--multiqc_config`](#--multiqc_config) - + * [`--outdir`](#--outdir) + * [`--publish_dir_mode`](#--publish_dir_mode) + * [`--email`](#--email) + * [`--email_on_fail`](#--email_on_fail) + * [`--max_multiqc_email_size`](#--max_multiqc_email_size) + * [`-name`](#-name) + * [`-resume`](#-resume) + * [`-c`](#-c) + * [`--custom_config_version`](#--custom_config_version) + * [`--custom_config_base`](#--custom_config_base) + * [`--max_memory`](#--max_memory) + * [`--max_time`](#--max_time) + * [`--max_cpus`](#--max_cpus) + * [`--plaintext_email`](#--plaintext_email) + * [`--monochrome_logs`](#--monochrome_logs) + * [`--multiqc_config`](#--multiqc_config) ## Introduction @@ -118,24 +125,32 @@ This version number will be logged in reports when you run the pipeline, so that ### `-profile` -Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded, for example: `-profile docker` - the order of arguments is important! +Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. -If `-profile` is not specified at all the pipeline will be run locally and expects all software to be installed and available on the `PATH`. +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Conda) - see below. + +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. 
+ +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). + +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. + +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -* `awsbatch` - * A generic configuration profile to be used with AWS Batch. -* `conda` - * A generic configuration profile to be used with [conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) * `docker` - * A generic configuration profile to be used with [Docker](http://docker.com/) - * Pulls software from dockerhub: [`nfcore/atacseq`](http://hub.docker.com/r/nfcore/atacseq/) + * A generic configuration profile to be used with [Docker](http://docker.com/) + * Pulls software from dockerhub: [`nfcore/atacseq`](http://hub.docker.com/r/nfcore/atacseq/) * `singularity` - * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) - * Pulls software from DockerHub: [`nfcore/atacseq`](http://hub.docker.com/r/nfcore/atacseq/) + * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) + * Pulls software from DockerHub: [`nfcore/atacseq`](http://hub.docker.com/r/nfcore/atacseq/) +* `conda` + * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker or Singularity. + * A generic configuration profile to be used with [Conda](https://conda.io/docs/) + * Pulls most software from [Bioconda](https://bioconda.github.io/) * `test` - * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters ### `--input` @@ -223,13 +238,13 @@ There are 31 different species supported in the iGenomes references. To run the You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: * Human - * `--genome GRCh37` + * `--genome GRCh37` * Mouse - * `--genome GRCm38` + * `--genome GRCm38` * _Drosophila_ - * `--genome BDGP6` + * `--genome BDGP6` * _S. cerevisiae_ - * `--genome 'R64-1-1'` + * `--genome 'R64-1-1'` > There are numerous others - check the config file for more. @@ -314,11 +329,11 @@ Name of mitochondrial chomosome in reference assembly. Reads aligning to this co ### `--save_reference` -If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times. +If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times (Default: false). ### `--igenomes_ignore` -Do not load `igenomes.config` when running the pipeline. 
You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.
+Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config` (Default: false).
## Adapter trimming
@@ -326,59 +341,71 @@ The pipeline accepts a number of parameters to change how the trimming is done,
You can specify custom trimming parameters as follows:
* `--clip_r1 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 5' end of read 1 (for single-end reads).
+  * Instructs Trim Galore to remove [int] bp from the 5' end of read 1 (for single-end reads).
* `--clip_r2 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 5' end of read 2 (paired-end reads only).
+  * Instructs Trim Galore to remove [int] bp from the 5' end of read 2 (paired-end reads only).
* `--three_prime_clip_r1 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 3' end of read 1 _AFTER_ adapter/quality trimming has been
+  * Instructs Trim Galore to remove [int] bp from the 3' end of read 1 _AFTER_ adapter/quality trimming has been performed.
* `--three_prime_clip_r2 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 3' end of read 2 _AFTER_ adapter/quality trimming has been performed.
+  * Instructs Trim Galore to remove [int] bp from the 3' end of read 2 _AFTER_ adapter/quality trimming has been performed.
* `--trim_nextseq [int]`
-  * This enables the option Cutadapt `--nextseq-trim=3'CUTOFF` option via Trim Galore, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. This trimming is in common for the NextSeq- and NovaSeq-platforms, where basecalls without any signal are called as high-quality G bases.
+  * This enables the Cutadapt `--nextseq-trim=3'CUTOFF` option via Trim Galore, which will set a quality cutoff (that is normally given with `-q` instead), but qualities of G bases are ignored. This type of trimming is common on the NextSeq and NovaSeq platforms, where basecalls without any signal are called as high-quality G bases.
### `--skip_trimming`
-Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data.
+Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data (Default: false).
### `--save_trimmed`
-By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
+By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete (Default: false).
## Alignments
+### `--bwa_min_score`
+
+Don’t output BWA MEM alignments with score lower than this parameter (Default: false).
+
### `--keep_mito`
-Reads mapping to mitochondrial contig are not filtered from alignments.
+Reads mapping to mitochondrial contig are not filtered from alignments (Default: false).
### `--keep_dups`
-Duplicate reads are not filtered from alignments.
+Duplicate reads are not filtered from alignments (Default: false).
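+For example, to retain mitochondrial and duplicate reads in the filtered alignments (a sketch; the design file, genome and profile are placeholders):
+
+```bash
+nextflow run nf-core/atacseq --input design.csv --genome GRCh37 -profile docker --keep_mito --keep_dups
+```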
### `--keep_multi_map`
-Reads mapping to multiple locations in the genome are not filtered from alignments.
+Reads mapping to multiple locations in the genome are not filtered from alignments (Default: false).
### `--skip_merge_replicates`
-An additional series of steps are performed by the pipeline by merging the replicates from the same experimental group. This is primarily to increase the sequencing depth in order to perform downstream analyses such as footprinting. Specifying this parameter means that these steps will not be performed.
+An additional series of steps is performed by the pipeline by merging the replicates from the same experimental group. This is primarily to increase the sequencing depth in order to perform downstream analyses such as footprinting. Specifying this parameter means that these steps will not be performed (Default: false).
### `--save_align_intermeds`
-By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files.
+By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files (Default: false).
## Peaks
### `--narrow_peak`
-MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.
+MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode (Default: false).
### `--broad_cutoff`
Specifies broad cut-off value for MACS2. Only used when `--narrow_peak` isn't specified (Default: `0.1`).
+### `--macs_fdr`
+
+Minimum FDR (q-value) cutoff for peak detection, `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false).
+
+### `--macs_pvalue`
+
+p-value cutoff for peak detection, `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false). If the `--macs_pvalue` cutoff is set, the q-value will not be calculated and will instead be reported as -1 in the final `.xls` file.
+
### `--min_reps_consensus`
-Number of biological replicates required from a given condition for a peak to contribute to a consensus peak . If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" set of consensus of peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded.
+Number of biological replicates required from a given condition for a peak to contribute to a consensus peak. If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" consensus set of peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded (Default: 1).

```bash
--min_reps_consensus 1
```

@@ -386,11 +413,29 @@ Number of biological replicates required from a given condition for a peak to co
### `--save_macs_pileup`
-Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters.
+Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters (Default: false).
+
+### `--skip_peak_qc`
+
+Skip MACS2 peak QC plot generation (Default: false).
+
+### `--skip_peak_annotation`
+
+Skip annotation of MACS2 and consensus peaks with HOMER (Default: false).
+
+### `--skip_consensus_peaks`
+
+Skip consensus peak generation, annotation and counting (Default: false).
+
+## Differential analysis
+
+### `--deseq2_vst`
+
+Use `vst` transformation instead of `rlog` with DESeq2. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) (Default: false).
### `--skip_diff_analysis`
-Skip read counting and differential analysis step.
+Skip differential accessibility analysis with DESeq2 (Default: false).
## Skipping QC steps
@@ -420,11 +465,11 @@ Wherever process-specific requirements are set in the pipeline, the default valu
If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.
-If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack/).
+If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack).
## AWS Batch specific parameters
-Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use the `-awsbatch` profile and then specify all of the following parameters.
+Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use [`-profile awsbatch`](https://github.com/nf-core/configs/blob/master/conf/awsbatch.config) and then specify all of the following parameters.
### `--awsqueue`
@@ -432,7 +477,11 @@ The JobQueue that you intend to use on AWS Batch.
### `--awsregion`
-The AWS region to run your job in. Default is set to `eu-west-1` but can be adjusted to your needs.
+The AWS region in which to run your job. Default is set to `eu-west-1` but can be adjusted to your needs.
+
+### `--awscli`
+
+The [AWS CLI](https://www.nextflow.io/docs/latest/awscloud.html#aws-cli-installation) path in your custom AMI. Default: `/home/ec2-user/miniconda/bin/aws`.
Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to an S3 storage bucket of your choice - you'll get an error message notifying you if you didn't. An example launch command is shown below.
@@ -442,6 +491,10 @@ Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to a
The output directory where the results will be saved.
+### `--publish_dir_mode`
+
+Value passed to Nextflow [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir) directive for publishing results in the output directory. Available: 'symlink', 'rellink', 'link', 'copy', 'copyNoFollow' and 'move' (Default: 'copy').
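+A minimal AWS Batch launch combining the parameters above might look like this (a sketch only; the queue name, region and S3 bucket are placeholders):
+
+```bash
+nextflow run nf-core/atacseq --input design.csv --genome GRCh37 -profile awsbatch \
+    --awsqueue my-job-queue --awsregion eu-west-1 \
+    -w s3://my-bucket/work --outdir s3://my-bucket/results
+```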
+ ### `--email` Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run. @@ -452,7 +505,7 @@ This works exactly as with `--email`, except emails are only sent if the workflo ### `--max_multiqc_email_size` -Theshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: `25MB`). +Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB). ### `-name` @@ -480,7 +533,7 @@ Note - you can use this to override pipeline defaults. ### `--custom_config_version` -Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default is set to `master`. +Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`. ```bash ## Download and use config file with following git commid id diff --git a/environment.yml b/environment.yml index 7291df75..1fde3ddf 100755 --- a/environment.yml +++ b/environment.yml @@ -1,40 +1,47 @@ # You can use this file to create a conda environment for this pipeline: # conda env create -f environment.yml -name: nf-core-atacseq-1.1.0 +name: nf-core-atacseq-1.2.0 channels: - conda-forge - bioconda - defaults dependencies: ## conda-forge packages - - gawk=4.2.1 - - r-base=3.4.1 - - r-optparse=1.6.0 - - r-rcolorbrewer=1.1_2 - - r-ggplot2=3.1.0 - - r-reshape2=1.4.3 - - r-scales=1.0.0 - - r-pheatmap=1.0.10 - - r-lattice=0.20_35 - - r-upsetr=1.3.3 - - r-xfun=0.3 + - conda-forge::python=3.7.6 + - conda-forge::markdown=3.2.2 + - conda-forge::pymdown-extensions=7.1 + - conda-forge::pygments=2.6.1 + - conda-forge::r-base=3.6.2 + - conda-forge::r-optparse=1.6.6 + - conda-forge::r-rcolorbrewer=1.1_2 + - conda-forge::r-reshape2=1.4.4 + - conda-forge::r-ggplot2=3.3.2 + - conda-forge::r-tidyr=1.1.0 + - conda-forge::r-scales=1.1.1 + - conda-forge::r-pheatmap=1.0.12 + - conda-forge::r-lattice=0.20_41 + - conda-forge::r-upsetr=1.4.0 + - conda-forge::r-xfun=0.15 + - conda-forge::gawk=5.1.0 + - conda-forge::pigz=2.3.4 ## Required for TrimGalore multi-threading ## bioconda packages - - fastqc=0.11.8 - - trim-galore=0.5.0 - - bwa=0.7.17 - - samtools=1.9 - - picard=2.19.0 - - bamtools=2.5.1 - - pysam=0.15.2 - - bedtools=2.27.1 - - preseq=2.0.3 - - deeptools=3.2.1 - - ucsc-bedgraphtobigwig=377 - - macs2=2.1.2 - - homer=4.9.1 - - ataqv=1.0.0 - - subread=1.6.4 - - multiqc=1.7 - - bioconductor-deseq2=1.20.0 - - bioconductor-vsn=3.46.0 + - bioconda::fastqc=0.11.9 + - bioconda::trim-galore=0.6.5 + - bioconda::bwa=0.7.17 + - bioconda::samtools=1.10 + - bioconda::picard=2.23.1 + - bioconda::bamtools=2.5.1 + - bioconda::pysam=0.15.3 + - bioconda::bedtools=2.29.2 + - bioconda::ucsc-bedgraphtobigwig=357 + - bioconda::deeptools=3.4.3 + - bioconda::macs2=2.2.7.1 + - bioconda::homer=4.11 + - bioconda::ataqv=1.1.1 + - bioconda::subread=2.0.1 + - bioconda::preseq=2.0.3 + - bioconda::multiqc=1.9 + - bioconda::bioconductor-biocparallel=1.20.0 + - bioconda::bioconductor-deseq2=1.26.0 + - bioconda::bioconductor-vsn=3.54.0 diff --git a/main.nf b/main.nf index 1f32ea96..44e4b9b0 100644 --- a/main.nf +++ b/main.nf @@ -19,71 +19,82 @@ def helpMessage() { nextflow run 
nf-core/atacseq --input design.csv --genome GRCh37 -profile docker
    Mandatory arguments:
-      --input [file]                  Comma-separated file containing information about the samples in the experiment (see docs/usage.md)
-      --fasta [file]                  Path to Fasta reference. Not mandatory when using reference in iGenomes config via --genome
-      --gtf [file]                    Path to GTF file. Not mandatory when using reference in iGenomes config via --genome
+      --input [file]                  Comma-separated file containing information about the samples in the experiment (see docs/usage.md) (Default: './design.csv')
+      --fasta [file]                  Path to Fasta reference. Not mandatory when using reference in iGenomes config via --genome (Default: false)
+      --gtf [file]                    Path to GTF file. Not mandatory when using reference in iGenomes config via --genome (Default: false)
      -profile [str]                  Configuration profile to use. Can use multiple (comma separated)
                                      Available: conda, docker, singularity, awsbatch, test
    Generic
-      --single_end [bool]             Specifies that the input is single-end reads
-      --seq_center [str]              Sequencing center information to be added to read group of BAM files
+      --single_end [bool]             Specifies that the input is single-end reads (Default: false)
+      --seq_center [str]              Sequencing center information to be added to read group of BAM files (Default: false)
      --fragment_size [int]           Estimated fragment size used to extend single-end reads (Default: 0)
      --fingerprint_bins [int]        Number of genomic bins to use when calculating fingerprint plot (Default: 500000)
    References                        If not specified in the configuration file or you wish to overwrite any of the references
-      --genome [str]                  Name of iGenomes reference
-      --bwa_index [file]              Full path to directory containing BWA index including base name i.e. /path/to/index/genome.fa
-      --gene_bed [file]               Path to BED file containing gene intervals
-      --tss_bed [file]                Path to BED file containing transcription start sites
-      --macs_gsize [str]              Effective genome size parameter required by MACS2. If using iGenomes config, values have only been provided when --genome is set as GRCh37, GRCm38, hg19, mm10, BDGP6 and WBcel235
-      --blacklist [file]              Path to blacklist regions (.BED format), used for filtering alignments
-      --mito_name [str]               Name of Mitochondrial chomosome in genome fasta (e.g. chrM). Reads aligning to this contig are filtered out
-      --save_reference [bool]         If generated by the pipeline save the BWA index in the results directory
+      --genome [str]                  Name of iGenomes reference (Default: false)
+      --bwa_index [file]              Full path to directory containing BWA index including base name i.e. /path/to/index/genome.fa (Default: false)
+      --gene_bed [file]               Path to BED file containing gene intervals (Default: false)
+      --tss_bed [file]                Path to BED file containing transcription start sites (Default: false)
+      --macs_gsize [str]              Effective genome size parameter required by MACS2. If using iGenomes config, values have only been provided when --genome is set as GRCh37, GRCm38, hg19, mm10, BDGP6 and WBcel235 (Default: false)
+      --blacklist [file]              Path to blacklist regions (.BED format), used for filtering alignments (Default: false)
+      --mito_name [str]               Name of Mitochondrial chromosome in genome fasta (e.g. chrM).
Reads aligning to this contig are filtered out (Default: false) + --save_reference [bool] If generated by the pipeline save the BWA index in the results directory (Default: false) Trimming --clip_r1 [int] Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads) (Default: 0) --clip_r2 [int] Instructs Trim Galore to remove bp from the 5' end of read 2 (paired-end reads only) (Default: 0) --three_prime_clip_r1 [int] Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed (Default: 0) - --three_prime_clip_r2 [int] Instructs Trim Galore to re move bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed (Default: 0) + --three_prime_clip_r2 [int] Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed (Default: 0) --trim_nextseq [int] Instructs Trim Galore to apply the --nextseq=X option, to trim based on quality after removing poly-G tails (Default: 0) - --skip_trimming [bool] Skip the adapter trimming step - --save_trimmed [bool] Save the trimmed FastQ files in the results directory + --skip_trimming [bool] Skip the adapter trimming step (Default: false) + --save_trimmed [bool] Save the trimmed FastQ files in the results directory (Default: false) Alignments - --keep_mito [bool] Reads mapping to mitochondrial contig are not filtered from alignments - --keep_dups [bool] Duplicate reads are not filtered from alignments - --keep_multi_map [bool] Reads mapping to multiple locations are not filtered from alignments - --skip_merge_replicates [bool] Do not perform alignment merging and downstream analysis by merging replicates i.e. only do this by merging resequenced libraries - --save_align_intermeds [bool] Save the intermediate BAM files from the alignment step - not done by default + --bwa_min_score [int] Don’t output BWA MEM alignments with score lower than this parameter (Default: false) + --keep_mito [bool] Reads mapping to mitochondrial contig are not filtered from alignments (Default: false) + --keep_dups [bool] Duplicate reads are not filtered from alignments (Default: false) + --keep_multi_map [bool] Reads mapping to multiple locations are not filtered from alignments (Default: false) + --skip_merge_replicates [bool] Do not perform alignment merging and downstream analysis by merging replicates i.e. only do this by merging resequenced libraries (Default: false) + --save_align_intermeds [bool] Save the intermediate BAM files from the alignment step - not done by default (Default: false) Peaks - --narrow_peak [bool] Run MACS2 in narrowPeak mode + --narrow_peak [bool] Run MACS2 in narrowPeak mode (Default: false) --broad_cutoff [float] Specifies broad cutoff value for MACS2. 
Only used when --narrow_peak isn't specified (Default: 0.1) + --macs_fdr [float] Minimum FDR (q-value) cutoff for peak detection, --macs_fdr and --macs_pvalue are mutually exclusive (Default: false) + --macs_pvalue [float] p-value cutoff for peak detection, --macs_fdr and --macs_pvalue are mutually exclusive (Default: false) --min_reps_consensus [int] Number of biological replicates required from a given condition for a peak to contribute to a consensus peak (Default: 1) - --save_macs_pileup [bool] Instruct MACS2 to create bedGraph files normalised to signal per million reads - --skip_diff_analysis [bool] Skip differential binding analysis + --save_macs_pileup [bool] Instruct MACS2 to create bedGraph files normalised to signal per million reads (Default: false) + --skip_peak_qc [bool] Skip MACS2 peak QC plot generation (Default: false) + --skip_peak_annotation [bool] Skip annotation of MACS2 and consensus peaks with HOMER (Default: false) + --skip_consensus_peaks [bool] Skip consensus peak generation (Default: false) + + Differential analysis + --deseq2_vst [bool] Use vst transformation instead of rlog with DESeq2 (Default: false) + --skip_diff_analysis [bool] Skip differential accessibility analysis (Default: false) QC - --skip_fastqc [bool] Skip FastQC - --skip_picard_metrics [bool] Skip Picard CollectMultipleMetrics - --skip_preseq [bool] Skip Preseq - --skip_plot_profile [bool] Skip deepTools plotProfile - --skip_plot_fingerprint [bool] Skip deepTools plotFingerprint - --skip_ataqv [bool] Skip Ataqv - --skip_igv [bool] Skip IGV - --skip_multiqc [bool] Skip MultiQC + --skip_fastqc [bool] Skip FastQC (Default: false) + --skip_picard_metrics [bool] Skip Picard CollectMultipleMetrics (Default: false) + --skip_preseq [bool] Skip Preseq (Default: false) + --skip_plot_profile [bool] Skip deepTools plotProfile (Default: false) + --skip_plot_fingerprint [bool] Skip deepTools plotFingerprint (Default: false) + --skip_ataqv [bool] Skip Ataqv (Default: false) + --skip_igv [bool] Skip IGV (Default: false) + --skip_multiqc [bool] Skip MultiQC (Default: false) Other - --outdir [file] The output directory where the results will be saved - --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful - --max_multiqc_email_size [str] Theshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) - -name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic + --outdir [file] The output directory where the results will be saved (Default: './results') + --publish_dir_mode [str] Mode for publishing results in the output directory. Available: symlink, rellink, link, copy, copyNoFollow, move (Default: copy) + --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits (Default: false) + --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful (Default: false) + --max_multiqc_email_size [str] Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) + -name [str] Name for the pipeline run.
If not specified, Nextflow will automatically generate a random mnemonic (Default: false) AWSBatch --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch --awsregion [str] The AWS Region for your AWS Batch job to run on + --awscli [str] Path to the AWS CLI tool """.stripIndent() } @@ -95,25 +106,28 @@ def helpMessage() { /////////////////////////////////////////////////////////////////////////////// /////////////////////////////////////////////////////////////////////////////// -/* - * SET UP CONFIGURATION VARIABLES - */ - // Show help message if (params.help) { helpMessage() exit 0 } -// Check if genome exists in the config file -if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { - exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" +// Has the run name been specified by the user? +// this has the bonus effect of catching both -name and --name +custom_runName = params.name +if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { + custom_runName = workflow.runName } //////////////////////////////////////////////////// /* -- DEFAULT PARAMETER VALUES -- */ //////////////////////////////////////////////////// +// Check if genome exists in the config file +if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { + exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" +} + // Configurable variables params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false params.bwa_index = params.genome ? params.genomes[ params.genome ].bwa ?: false : false @@ -122,23 +136,20 @@ params.gene_bed = params.genome ? params.genomes[ params.genome ].bed12 ?: false params.mito_name = params.genome ? params.genomes[ params.genome ].mito_name ?: false : false params.macs_gsize = params.genome ? params.genomes[ params.genome ].macs_gsize ?: false : false params.blacklist = params.genome ? params.genomes[ params.genome ].blacklist ?: false : false +params.anno_readme = params.genome ? params.genomes[ params.genome ].readme ?: false : false // Global variables -def PEAK_TYPE = params.narrow_peak ? "narrowPeak" : "broadPeak" - -// Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name -custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - custom_runName = workflow.runName -} +def PEAK_TYPE = params.narrow_peak ? 'narrowPeak' : 'broadPeak' //////////////////////////////////////////////////// /* -- CONFIG FILES -- */ //////////////////////////////////////////////////// // Pipeline config +ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) +ch_multiqc_custom_config = params.multiqc_config ? 
Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) +ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) // JSON files required by BAMTools for alignment filtering if (params.single_end) { @@ -148,7 +159,6 @@ if (params.single_end) { } // Header files for MultiQC -ch_multiqc_config = file(params.multiqc_config, checkIfExists: true) ch_mlib_peak_count_header = file("$baseDir/assets/multiqc/mlib_peak_count_header.txt", checkIfExists: true) ch_mlib_frip_score_header = file("$baseDir/assets/multiqc/mlib_frip_score_header.txt", checkIfExists: true) ch_mlib_peak_annotation_header = file("$baseDir/assets/multiqc/mlib_peak_annotation_header.txt", checkIfExists: true) @@ -165,9 +175,8 @@ ch_mrep_deseq2_clustering_header = file("$baseDir/assets/multiqc/mrep_deseq2_clu /* -- VALIDATE INPUTS -- */ //////////////////////////////////////////////////// -// Validate inputs -if (params.input) { ch_input = file(params.input, checkIfExists: true) } else { exit 1, "Samples design file not specified!" } -if (params.gtf) { ch_gtf = file(params.gtf, checkIfExists: true) } else { exit 1, "GTF annotation file not specified!" } +if (params.input) { ch_input = file(params.input, checkIfExists: true) } else { exit 1, 'Samples design file not specified!' } +if (params.gtf) { ch_gtf = file(params.gtf, checkIfExists: true) } else { exit 1, 'GTF annotation file not specified!' } if (params.gene_bed) { ch_gene_bed = file(params.gene_bed, checkIfExists: true) } if (params.tss_bed) { ch_tss_bed = file(params.tss_bed, checkIfExists: true) } if (params.blacklist) { ch_blacklist = Channel.fromPath(params.blacklist, checkIfExists: true) } else { ch_blacklist = Channel.empty() } @@ -177,7 +186,7 @@ if (params.fasta) { bwa_base = params.fasta.substring(lastPath+1) ch_fasta = file(params.fasta, checkIfExists: true) } else { - exit 1, "Fasta file not specified!" + exit 1, 'Fasta file not specified!' } if (params.bwa_index) { @@ -189,18 +198,24 @@ if (params.bwa_index) { .set { ch_bwa_index } } +// Save AWS IGenomes file containing annotation version +if (params.anno_readme && file(params.anno_readme).exists()) { + file("${params.outdir}/genome/").mkdirs() + file(params.anno_readme).copyTo("${params.outdir}/genome/") +} + //////////////////////////////////////////////////// /* -- AWS -- */ //////////////////////////////////////////////////// -if (workflow.profile == 'awsbatch') { +if (workflow.profile.contains('awsbatch')) { // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + if (!params.awsqueue || !params.awsregion) exit 1, 'Specify correct --awsqueue and --awsregion parameters on AWSBatch!' // Check outdir paths to be S3 buckets if running on AWSBatch // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + if (!params.outdir.startsWith('s3:')) exit 1, 'Outdir not on S3 - specify S3 Bucket to run on AWSBatch!' // Prevent trace files to be stored on S3 since S3 does not support rolling files. - if (workflow.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." + if (params.tracedir.startsWith('s3:')) exit 1, 'Specify a local tracedir or run without trace! S3 cannot be used for tracefiles.' 
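The switch from `workflow.profile == 'awsbatch'` to `workflow.profile.contains('awsbatch')` matters once several profiles are combined on the command line. A minimal Groovy sketch of the difference; the profile string is a hypothetical example, not pipeline output:

    // With -profile awsbatch,test Nextflow exposes the comma-separated string.
    def profile = 'awsbatch,test'       // hypothetical value of workflow.profile
    assert !(profile == 'awsbatch')     // the old equality check misses this case
    assert profile.contains('awsbatch') // the new check still runs the AWSBatch sanity checks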
} /////////////////////////////////////////////////////////////////////////////// @@ -225,10 +240,13 @@ if (params.tss_bed) summary['TSS BED File'] = params.tss_bed if (params.bwa_index) summary['BWA Index'] = params.bwa_index if (params.blacklist) summary['Blacklist BED'] = params.blacklist if (params.mito_name) summary['Mitochondrial Contig'] = params.mito_name +if (params.bwa_min_score) summary['BWA Min Score'] = params.bwa_min_score summary['MACS2 Genome Size'] = params.macs_gsize ?: 'Not supplied' summary['Min Consensus Reps'] = params.min_reps_consensus if (params.macs_gsize) summary['MACS2 Narrow Peaks'] = params.narrow_peak ? 'Yes' : 'No' if (!params.narrow_peak) summary['MACS2 Broad Cutoff'] = params.broad_cutoff +if (params.macs_fdr) summary['MACS2 FDR'] = params.macs_fdr +if (params.macs_pvalue) summary['MACS2 P-value'] = params.macs_pvalue if (params.skip_trimming) { summary['Trimming Step'] = 'Skipped' } else { @@ -236,7 +254,7 @@ if (params.skip_trimming) { summary['Trim R2'] = "$params.clip_r2 bp" summary["Trim 3' R1"] = "$params.three_prime_clip_r1 bp" summary["Trim 3' R2"] = "$params.three_prime_clip_r2 bp" - summary["NextSeq Trim"] = "$params.trim_nextseq bp" + summary['NextSeq Trim'] = "$params.trim_nextseq bp" } if (params.seq_center) summary['Sequencing Center'] = params.seq_center if (params.single_end) summary['Fragment Size'] = "$params.fragment_size bp" @@ -249,25 +267,30 @@ if (params.save_trimmed) summary['Save Trimmed'] = 'Yes' if (params.save_align_intermeds) summary['Save Intermeds'] = 'Yes' if (params.save_macs_pileup) summary['Save MACS2 Pileup'] = 'Yes' if (params.skip_merge_replicates) summary['Skip Merge Replicates'] = 'Yes' -if (params.skip_diff_analysis) summary['Skip Diff Analysis'] = 'Yes' +if (params.skip_peak_qc) summary['Skip MACS2 Peak QC'] = 'Yes' +if (params.skip_peak_annotation) summary['Skip Peak Annotation'] = 'Yes' +if (params.skip_consensus_peaks) summary['Skip Consensus Peaks'] = 'Yes' +if (params.deseq2_vst) summary['Use DESeq2 vst Transform'] = 'Yes' +if (params.skip_diff_analysis) summary['Skip Differential Analysis'] = 'Yes' if (params.skip_fastqc) summary['Skip FastQC'] = 'Yes' if (params.skip_picard_metrics) summary['Skip Picard Metrics'] = 'Yes' if (params.skip_preseq) summary['Skip Preseq'] = 'Yes' if (params.skip_plot_profile) summary['Skip plotProfile'] = 'Yes' if (params.skip_plot_fingerprint) summary['Skip plotFingerprint'] = 'Yes' -if (params.skip_ataqv) summary['Skip Ataqv'] = 'Yes' +if (params.skip_ataqv) summary['Skip Ataqv'] = 'Yes' if (params.skip_igv) summary['Skip IGV'] = 'Yes' if (params.skip_multiqc) summary['Skip MultiQC'] = 'Yes' summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" +if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" summary['Output Dir'] = params.outdir summary['Launch Dir'] = workflow.launchDir summary['Working Dir'] = workflow.workDir summary['Script Dir'] = workflow.projectDir summary['User'] = workflow.userName -if (workflow.profile == 'awsbatch') { +if (workflow.profile.contains('awsbatch')) { summary['AWS Region'] = params.awsregion summary['AWS Queue'] = params.awsqueue + summary['AWS CLI'] = params.awscli } summary['Config Profile'] = workflow.profile if (params.config_profile_description) summary['Config Description'] = params.config_profile_description @@ -278,7 +301,7 @@ if 
(params.email || params.email_on_fail) { summary['E-mail on failure'] = params.email_on_fail summary['MultiQC Max Size'] = params.max_multiqc_email_size } -log.info summary.collect { k,v -> "${k.padRight(21)}: $v" }.join("\n") +log.info summary.collect { k,v -> "${k.padRight(21)}: $v" }.join('\n') log.info "-\033[2m--------------------------------------------------\033[0m-" // Check the hostnames against configured profiles @@ -303,17 +326,17 @@ if (!params.macs_gsize) { /////////////////////////////////////////////////////////////////////////////// /* - * PREPROCESSING - REFORMAT DESIGN FILE AND CHECK VALIDITY + * PREPROCESSING: Reformat design file and check validity */ -process CheckDesign { +process CHECK_DESIGN { tag "$design" - publishDir "${params.outdir}/pipeline_info", mode: 'copy' + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode input: - file design from ch_input + path design from ch_input output: - file "*.csv" into ch_design_reads_csv + path '*.csv' into ch_design_reads_csv script: // This script is bundled with the pipeline, in nf-core/atacseq/bin/ """ @@ -366,20 +389,20 @@ multipleGroups = design_multiple_samples /////////////////////////////////////////////////////////////////////////////// /* - * PREPROCESSING - Build BWA index + * PREPROCESSING: Build BWA index */ if (!params.bwa_index) { - process BWAIndex { + process BWA_INDEX { tag "$fasta" label 'process_high' - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' + publishDir path: { params.save_reference ? "${params.outdir}/genome" : params.outdir }, + saveAs: { params.save_reference ? it : null }, mode: params.publish_dir_mode input: - file fasta from ch_fasta + path fasta from ch_fasta output: - file "BWAIndex" into ch_bwa_index + path 'BWAIndex' into ch_bwa_index script: """ @@ -390,19 +413,29 @@ if (!params.bwa_index) { } /* - * PREPROCESSING - Generate gene BED file + * PREPROCESSING: Generate gene BED file */ +// If --gtf is supplied along with --genome +// Make gene bed from supplied --gtf instead of using iGenomes one automatically +def MAKE_BED = false if (!params.gene_bed) { - process MakeGeneBED { + MAKE_BED = true +} else if (params.genome && params.gtf) { + if (params.genomes[ params.genome ].gtf != params.gtf) { + MAKE_BED = true + } +} +if (MAKE_BED) { + process MAKE_GENE_BED { tag "$gtf" label 'process_low' - publishDir "${params.outdir}/reference_genome", mode: 'copy' + publishDir "${params.outdir}/genome", mode: params.publish_dir_mode input: - file gtf from ch_gtf + path gtf from ch_gtf output: - file "*.bed" into ch_gene_bed + path '*.bed' into ch_gene_bed script: // This script is bundled with the pipeline, in nf-core/atacseq/bin/ """ @@ -412,18 +445,18 @@ if (!params.gene_bed) { } /* - * PREPROCESSING - Generate TSS BED file + * PREPROCESSING: Generate TSS BED file */ if (!params.tss_bed) { - process MakeTSSBED { + process MAKE_TSS_BED { tag "$bed" - publishDir "${params.outdir}/reference_genome", mode: 'copy' + publishDir "${params.outdir}/genome", mode: params.publish_dir_mode input: - file bed from ch_gene_bed + path bed from ch_gene_bed output: - file "*.bed" into ch_tss_bed + path '*.bed' into ch_tss_bed script: """ @@ -433,28 +466,28 @@ if (!params.tss_bed) { } /* - * PREPROCESSING - Prepare genome intervals for filtering + * PREPROCESSING: Prepare genome intervals for filtering */ -process MakeGenomeFilter { +process MAKE_GENOME_FILTER { tag "$fasta" -
publishDir "${params.outdir}/reference_genome", mode: 'copy' + publishDir "${params.outdir}/genome", mode: params.publish_dir_mode input: - file fasta from ch_fasta - file blacklist from ch_blacklist.ifEmpty([]) + path fasta from ch_fasta + path blacklist from ch_blacklist.ifEmpty([]) output: - file "$fasta" into ch_genome_fasta // FASTA FILE FOR IGV - file "*.fai" into ch_genome_fai // FAI INDEX FOR REFERENCE GENOME - file "*.bed" into ch_genome_filter_regions // BED FILE WITHOUT BLACKLIST REGIONS & MITOCHONDRIAL CONTIG FOR FILTERING - file "*.txt" into ch_genome_autosomes // TEXT FILE CONTAINING LISTING OF AUTOSOMAL CHROMOSOMES FOR ATAQV - file "*.sizes" into ch_genome_sizes_mlib_bigwig, // CHROMOSOME SIZES FILE FOR BEDTOOLS + path "$fasta" // FASTA FILE FOR IGV + path '*.fai' // FAI INDEX FOR REFERENCE GENOME + path '*.bed' into ch_genome_filter_regions // BED FILE WITHOUT BLACKLIST REGIONS & MITOCHONDRIAL CONTIG FOR FILTERING + path '*.txt' into ch_genome_autosomes // TEXT FILE CONTAINING LISTING OF AUTOSOMAL CHROMOSOMES FOR ATAQV + path '*.sizes' into ch_genome_sizes_mlib_bigwig, // CHROMOSOME SIZES FILE FOR BEDTOOLS ch_genome_sizes_mrep_bigwig script: blacklist_filter = params.blacklist ? "sortBed -i $blacklist -g ${fasta}.sizes | complementBed -i stdin -g ${fasta}.sizes" : "awk '{print \$1, '0' , \$2}' OFS='\t' ${fasta}.sizes" - name_filter = params.mito_name ? "| awk '\$1 !~ /${params.mito_name}/ {print \$0}'": "" - mito_filter = params.keep_mito ? "" : name_filter + name_filter = params.mito_name ? "| awk '\$1 !~ /${params.mito_name}/ {print \$0}'": '' + mito_filter = params.keep_mito ? '' : name_filter """ samtools faidx $fasta get_autosomes.py ${fasta}.fai ${fasta}.autosomes.txt @@ -472,24 +505,24 @@ process MakeGenomeFilter { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 1 - FastQC + * STEP 1: FastQC */ -process FastQC { +process FASTQC { tag "$name" label 'process_medium' - publishDir "${params.outdir}/fastqc", mode: 'copy', + publishDir "${params.outdir}/fastqc", mode: params.publish_dir_mode, saveAs: { filename -> - filename.endsWith(".zip") ? "zips/$filename" : "$filename" + filename.endsWith('.zip') ? "zips/$filename" : filename } when: !params.skip_fastqc input: - set val(name), file(reads) from ch_raw_reads_fastqc + tuple val(name), path(reads) from ch_raw_reads_fastqc output: - file "*.{zip,html}" into ch_fastqc_reports_mqc + path '*.{zip,html}' into ch_fastqc_reports_mqc script: // Added soft-links to original fastqs for consistent naming in MultiQC @@ -517,49 +550,62 @@ process FastQC { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 2 - Trim Galore! + * STEP 2: Trim Galore! 
*/ if (params.skip_trimming) { ch_trimmed_reads = ch_raw_reads_trimgalore - ch_trimgalore_results_mqc = [] - ch_trimgalore_fastqc_reports_mqc = [] + ch_trimgalore_results_mqc = Channel.empty() + ch_trimgalore_fastqc_reports_mqc = Channel.empty() } else { - process TrimGalore { + process TRIMGALORE { tag "$name" - label 'process_long' - publishDir "${params.outdir}/trim_galore", mode: 'copy', + label 'process_high' + publishDir "${params.outdir}/trim_galore", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".html")) "fastqc/$filename" - else if (filename.endsWith(".zip")) "fastqc/zips/$filename" - else if (filename.endsWith("trimming_report.txt")) "logs/$filename" + if (filename.endsWith('.html')) "fastqc/$filename" + else if (filename.endsWith('.zip')) "fastqc/zips/$filename" + else if (filename.endsWith('trimming_report.txt')) "logs/$filename" else params.save_trimmed ? filename : null } input: - set val(name), file(reads) from ch_raw_reads_trimgalore + tuple val(name), path(reads) from ch_raw_reads_trimgalore output: - set val(name), file("*.fq.gz") into ch_trimmed_reads - file "*.txt" into ch_trimgalore_results_mqc - file "*.{zip,html}" into ch_trimgalore_fastqc_reports_mqc + tuple val(name), path('*.fq.gz') into ch_trimmed_reads + path '*.txt' into ch_trimgalore_results_mqc + path '*.{zip,html}' into ch_trimgalore_fastqc_reports_mqc script: + // Calculate number of --cores for TrimGalore based on value of task.cpus + // See: https://github.com/FelixKrueger/TrimGalore/blob/master/Changelog.md#version-060-release-on-1-mar-2019 + // See: https://github.com/nf-core/atacseq/pull/65 + def cores = 1 + if (task.cpus) { + cores = (task.cpus as int) - 4 + if (params.single_end) cores = (task.cpus as int) - 3 + if (cores < 1) cores = 1 + if (cores > 4) cores = 4 + } + // Added soft-links to original fastqs for consistent naming in MultiQC c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : '' c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : '' tpc_r1 = params.three_prime_clip_r1 > 0 ? "--three_prime_clip_r1 ${params.three_prime_clip_r1}" : '' tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : '' nextseq = params.trim_nextseq > 0 ? "--nextseq ${params.trim_nextseq}" : '' + + // Added soft-links to original fastqs for consistent naming in MultiQC if (params.single_end) { """ [ ! -f ${name}.fastq.gz ] && ln -s $reads ${name}.fastq.gz - trim_galore --fastqc --gzip $c_r1 $tpc_r1 $nextseq ${name}.fastq.gz + trim_galore --cores $cores --fastqc --gzip $c_r1 $tpc_r1 $nextseq ${name}.fastq.gz """ } else { """ [ ! -f ${name}_1.fastq.gz ] && ln -s ${reads[0]} ${name}_1.fastq.gz [ ! 
-f ${name}_2.fastq.gz ] && ln -s ${reads[1]} ${name}_2.fastq.gz - trim_galore --paired --fastqc --gzip $c_r1 $c_r2 $tpc_r1 $tpc_r2 $nextseq ${name}_1.fastq.gz ${name}_2.fastq.gz + trim_galore --cores $cores --paired --fastqc --gzip $c_r1 $c_r2 $tpc_r1 $tpc_r2 $nextseq ${name}_1.fastq.gz ${name}_2.fastq.gz """ } } @@ -574,18 +620,18 @@ if (params.skip_trimming) { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 3.1 - Align read 1 with bwa + * STEP 3.1: Map read(s) with bwa mem */ -process BWAMem { +process BWA_MEM { tag "$name" label 'process_high' input: - set val(name), file(reads) from ch_trimmed_reads - file index from ch_bwa_index.collect() + tuple val(name), path(reads) from ch_trimmed_reads + path index from ch_bwa_index.collect() output: - set val(name), file("*.bam") into ch_bwa_bam + tuple val(name), path('*.bam') into ch_bwa_bam script: prefix = "${name}.Lb" @@ -593,11 +639,13 @@ process BWAMem { if (params.seq_center) { rg = "\'@RG\\tID:${name}\\tSM:${name.split('_')[0..-2].join('_')}\\tPL:ILLUMINA\\tLB:${name}\\tPU:1\\tCN:${params.seq_center}\'" } + score = params.bwa_min_score ? "-T ${params.bwa_min_score}" : '' """ bwa mem \\ -t $task.cpus \\ -M \\ -R $rg \\ + $score \\ ${index}/${bwa_base} \\ $reads \\ | samtools view -@ $task.cpus -b -h -F 0x0100 -O BAM -o ${prefix}.bam - @@ -605,27 +653,27 @@ process BWAMem { } /* - * STEP 3.2 - Convert .bam to coordinate sorted .bam + * STEP 3.2: Convert BAM to coordinate sorted BAM */ -process SortBAM { +process SORT_BAM { tag "$name" label 'process_medium' if (params.save_align_intermeds) { - publishDir path: "${params.outdir}/bwa/library", mode: 'copy', + publishDir path: "${params.outdir}/bwa/library", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" else filename } } input: - set val(name), file(bam) from ch_bwa_bam + tuple val(name), path(bam) from ch_bwa_bam output: - set val(name), file("*.sorted.{bam,bam.bai}") into ch_sort_bam_merge - file "*.{flagstat,idxstats,stats}" into ch_sort_bam_flagstat_mqc + tuple val(name), path('*.sorted.{bam,bam.bai}') into ch_sort_bam_merge + path '*.{flagstat,idxstats,stats}' into ch_sort_bam_flagstat_mqc script: prefix = "${name}.Lb" @@ -647,7 +695,7 @@ process SortBAM { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 4.1 Merge BAM files for all libraries from same replicate + * STEP 4.1: Merge BAM files for all libraries from same sample replicate */ ch_sort_bam_merge .map { it -> [ it[0].split('_')[0..-2].join('_'), it[1] ] } @@ -655,34 +703,34 @@ ch_sort_bam_merge .map { it -> [ it[0], it[1].flatten() ] } .set { ch_sort_bam_merge } -process MergedLibBAM { +process MERGED_LIB_BAM { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if 
(filename.endsWith(".metrics.txt")) "picard_metrics/$filename" + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.metrics.txt')) "picard_metrics/$filename" else params.save_align_intermeds ? filename : null } input: - set val(name), file(bams) from ch_sort_bam_merge + tuple val(name), path(bams) from ch_sort_bam_merge output: - set val(name), file("*${prefix}.sorted.{bam,bam.bai}") into ch_mlib_bam_filter, - ch_mlib_bam_preseq, - ch_mlib_bam_ataqv - file "*.{flagstat,idxstats,stats}" into ch_mlib_bam_stats_mqc - file "*.txt" into ch_mlib_bam_metrics_mqc + tuple val(name), path("*${prefix}.sorted.{bam,bam.bai}") into ch_mlib_bam_filter, + ch_mlib_bam_preseq, + ch_mlib_bam_ataqv + path '*.{flagstat,idxstats,stats}' into ch_mlib_bam_stats_mqc + path '*.txt' into ch_mlib_bam_metrics_mqc script: prefix = "${name}.mLb.mkD" bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort() def avail_mem = 3 if (!task.memory) { - log.info "[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this." + log.info '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.' } else { avail_mem = task.memory.toGiga() } @@ -730,40 +778,40 @@ process MergedLibBAM { } /* - * STEP 4.2 Filter BAM file at merged library-level + * STEP 4.2: Filter BAM file at merged library-level */ -process MergedLibBAMFilter { +process MERGED_LIB_BAM_FILTER { tag "$name" label 'process_medium' - publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> if (params.single_end || params.save_align_intermeds) { - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if (filename.endsWith(".sorted.bam")) filename - else if (filename.endsWith(".sorted.bam.bai")) filename + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.sorted.bam')) filename + else if (filename.endsWith('.sorted.bam.bai')) filename else null } } input: - set val(name), file(bam) from ch_mlib_bam_filter - file bed from ch_genome_filter_regions.collect() - file bamtools_filter_config from ch_bamtools_filter_config + tuple val(name), path(bam) from ch_mlib_bam_filter + path bed from ch_genome_filter_regions.collect() + path bamtools_filter_config from ch_bamtools_filter_config output: - set val(name), file("*.{bam,bam.bai}") into ch_mlib_filter_bam - set val(name), file("*.flagstat") into ch_mlib_filter_bam_flagstat - file "*.{idxstats,stats}" into ch_mlib_filter_bam_stats_mqc + tuple val(name), path('*.{bam,bam.bai}') into ch_mlib_filter_bam + tuple val(name), path('*.flagstat') into ch_mlib_filter_bam_flagstat + path '*.{idxstats,stats}' into ch_mlib_filter_bam_stats_mqc script: prefix = params.single_end ? "${name}.mLb.clN" : "${name}.mLb.flT" - filter_params = params.single_end ? "-F 0x004" : "-F 0x004 -F 0x0008 -f 0x001" - dup_params = params.keep_dups ? 
"" : "-F 0x0400" - multimap_params = params.keep_multi_map ? "" : "-q 1" - blacklist_params = params.blacklist ? "-L $bed" : "" - name_sort_bam = params.single_end ? "" : "samtools sort -n -@ $task.cpus -o ${prefix}.bam -T $prefix ${prefix}.sorted.bam" + filter_params = params.single_end ? '-F 0x004' : '-F 0x004 -F 0x0008 -f 0x001' + dup_params = params.keep_dups ? '' : '-F 0x0400' + multimap_params = params.keep_multi_map ? '' : '-q 1' + blacklist_params = params.blacklist ? "-L $bed" : '' + name_sort_bam = params.single_end ? '' : "samtools sort -n -@ $task.cpus -o ${prefix}.bam -T $prefix ${prefix}.sorted.bam" """ samtools view \\ $filter_params \\ @@ -785,7 +833,7 @@ process MergedLibBAMFilter { } /* - * STEP 4.3 Remove orphan reads from paired-end BAM file + * STEP 4.3: Remove orphan reads from paired-end BAM file */ if (params.single_end) { ch_mlib_filter_bam @@ -805,34 +853,34 @@ if (params.single_end) { ch_mlib_filter_bam_stats_mqc .set { ch_mlib_rm_orphan_stats_mqc } } else { - process MergedLibBAMRemoveOrphan { + process MERGED_LIB_BAM_REMOVE_ORPHAN { tag "$name" label 'process_medium' - publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if (filename.endsWith(".sorted.bam")) filename - else if (filename.endsWith(".sorted.bam.bai")) filename + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.sorted.bam')) filename + else if (filename.endsWith('.sorted.bam.bai')) filename else null } input: - set val(name), file(bam) from ch_mlib_filter_bam + tuple val(name), path(bam) from ch_mlib_filter_bam output: - set val(name), file("*.sorted.{bam,bam.bai}") into ch_mlib_rm_orphan_bam_metrics, - ch_mlib_rm_orphan_bam_bigwig, - ch_mlib_rm_orphan_bam_macs, - ch_mlib_rm_orphan_bam_plotfingerprint, - ch_mlib_rm_orphan_bam_mrep - set val(name), file("${prefix}.bam") into ch_mlib_name_bam_mlib_counts, - ch_mlib_name_bam_mrep_counts - set val(name), file("*.flagstat") into ch_mlib_rm_orphan_flagstat_bigwig, - ch_mlib_rm_orphan_flagstat_macs, - ch_mlib_rm_orphan_flagstat_mqc - file "*.{idxstats,stats}" into ch_mlib_rm_orphan_stats_mqc + tuple val(name), path('*.sorted.{bam,bam.bai}') into ch_mlib_rm_orphan_bam_metrics, + ch_mlib_rm_orphan_bam_bigwig, + ch_mlib_rm_orphan_bam_macs, + ch_mlib_rm_orphan_bam_plotfingerprint, + ch_mlib_rm_orphan_bam_mrep + tuple val(name), path("${prefix}.bam") into ch_mlib_name_bam_mlib_counts, + ch_mlib_name_bam_mrep_counts + tuple val(name), path('*.flagstat') into ch_mlib_rm_orphan_flagstat_bigwig, + ch_mlib_rm_orphan_flagstat_macs, + ch_mlib_rm_orphan_flagstat_mqc + path '*.{idxstats,stats}' into ch_mlib_rm_orphan_stats_mqc script: // This script is bundled with the pipeline, in nf-core/atacseq/bin/ prefix = "${name}.mLb.clN" @@ -857,39 +905,49 @@ if (params.single_end) { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 5.1 preseq analysis after merging libraries and before filtering + * STEP 5.1: Preseq analysis after merging libraries and before filtering */ -process MergedLibPreseq { +process MERGED_LIB_PRESEQ { 
tag "$name" label 'process_low' - publishDir "${params.outdir}/bwa/mergedLibrary/preseq", mode: 'copy' + label 'error_ignore' + publishDir "${params.outdir}/bwa/mergedLibrary/preseq", mode: params.publish_dir_mode when: !params.skip_preseq input: - set val(name), file(bam) from ch_mlib_bam_preseq + tuple val(name), path(bam) from ch_mlib_bam_preseq output: - file "*.ccurve.txt" into ch_mlib_preseq_mqc + path '*.ccurve.txt' into ch_mlib_preseq_mqc + path '*.log' script: prefix = "${name}.mLb.mkD" + pe = params.single_end ? '' : '-pe' """ - preseq lc_extrap -v -output ${prefix}.ccurve.txt -bam ${bam[0]} + preseq lc_extrap \\ + -output ${prefix}.ccurve.txt \\ + -verbose \\ + -bam \\ + $pe \\ + -seed 1 \\ + ${bam[0]} + cp .command.err ${prefix}.command.log """ } /* - * STEP 5.2 Picard CollectMultipleMetrics after merging libraries and filtering + * STEP 5.2: Picard CollectMultipleMetrics after merging libraries and filtering */ -process MergedLibMetrics { +process MERGED_LIB_PICARD_METRICS { tag "$name" label 'process_medium' - publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith("_metrics")) "picard_metrics/$filename" - else if (filename.endsWith(".pdf")) "picard_metrics/pdf/$filename" + if (filename.endsWith('_metrics')) "picard_metrics/$filename" + else if (filename.endsWith('.pdf')) "picard_metrics/pdf/$filename" else null } @@ -897,18 +955,18 @@ process MergedLibMetrics { !params.skip_picard_metrics input: - set val(name), file(bam) from ch_mlib_rm_orphan_bam_metrics - file fasta from ch_fasta + tuple val(name), path(bam) from ch_mlib_rm_orphan_bam_metrics + path fasta from ch_fasta output: - file "*_metrics" into ch_mlib_collectmetrics_mqc - file "*.pdf" into ch_mlib_collectmetrics_pdf + path '*_metrics' into ch_mlib_collectmetrics_mqc + path '*.pdf' script: prefix = "${name}.mLb.clN" def avail_mem = 3 if (!task.memory) { - log.info "[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this." + log.info '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.' 
} else { avail_mem = task.memory.toGiga() } @@ -923,35 +981,35 @@ process MergedLibMetrics { } /* - * STEP 5.3 Read depth normalised bigWig + * STEP 5.3: Read depth normalised bigWig */ -process MergedLibBigWig { +process MERGED_LIB_BIGWIG { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/bigwig", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary/bigwig", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith("scale_factor.txt")) "scale/$filename" - else if (filename.endsWith(".bigWig")) "$filename" + if (filename.endsWith('scale_factor.txt')) "scale/$filename" + else if (filename.endsWith('.bigWig')) filename else null } input: - set val(name), file(bam), file(flagstat) from ch_mlib_rm_orphan_bam_bigwig.join(ch_mlib_rm_orphan_flagstat_bigwig, by: [0]) - file sizes from ch_genome_sizes_mlib_bigwig.collect() + tuple val(name), path(bam), path(flagstat) from ch_mlib_rm_orphan_bam_bigwig.join(ch_mlib_rm_orphan_flagstat_bigwig, by: [0]) + path sizes from ch_genome_sizes_mlib_bigwig.collect() output: - set val(name), file("*.bigWig") into ch_mlib_bigwig_plotprofile - file "*scale_factor.txt" into ch_mlib_bigwig_scale - file "*igv.txt" into ch_mlib_bigwig_igv + tuple val(name), path('*.bigWig') into ch_mlib_bigwig_plotprofile + path '*igv.txt' into ch_mlib_bigwig_igv + path '*scale_factor.txt' script: prefix = "${name}.mLb.clN" - pe_fragment = params.single_end ? "" : "-pc" + pe_fragment = params.single_end ? '' : '-pc' extend = (params.single_end && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : '' """ SCALE_FACTOR=\$(grep 'mapped (' $flagstat | awk '{print 1000000/\$1}') echo \$SCALE_FACTOR > ${prefix}.scale_factor.txt - genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -k1,1 -k2,2n > ${prefix}.bedGraph + genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -T '.' 
-k1,1 -k2,2n > ${prefix}.bedGraph bedGraphToBigWig ${prefix}.bedGraph $sizes ${prefix}.bigWig @@ -960,23 +1018,23 @@ process MergedLibBigWig { } /* - * STEP 5.4 generate gene body coverage plot with deepTools + * STEP 5.4: Generate gene body coverage plot with deepTools plotProfile and plotHeatmap */ -process MergedLibPlotProfile { +process MERGED_LIB_PLOTPROFILE { tag "$name" label 'process_high' - publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotProfile", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotProfile", mode: params.publish_dir_mode when: !params.skip_plot_profile input: - set val(name), file(bigwig) from ch_mlib_bigwig_plotprofile - file bed from ch_gene_bed + tuple val(name), path(bigwig) from ch_mlib_bigwig_plotprofile + path bed from ch_gene_bed output: - file '*.{gz,pdf}' into ch_mlib_plotprofile_results - file '*.plotProfile.tab' into ch_mlib_plotprofile_mqc + path '*.plotProfile.tab' into ch_mlib_plotprofile_mqc + path '*.{gz,pdf,mat.tab}' script: prefix = "${name}.mLb.clN" @@ -985,7 +1043,7 @@ process MergedLibPlotProfile { --regionsFileName $bed \\ --scoreFileName $bigwig \\ --outFileName ${prefix}.computeMatrix.mat.gz \\ - --outFileNameMatrix ${prefix}.computeMatrix.vals.mat.gz \\ + --outFileNameMatrix ${prefix}.computeMatrix.vals.mat.tab \\ --regionBodyLength 1000 \\ --beforeRegionStartLength 3000 \\ --afterRegionStartLength 3000 \\ @@ -996,26 +1054,30 @@ process MergedLibPlotProfile { plotProfile --matrixFile ${prefix}.computeMatrix.mat.gz \\ --outFileName ${prefix}.plotProfile.pdf \\ --outFileNameData ${prefix}.plotProfile.tab + + plotHeatmap --matrixFile ${prefix}.computeMatrix.mat.gz \\ + --outFileName ${prefix}.plotHeatmap.pdf \\ + --outFileNameMatrix ${prefix}.plotHeatmap.mat.tab """ } /* - * STEP 5.5 deepTools plotFingerprint + * STEP 5.5: deepTools plotFingerprint */ -process MergedLibPlotFingerprint { +process MERGED_LIB_PLOTFINGERPRINT { tag "$name" label 'process_high' - publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotFingerprint", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotFingerprint", mode: params.publish_dir_mode when: !params.skip_plot_fingerprint input: - set val(name), file(bam) from ch_mlib_rm_orphan_bam_plotfingerprint + tuple val(name), path(bam) from ch_mlib_rm_orphan_bam_plotfingerprint output: - file '*.{txt,pdf}' into ch_mlib_plotfingerprint_results - file '*.raw.txt' into ch_mlib_plotfingerprint_mqc + path '*.raw.txt' into ch_mlib_plotfingerprint_mqc + path '*.{txt,pdf}' script: prefix = "${name}.mLb.clN" @@ -1043,15 +1105,15 @@ process MergedLibPlotFingerprint { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 6.1 Call peaks with MACS2 and calculate FRiP score + * STEP 6.1: Call peaks with MACS2 and calculate FRiP score */ -process MergedLibMACSCallPeak { +process MERGED_LIB_MACS2 { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".tsv")) "qc/$filename" - else if (filename.endsWith(".igv.txt")) null + if (filename.endsWith('.tsv')) "qc/$filename" + else if (filename.endsWith('.igv.txt')) null else filename } @@ -1059,24 +1121,26 @@ process MergedLibMACSCallPeak { params.macs_gsize input: - set val(name), file(bam), file(flagstat) from ch_mlib_rm_orphan_bam_macs.join(ch_mlib_rm_orphan_flagstat_macs, by: 
[0]) - file mlib_peak_count_header from ch_mlib_peak_count_header - file mlib_frip_score_header from ch_mlib_frip_score_header + tuple val(name), path(bam), path(flagstat) from ch_mlib_rm_orphan_bam_macs.join(ch_mlib_rm_orphan_flagstat_macs, by: [0]) + path mlib_peak_count_header from ch_mlib_peak_count_header + path mlib_frip_score_header from ch_mlib_frip_score_header output: - set val(name), file("*.{bed,xls,gappedPeak,bdg}") into ch_mlib_macs_output - set val(name), file("*$PEAK_TYPE") into ch_mlib_macs_homer, - ch_mlib_macs_qc, - ch_mlib_macs_consensus, - ch_mlib_macs_ataqv - file "*igv.txt" into ch_mlib_macs_igv - file "*_mqc.tsv" into ch_mlib_macs_mqc + tuple val(name), path("*$PEAK_TYPE") into ch_mlib_macs_homer, + ch_mlib_macs_qc, + ch_mlib_macs_consensus, + ch_mlib_macs_ataqv + path '*igv.txt' into ch_mlib_macs_igv + path '*_mqc.tsv' into ch_mlib_macs_mqc + path '*.{bed,xls,gappedPeak,bdg}' script: prefix = "${name}.mLb.clN" broad = params.narrow_peak ? '' : "--broad --broad-cutoff ${params.broad_cutoff}" - format = params.single_end ? "BAM" : "BAMPE" - pileup = params.save_macs_pileup ? "-B --SPMR" : "" + format = params.single_end ? 'BAM' : 'BAMPE' + pileup = params.save_macs_pileup ? '-B --SPMR' : '' + fdr = params.macs_fdr ? "--qvalue ${params.macs_fdr}" : '' + pvalue = params.macs_pvalue ? "--pvalue ${params.macs_pvalue}" : '' """ macs2 callpeak \\ -t ${bam[0]} \\ @@ -1085,6 +1149,8 @@ process MergedLibMACSCallPeak { -g $params.macs_gsize \\ -n $prefix \\ $pileup \\ + $fdr \\ + $pvalue \\ --keep-dup all \\ --nomodel @@ -1098,23 +1164,23 @@ process MergedLibMACSCallPeak { } /* - * STEP 6.2 Annotate peaks with HOMER + * STEP 6.2: Annotate peaks with HOMER */ -process MergedLibAnnotatePeaks { +process MERGED_LIB_MACS2_ANNOTATE { tag "$name" - label "process_medium" - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: 'copy' + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: params.publish_dir_mode when: - params.macs_gsize + params.macs_gsize && !params.skip_peak_annotation input: - set val(name), file(peak) from ch_mlib_macs_homer - file fasta from ch_fasta - file gtf from ch_gtf + tuple val(name), path(peak) from ch_mlib_macs_homer + path fasta from ch_fasta + path gtf from ch_gtf output: - file "*.txt" into ch_mlib_macs_annotate + path '*.txt' into ch_mlib_macs_annotate script: prefix = "${name}.mLb.clN" @@ -1130,23 +1196,23 @@ process MergedLibAnnotatePeaks { } /* - * STEP 6.3 Aggregated QC plots for peaks, FRiP and peak-to-gene annotation + * STEP 6.3: Aggregated QC plots for peaks, FRiP and peak-to-gene annotation */ -process MergedLibPeakQC { - label "process_medium" - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/qc", mode: 'copy' +process MERGED_LIB_MACS2_QC { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/qc", mode: params.publish_dir_mode when: - params.macs_gsize + params.macs_gsize && !params.skip_peak_annotation && !params.skip_peak_qc input: - file peaks from ch_mlib_macs_qc.collect{ it[1] } - file annos from ch_mlib_macs_annotate.collect() - file mlib_peak_annotation_header from ch_mlib_peak_annotation_header + path peaks from ch_mlib_macs_qc.collect{ it[1] } + path annos from ch_mlib_macs_annotate.collect() + path mlib_peak_annotation_header from ch_mlib_peak_annotation_header output: - file "*.{txt,pdf}" into ch_mlib_peak_qc - file "*.tsv" into ch_mlib_peak_qc_mqc + path '*.tsv' into ch_mlib_peak_qc_mqc + path '*.{txt,pdf}' 
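The new --macs_fdr/--macs_pvalue handling above expands each parameter into its MACS2 option only when the parameter is set, so leaving both at false adds nothing to the command and setting one injects exactly one cutoff. A small Groovy sketch with hypothetical user settings:

    def params = [macs_fdr: 0.01, macs_pvalue: false]   // hypothetical; only one should be supplied
    def fdr    = params.macs_fdr    ? "--qvalue ${params.macs_fdr}"    : ''
    def pvalue = params.macs_pvalue ? "--pvalue ${params.macs_pvalue}" : ''
    // The composed flags reduce to a single cutoff option.
    assert (fdr + pvalue).toString() == '--qvalue 0.01'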
script: // This script is bundled with the pipeline, in nf-core/atacseq/bin/ suffix = 'mLb.clN' @@ -1168,37 +1234,37 @@ process MergedLibPeakQC { } /* - * STEP 6.4 Consensus peaks across samples, create boolean filtering file, .saf file for featureCounts and UpSetR plot for intersection + * STEP 6.4: Consensus peaks across samples, create boolean filtering file, SAF file for featureCounts and UpSetR plot for intersection */ -process MergedLibConsensusPeakSet { +process MERGED_LIB_CONSENSUS { label 'process_long' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".igv.txt")) null + if (filename.endsWith('.igv.txt')) null else filename } when: - params.macs_gsize && (replicatesExist || multipleGroups) + params.macs_gsize && (replicatesExist || multipleGroups) && !params.skip_consensus_peaks input: - file peaks from ch_mlib_macs_consensus.collect{ it[1] } + path peaks from ch_mlib_macs_consensus.collect{ it[1] } output: - file "*.bed" into ch_mlib_macs_consensus_bed - file "*.saf" into ch_mlib_macs_consensus_saf - file "*.boolean.txt" into ch_mlib_macs_consensus_bool - file "*.intersect.{txt,plot.pdf}" into ch_mlib_macs_consensus_intersect - file "*igv.txt" into ch_mlib_macs_consensus_igv + path '*.bed' into ch_mlib_macs_consensus_bed + path '*.saf' into ch_mlib_macs_consensus_saf + path '*.boolean.txt' into ch_mlib_macs_consensus_bool + path '*igv.txt' into ch_mlib_macs_consensus_igv + path '*.intersect.{txt,plot.pdf}' script: // scripts are bundled with the pipeline, in nf-core/atacseq/bin/ suffix = 'mLb.clN' prefix = "consensus_peaks.${suffix}" mergecols = params.narrow_peak ? (2..10).join(',') : (2..9).join(',') - collapsecols = params.narrow_peak ? (["collapse"]*9).join(',') : (["collapse"]*8).join(',') - expandparam = params.narrow_peak ? "--is_narrow_peak" : "" + collapsecols = params.narrow_peak ? (['collapse']*9).join(',') : (['collapse']*8).join(',') + expandparam = params.narrow_peak ? '--is_narrow_peak' : '' """ - sort -k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\ + sort -T '.' 
-k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\ | mergeBed -c $mergecols -o $collapsecols > ${prefix}.txt macs2_merged_expand.py \\ @@ -1221,26 +1287,26 @@ process MergedLibConsensusPeakSet { } /* - * STEP 6.5 Annotate consensus peaks with HOMER, and add annotation to boolean output file + * STEP 6.5: Annotate consensus peaks with HOMER, and add annotation to boolean output file */ -process MergedLibConsensusPeakSetAnnotate { - label "process_medium" - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus", mode: 'copy' +process MERGED_LIB_CONSENSUS_ANNOTATE { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus", mode: params.publish_dir_mode when: - params.macs_gsize && (replicatesExist || multipleGroups) + params.macs_gsize && (replicatesExist || multipleGroups) && !params.skip_consensus_peaks && !params.skip_peak_annotation input: - file bed from ch_mlib_macs_consensus_bed - file bool from ch_mlib_macs_consensus_bool - file fasta from ch_fasta - file gtf from ch_gtf + path bed from ch_mlib_macs_consensus_bed + path bool from ch_mlib_macs_consensus_bool + path fasta from ch_fasta + path gtf from ch_gtf output: - file "*.annotatePeaks.txt" into ch_mlib_macs_consensus_annotate + path '*.annotatePeaks.txt' script: - prefix = "consensus_peaks.mLb.clN" + prefix = 'consensus_peaks.mLb.clN' """ annotatePeaks.pl \\ $bed \\ @@ -1250,46 +1316,33 @@ process MergedLibConsensusPeakSetAnnotate { -cpu $task.cpus \\ > ${prefix}.annotatePeaks.txt - cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -k1,1 -k2,2n"}' | cut -f6- > tmp.txt + cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -T '.' -k1,1 -k2,2n"}' | cut -f6- > tmp.txt paste $bool tmp.txt > ${prefix}.boolean.annotatePeaks.txt """ } /* - * STEP 6.6 Count reads in consensus peaks with featureCounts and perform differential analysis with DESeq2 + * STEP 6.6: Count reads in consensus peaks with featureCounts */ -process MergedLibConsensusPeakSetDESeq { +process MERGED_LIB_CONSENSUS_COUNTS { label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/deseq2", mode: 'copy', - saveAs: { filename -> - if (filename.endsWith(".igv.txt")) null - else filename - } + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus", mode: params.publish_dir_mode when: - params.macs_gsize && replicatesExist && multipleGroups && !params.skip_diff_analysis + params.macs_gsize && (replicatesExist || multipleGroups) && !params.skip_consensus_peaks input: - file bams from ch_mlib_name_bam_mlib_counts.collect{ it[1] } - file saf from ch_mlib_macs_consensus_saf.collect() - file mlib_deseq2_pca_header from ch_mlib_deseq2_pca_header - file mlib_deseq2_clustering_header from ch_mlib_deseq2_clustering_header + path bams from ch_mlib_name_bam_mlib_counts.collect{ it[1] } + path saf from ch_mlib_macs_consensus_saf.collect() output: - file "*featureCounts.txt" into ch_mlib_macs_consensus_counts - file "*featureCounts.txt.summary" into ch_mlib_macs_consensus_counts_mqc - file "*.{RData,results.txt,pdf,log}" into ch_mlib_macs_consensus_deseq_results - file "sizeFactors" into ch_mlib_macs_consensus_deseq_factors - file "*vs*/*.{pdf,txt}" into ch_mlib_macs_consensus_deseq_comp_results - file "*vs*/*.bed" into ch_mlib_macs_consensus_deseq_comp_bed - file "*igv.txt" into ch_mlib_macs_consensus_deseq_comp_igv - file "*.tsv" into ch_mlib_macs_consensus_deseq_mqc + path '*featureCounts.txt' 
into ch_mlib_macs_consensus_counts + path '*featureCounts.txt.summary' into ch_mlib_macs_consensus_counts_mqc script: - prefix = "consensus_peaks.mLb.clN" + prefix = 'consensus_peaks.mLb.clN' bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort() - bam_ext = params.single_end ? ".mLb.clN.sorted.bam" : ".mLb.clN.bam" - pe_params = params.single_end ? '' : "-p --donotsort" + pe_params = params.single_end ? '' : '-p --donotsort' """ featureCounts \\ -F SAF \\ @@ -1300,8 +1353,49 @@ process MergedLibConsensusPeakSetDESeq { -a $saf \\ -o ${prefix}.featureCounts.txt \\ ${bam_files.join(' ')} + """ +} - featurecounts_deseq2.r -i ${prefix}.featureCounts.txt -b '$bam_ext' -o ./ -p $prefix -s .mLb +/* + * STEP 6.7: Differential analysis with DESeq2 + */ +process MERGED_LIB_CONSENSUS_DESEQ2 { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/deseq2", mode: params.publish_dir_mode, + saveAs: { filename -> + if (filename.endsWith('.igv.txt')) null + else filename + } + + when: + params.macs_gsize && replicatesExist && multipleGroups && !params.skip_consensus_peaks && !params.skip_diff_analysis + + input: + path counts from ch_mlib_macs_consensus_counts + path mlib_deseq2_pca_header from ch_mlib_deseq2_pca_header + path mlib_deseq2_clustering_header from ch_mlib_deseq2_clustering_header + + output: + path '*.tsv' into ch_mlib_macs_consensus_deseq_mqc + path '*igv.txt' into ch_mlib_macs_consensus_deseq_comp_igv + path '*.{RData,results.txt,pdf,log}' + path 'sizeFactors' + path '*vs*/*.{pdf,txt}' + path '*vs*/*.bed' + + script: + prefix = 'consensus_peaks.mLb.clN' + bam_ext = params.single_end ? '.mLb.clN.sorted.bam' : '.mLb.clN.bam' + vst = params.deseq2_vst ? '--vst TRUE' : '' + """ + featurecounts_deseq2.r \\ + --featurecount_file $counts \\ + --bam_suffix '$bam_ext' \\ + --outdir ./ \\ + --outprefix $prefix \\ + --outsuffix .mLb.clN \\ + --cores $task.cpus \\ + $vst cat $mlib_deseq2_pca_header ${prefix}.pca.vals.txt > ${prefix}.pca.vals_mqc.tsv cat $mlib_deseq2_clustering_header ${prefix}.sample.dists.txt > ${prefix}.sample.dists_mqc.tsv @@ -1311,33 +1405,34 @@ } /* - * STEP 6.7 Run ataqv on BAM file and corresponding peaks + * STEP 6.8: Run ataqv on BAM file and corresponding peaks */ -process MergedLibAtaqv { +process MERGED_LIB_ATAQV { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/ataqv/${PEAK_TYPE}", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/ataqv/${PEAK_TYPE}", mode: params.publish_dir_mode when: !params.skip_ataqv input: - set val(name), file(bam), file(peak) from ch_mlib_bam_ataqv.join(ch_mlib_macs_ataqv, by: [0]) - file autosomes from ch_genome_autosomes.collect() - file tss_bed from ch_tss_bed + tuple val(name), path(bam), path(peak) from ch_mlib_bam_ataqv.join(ch_mlib_macs_ataqv, by: [0]) + path autosomes from ch_genome_autosomes.collect() + path tss_bed from ch_tss_bed output: - file "*.json" into ch_mlib_ataqv + path '*.json' into ch_mlib_ataqv script: - peak_param = params.macs_gsize ? "--peak-file ${peak}" : "" - mito_param = params.mito_name ? "--mitochondrial-reference-name ${params.mito_name}" : "" + suffix = 'mLb.clN' + peak_param = params.macs_gsize ? "--peak-file ${peak}" : '' + mito_param = params.mito_name ?
"--mitochondrial-reference-name ${params.mito_name}" : '' """ ataqv \\ --threads $task.cpus \\ $peak_param \\ --tss-file $tss_bed \\ - --metrics-file ${name}.ataqv.json \\ + --metrics-file ${name}.${suffix}.ataqv.json \\ --name $name \\ --ignore-read-groups \\ --autosomal-reference-file $autosomes \\ @@ -1348,20 +1443,20 @@ process MergedLibAtaqv { } /* - * STEP 6.8 run ataqv mkarv on all JSON files to render web app + * STEP 6.9: Run ataqv mkarv on all JSON files to render web app */ -process MergedLibAtaqvMkarv { +process MERGED_LIB_ATAQV_MKARV { label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/ataqv/${PEAK_TYPE}", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/ataqv/${PEAK_TYPE}", mode: params.publish_dir_mode when: !params.skip_ataqv input: - file json from ch_mlib_ataqv.collect() + path json from ch_mlib_ataqv.collect() output: - file "html" into ch_mlib_ataqv_mkarv + path 'html' script: """ @@ -1382,7 +1477,7 @@ process MergedLibAtaqvMkarv { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 7 Merge library BAM files across all replicates + * STEP 7: Merge library BAM files across all replicates */ ch_mlib_rm_orphan_bam_mrep .map { it -> [ it[0].split('_')[0..-2].join('_'), it[1] ] } @@ -1390,29 +1485,29 @@ ch_mlib_rm_orphan_bam_mrep .map { it -> [ it[0], it[1].flatten() ] } .set { ch_mlib_rm_orphan_bam_mrep } -process MergedRepBAM { +process MERGED_REP_BAM { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedReplicate", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedReplicate", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if (filename.endsWith(".metrics.txt")) "picard_metrics/$filename" + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.metrics.txt')) "picard_metrics/$filename" else filename } input: - set val(name), file(bams) from ch_mlib_rm_orphan_bam_mrep + tuple val(name), path(bams) from ch_mlib_rm_orphan_bam_mrep output: - set val(name), file("*${prefix}.sorted.{bam,bam.bai}") into ch_mrep_bam_bigwig, - ch_mrep_bam_macs - set val(name), file("*.flagstat") into ch_mrep_bam_flagstat_bigwig, - ch_mrep_bam_flagstat_macs, - ch_mrep_bam_flagstat_mqc - file "*.{idxstats,stats}" into ch_mrep_bam_stats_mqc - file "*.txt" into ch_mrep_bam_metrics_mqc + tuple val(name), path("*${prefix}.sorted.{bam,bam.bai}") into ch_mrep_bam_bigwig, + ch_mrep_bam_macs + tuple val(name), path('*.flagstat') into ch_mrep_bam_flagstat_bigwig, + ch_mrep_bam_flagstat_macs, + ch_mrep_bam_flagstat_mqc + path '*.{idxstats,stats}' into ch_mrep_bam_stats_mqc + path '*.txt' into ch_mrep_bam_metrics_mqc when: !params.skip_merge_replicates && replicatesExist @@ -1422,7 +1517,7 @@ process MergedRepBAM { bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort() def avail_mem = 3 if (!task.memory) { - log.info "[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this." + log.info '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.' 
} else { avail_mem = task.memory.toGiga() } @@ -1471,15 +1566,15 @@ process MergedRepBAM { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 8.1 Read depth normalised bigWig + * STEP 8.1: Read depth normalised bigWig */ -process MergedRepBigWig { +process MERGED_REP_BIGWIG { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedReplicate/bigwig", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedReplicate/bigwig", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith("scale_factor.txt")) "scale/$filename" - else if (filename.endsWith(".bigWig")) "$filename" + if (filename.endsWith('scale_factor.txt')) "scale/$filename" + else if (filename.endsWith('.bigWig')) filename else null } @@ -1487,22 +1582,22 @@ process MergedRepBigWig { !params.skip_merge_replicates && replicatesExist input: - set val(name), file(bam), file(flagstat) from ch_mrep_bam_bigwig.join(ch_mrep_bam_flagstat_bigwig, by: [0]) - file sizes from ch_genome_sizes_mrep_bigwig.collect() + tuple val(name), path(bam), path(flagstat) from ch_mrep_bam_bigwig.join(ch_mrep_bam_flagstat_bigwig, by: [0]) + path sizes from ch_genome_sizes_mrep_bigwig.collect() output: - set val(name), file("*.bigWig") into ch_mrep_bigwig - file "*scale_factor.txt" into ch_mrep_bigwig_scale - file "*igv.txt" into ch_mrep_bigwig_igv + tuple val(name), path('*.bigWig') into ch_mrep_bigwig + path '*igv.txt' into ch_mrep_bigwig_igv + path '*scale_factor.txt' script: prefix = "${name}.mRp.clN" - pe_fragment = params.single_end ? "" : "-pc" + pe_fragment = params.single_end ? '' : '-pc' extend = (params.single_end && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : '' """ SCALE_FACTOR=\$(grep 'mapped (' $flagstat | awk '{print 1000000/\$1}') echo \$SCALE_FACTOR > ${prefix}.scale_factor.txt - genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -k1,1 -k2,2n > ${prefix}.bedGraph + genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -T '.' 
-k1,1 -k2,2n > ${prefix}.bedGraph bedGraphToBigWig ${prefix}.bedGraph $sizes ${prefix}.bigWig @@ -1511,15 +1606,15 @@ process MergedRepBigWig { } /* - * STEP 8.2 Call peaks with MACS2 and calculate FRiP score + * STEP 8.2: Call peaks with MACS2 and calculate FRiP score */ -process MergedRepMACSCallPeak { +process MERGED_REP_MACS2 { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".tsv")) "qc/$filename" - else if (filename.endsWith(".igv.txt")) null + if (filename.endsWith('.tsv')) "qc/$filename" + else if (filename.endsWith('.igv.txt')) null else filename } @@ -1527,23 +1622,23 @@ process MergedRepMACSCallPeak { !params.skip_merge_replicates && replicatesExist && params.macs_gsize input: - set val(name), file(bam), file(flagstat) from ch_mrep_bam_macs.join(ch_mrep_bam_flagstat_macs, by: [0]) - file mrep_peak_count_header from ch_mrep_peak_count_header - file mrep_frip_score_header from ch_mrep_frip_score_header + tuple val(name), path(bam), path(flagstat) from ch_mrep_bam_macs.join(ch_mrep_bam_flagstat_macs, by: [0]) + path mrep_peak_count_header from ch_mrep_peak_count_header + path mrep_frip_score_header from ch_mrep_frip_score_header output: - set val(name), file("*.{bed,xls,gappedPeak,bdg}") into ch_mrep_macs_output - set val(name), file("*$PEAK_TYPE") into ch_mrep_macs_homer, - ch_mrep_macs_qc, - ch_mrep_macs_consensus - file "*igv.txt" into ch_mrep_macs_igv - file "*_mqc.tsv" into ch_mrep_macs_mqc + tuple val(name), path("*$PEAK_TYPE") into ch_mrep_macs_homer, + ch_mrep_macs_qc, + ch_mrep_macs_consensus + path '*igv.txt' into ch_mrep_macs_igv + path '*_mqc.tsv' into ch_mrep_macs_mqc + path '*.{bed,xls,gappedPeak,bdg}' script: prefix = "${name}.mRp.clN" broad = params.narrow_peak ? '' : "--broad --broad-cutoff ${params.broad_cutoff}" - format = params.single_end ? "BAM" : "BAMPE" - pileup = params.save_macs_pileup ? "-B --SPMR" : "" + format = params.single_end ? 'BAM' : 'BAMPE' + pileup = params.save_macs_pileup ? 
'-B --SPMR' : '' """ macs2 callpeak \\ -t ${bam[0]} \\ @@ -1565,23 +1660,23 @@ process MergedRepMACSCallPeak { } /* - * STEP 8.3 Annotate peaks with HOMER + * STEP 8.3: Annotate peaks with HOMER */ -process MergedRepAnnotatePeaks { +process MERGED_REP_MACS2_ANNOTATE { tag "$name" - label "process_medium" - publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}", mode: 'copy' + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}", mode: params.publish_dir_mode when: - !params.skip_merge_replicates && replicatesExist && params.macs_gsize + !params.skip_merge_replicates && replicatesExist && params.macs_gsize && !params.skip_peak_annotation input: - set val(name), file(peak) from ch_mrep_macs_homer - file fasta from ch_fasta - file gtf from ch_gtf + tuple val(name), path(peak) from ch_mrep_macs_homer + path fasta from ch_fasta + path gtf from ch_gtf output: - file "*.txt" into ch_mrep_macs_annotate + path '*.txt' into ch_mrep_macs_annotate script: prefix = "${name}.mRp.clN" @@ -1597,23 +1692,23 @@ process MergedRepAnnotatePeaks { } /* - * STEP 8.4 Aggregated QC plots for peaks, FRiP and peak-to-gene annotation + * STEP 8.4: Aggregated QC plots for peaks, FRiP and peak-to-gene annotation */ -process MergedRepPeakQC { - label "process_medium" - publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/qc", mode: 'copy' +process MERGED_REP_MACS2_QC { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/qc", mode: params.publish_dir_mode when: - !params.skip_merge_replicates && replicatesExist && params.macs_gsize + !params.skip_merge_replicates && replicatesExist && params.macs_gsize && !params.skip_peak_qc && !params.skip_peak_annotation input: - file peaks from ch_mrep_macs_qc.collect{ it[1] } - file annos from ch_mrep_macs_annotate.collect() - file mrep_peak_annotation_header from ch_mrep_peak_annotation_header + path peaks from ch_mrep_macs_qc.collect{ it[1] } + path annos from ch_mrep_macs_annotate.collect() + path mrep_peak_annotation_header from ch_mrep_peak_annotation_header output: - file "*.{txt,pdf}" into ch_mrep_peak_qc - file "*.tsv" into ch_mrep_peak_qc_mqc + path '*.tsv' into ch_mrep_peak_qc_mqc + path '*.{txt,pdf}' script: // This script is bundled with the pipeline, in nf-core/atacseq/bin/ suffix = 'mRp.clN' @@ -1635,37 +1730,37 @@ process MergedRepPeakQC { } /* - * STEP 8.5 Consensus peaks across samples, create boolean filtering file, .saf file for featureCounts and UpSetR plot for intersection + * STEP 8.5: Consensus peaks across samples, create boolean filtering file, SAF file for featureCounts and UpSetR plot for intersection */ -process MergedRepConsensusPeakSet { +process MERGED_REP_CONSENSUS { label 'process_long' - publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".igv.txt")) null + if (filename.endsWith('.igv.txt')) null else filename } when: - !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups + !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups && !params.skip_consensus_peaks input: - file peaks from ch_mrep_macs_consensus.collect{ it[1] } + path peaks from ch_mrep_macs_consensus.collect{ it[1] } output: - file "*.bed" into ch_mrep_macs_consensus_bed - file "*.saf" into ch_mrep_macs_consensus_saf 
- file "*.boolean.txt" into ch_mrep_macs_consensus_bool - file "*.intersect.{txt,plot.pdf}" into ch_mrep_macs_consensus_intersect - file "*igv.txt" into ch_mrep_macs_consensus_igv + path '*.bed' into ch_mrep_macs_consensus_bed + path '*.saf' into ch_mrep_macs_consensus_saf + path '*.boolean.txt' into ch_mrep_macs_consensus_bool + path '*igv.txt' into ch_mrep_macs_consensus_igv + path '*.intersect.{txt,plot.pdf}' script: // scripts are bundled with the pipeline, in nf-core/atacseq/bin/ suffix = 'mRp.clN' prefix = "consensus_peaks.${suffix}" mergecols = params.narrow_peak ? (2..10).join(',') : (2..9).join(',') - collapsecols = params.narrow_peak ? (["collapse"]*9).join(',') : (["collapse"]*8).join(',') - expandparam = params.narrow_peak ? "--is_narrow_peak" : "" + collapsecols = params.narrow_peak ? (['collapse']*9).join(',') : (['collapse']*8).join(',') + expandparam = params.narrow_peak ? '--is_narrow_peak' : '' """ - sort -k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\ + sort -T '.' -k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\ | mergeBed -c $mergecols -o $collapsecols > ${prefix}.txt macs2_merged_expand.py \\ @@ -1687,26 +1782,26 @@ process MergedRepConsensusPeakSet { } /* - * STEP 8.6 Annotate consensus peaks with HOMER, and add annotation to boolean output file + * STEP 8.6: Annotate consensus peaks with HOMER, and add annotation to boolean output file */ -process MergedRepConsensusPeakSetAnnotate { - label "process_medium" - publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus", mode: 'copy' +process MERGED_REP_CONSENSUS_ANNOTATE { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus", mode: params.publish_dir_mode when: - !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups + !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups && !params.skip_consensus_peaks && !params.skip_peak_annotation input: - file bed from ch_mrep_macs_consensus_bed - file bool from ch_mrep_macs_consensus_bool - file fasta from ch_fasta - file gtf from ch_gtf + path bed from ch_mrep_macs_consensus_bed + path bool from ch_mrep_macs_consensus_bool + path fasta from ch_fasta + path gtf from ch_gtf output: - file "*.annotatePeaks.txt" into ch_mrep_macs_consensus_annotate + path '*.annotatePeaks.txt' script: - prefix = "consensus_peaks.mRp.clN" + prefix = 'consensus_peaks.mRp.clN' """ annotatePeaks.pl \\ $bed \\ @@ -1716,46 +1811,33 @@ process MergedRepConsensusPeakSetAnnotate { -cpu $task.cpus \\ > ${prefix}.annotatePeaks.txt - cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -k1,1 -k2,2n"}' | cut -f6- > tmp.txt + cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -T '.' 
-k1,1 -k2,2n"}' | cut -f6- > tmp.txt paste $bool tmp.txt > ${prefix}.boolean.annotatePeaks.txt """ } /* - * STEP 8.7 Count reads in consensus peaks with featureCounts and perform differential analysis with DESeq2 + * STEP 8.7: Count reads in consensus peaks with featureCounts */ -process MergedRepConsensusPeakSetDESeq { +process MERGED_REP_CONSENSUS_COUNTS { label 'process_medium' - publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus/deseq2", mode: 'copy', - saveAs: { filename -> - if (filename.endsWith(".igv.txt")) null - else filename - } + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus", mode: params.publish_dir_mode when: - !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups && !params.skip_diff_analysis + !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups && !params.skip_consensus_peaks input: - file bams from ch_mlib_name_bam_mrep_counts.collect{ it[1] } - file saf from ch_mrep_macs_consensus_saf.collect() - file mrep_deseq2_pca_header from ch_mrep_deseq2_pca_header - file mrep_deseq2_clustering_header from ch_mrep_deseq2_clustering_header + path bams from ch_mlib_name_bam_mrep_counts.collect{ it[1] } + path saf from ch_mrep_macs_consensus_saf.collect() output: - file "*featureCounts.txt" into ch_mrep_macs_consensus_counts - file "*featureCounts.txt.summary" into ch_mrep_macs_consensus_counts_mqc - file "*.{RData,results.txt,pdf,log}" into ch_mrep_macs_consensus_deseq_results - file "sizeFactors" into ch_mrep_macs_consensus_deseq_factors - file "*vs*/*.{pdf,txt}" into ch_mrep_macs_consensus_deseq_comp_results - file "*vs*/*.bed" into ch_mrep_macs_consensus_deseq_comp_bed - file "*igv.txt" into ch_mrep_macs_consensus_deseq_comp_igv - file "*.tsv" into ch_mrep_macs_consensus_deseq_mqc + path '*featureCounts.txt' into ch_mrep_macs_consensus_counts + path '*featureCounts.txt.summary' into ch_mrep_macs_consensus_counts_mqc script: - prefix = "consensus_peaks.mRp.clN" + prefix = 'consensus_peaks.mRp.clN' bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort() - bam_ext = params.single_end ? ".mLb.clN.sorted.bam" : ".mLb.clN.bam" - pe_params = params.single_end ? '' : "-p --donotsort" + pe_params = params.single_end ? 
'' : '-p --donotsort' """ featureCounts \\ -F SAF \\ @@ -1766,8 +1848,49 @@ process MergedRepConsensusPeakSetDESeq { -a $saf \\ -o ${prefix}.featureCounts.txt \\ ${bam_files.join(' ')} + """ +} - featurecounts_deseq2.r -i ${prefix}.featureCounts.txt -b '$bam_ext' -o ./ -p $prefix -s .mLb +/* + * STEP 8.8: Differential analysis with DESeq2 + */ +process MERGED_REP_CONSENSUS_DESEQ2 { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedReplicate/macs/${PEAK_TYPE}/consensus/deseq2", mode: params.publish_dir_mode, + saveAs: { filename -> + if (filename.endsWith('.igv.txt')) null + else filename + } + + when: + !params.skip_merge_replicates && replicatesExist && params.macs_gsize && multipleGroups && !params.skip_consensus_peaks && !params.skip_diff_analysis + + input: + path counts from ch_mrep_macs_consensus_counts + path mrep_deseq2_pca_header from ch_mrep_deseq2_pca_header + path mrep_deseq2_clustering_header from ch_mrep_deseq2_clustering_header + + output: + path '*.tsv' into ch_mrep_macs_consensus_deseq_mqc + path '*igv.txt' into ch_mrep_macs_consensus_deseq_comp_igv + path '*.{RData,results.txt,pdf,log}' + path 'sizeFactors' + path '*vs*/*.{pdf,txt}' + path '*vs*/*.bed' + + script: + prefix = 'consensus_peaks.mRp.clN' + bam_ext = params.single_end ? '.mLb.clN.sorted.bam' : '.mLb.clN.bam' + vst = params.deseq2_vst ? '--vst TRUE' : '' + """ + featurecounts_deseq2.r \\ + --featurecount_file $counts \\ + --bam_suffix '$bam_ext' \\ + --outdir ./ \\ + --outprefix $prefix \\ + --outsuffix .mRp.clN \\ + --cores $task.cpus \\ + $vst cat $mrep_deseq2_pca_header ${prefix}.pca.vals.txt > ${prefix}.pca.vals_mqc.tsv cat $mrep_deseq2_clustering_header ${prefix}.sample.dists.txt > ${prefix}.sample.dists_mqc.tsv @@ -1785,34 +1908,34 @@ process MergedRepConsensusPeakSetDESeq { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 9 - Create IGV session file + * STEP 9: Create IGV session file */ process IGV { - publishDir "${params.outdir}/igv/${PEAK_TYPE}", mode: 'copy' + publishDir "${params.outdir}/igv/${PEAK_TYPE}", mode: params.publish_dir_mode when: !params.skip_igv input: - file fasta from ch_fasta + path fasta from ch_fasta - file bigwigs from ch_mlib_bigwig_igv.collect().ifEmpty([]) - file peaks from ch_mlib_macs_igv.collect().ifEmpty([]) - file consensus_peaks from ch_mlib_macs_consensus_igv.collect().ifEmpty([]) - file differential_peaks from ch_mlib_macs_consensus_deseq_comp_igv.collect().ifEmpty([]) + path bigwigs from ch_mlib_bigwig_igv.collect().ifEmpty([]) + path peaks from ch_mlib_macs_igv.collect().ifEmpty([]) + path consensus_peaks from ch_mlib_macs_consensus_igv.collect().ifEmpty([]) + path differential_peaks from ch_mlib_macs_consensus_deseq_comp_igv.collect().ifEmpty([]) - file rbigwigs from ch_mrep_bigwig_igv.collect().ifEmpty([]) - file rpeaks from ch_mrep_macs_igv.collect().ifEmpty([]) - file rconsensus_peaks from ch_mrep_macs_consensus_igv.collect().ifEmpty([]) - file rdifferential_peaks from ch_mrep_macs_consensus_deseq_comp_igv.collect().ifEmpty([]) + path rbigwigs from ch_mrep_bigwig_igv.collect().ifEmpty([]) + path rpeaks from ch_mrep_macs_igv.collect().ifEmpty([]) + path rconsensus_peaks from ch_mrep_macs_consensus_igv.collect().ifEmpty([]) + path rdifferential_peaks from ch_mrep_macs_consensus_deseq_comp_igv.collect().ifEmpty([]) output: - file "*.{txt,xml}" into ch_igv_session + path '*.{txt,xml}' script: // scripts are bundled with the pipeline, in nf-core/atacseq/bin/ """ cat *.txt > igv_files.txt - 
igv_files_to_session.py igv_session.xml igv_files.txt ../../reference_genome/${fasta.getName()} --path_prefix '../../' + igv_files_to_session.py igv_session.xml igv_files.txt ../../genome/${fasta.getName()} --path_prefix '../../' """ } @@ -1828,15 +1951,15 @@ process IGV { * Parse software version numbers */ process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: 'copy', + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.indexOf(".csv") > 0) filename + if (filename.indexOf('.csv') > 0) filename else null } output: - file 'software_versions_mqc.yaml' into ch_software_versions_mqc - file "software_versions.csv" + path 'software_versions_mqc.yaml' into ch_software_versions_mqc + path 'software_versions.csv' script: """ @@ -1862,10 +1985,10 @@ process get_software_versions { """ } -def create_workflow_summary(summary) { - - def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') - yaml_file.text = """ +Channel.from(summary.collect{ [it.key, it.value] }) + .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>" }
+    .reduce { a, b -> return [a, b].join("\n            ") }
+    .map { x -> """
     id: 'nf-core-atacseq-summary'
     description: " - this information is collected when the pipeline is started."
     section_name: 'nf-core/atacseq Workflow Summary'
     section_href: 'https://github.com/nf-core/atacseq'
     plot_type: 'html'
     data: |
         <dl class=\"dl-horizontal\">
-${summary.collect { k,v -> "            <dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>" }.join("\n")}
+            $x
         </dl>
- """.stripIndent() - - return yaml_file -} + """.stripIndent() } + .set { ch_workflow_summary } /* - * STEP 10 - MultiQC + * STEP 10: MultiQC */ -process MultiQC { - publishDir "${params.outdir}/multiqc/${PEAK_TYPE}", mode: 'copy' +process MULTIQC { + publishDir "${params.outdir}/multiqc/${PEAK_TYPE}", mode: params.publish_dir_mode when: !params.skip_multiqc input: - file multiqc_config from ch_multiqc_config - - file ('software_versions/*') from ch_software_versions_mqc.collect() - file ('workflow_summary/*') from create_workflow_summary(summary) - - file ('fastqc/*') from ch_fastqc_reports_mqc.collect().ifEmpty([]) - file ('trimgalore/*') from ch_trimgalore_results_mqc.collect().ifEmpty([]) - file ('trimgalore/fastqc/*') from ch_trimgalore_fastqc_reports_mqc.collect().ifEmpty([]) - - file ('alignment/library/*') from ch_sort_bam_flagstat_mqc.collect() - - file ('alignment/mergedLibrary/*') from ch_mlib_bam_stats_mqc.collect() - file ('alignment/mergedLibrary/*') from ch_mlib_rm_orphan_flagstat_mqc.collect{it[1]} - file ('alignment/mergedLibrary/*') from ch_mlib_rm_orphan_stats_mqc.collect() - file ('alignment/mergedLibrary/picard_metrics/*') from ch_mlib_bam_metrics_mqc.collect() - file ('alignment/mergedLibrary/picard_metrics/*') from ch_mlib_collectmetrics_mqc.collect() - file ('macs/mergedLibrary/*') from ch_mlib_macs_mqc.collect().ifEmpty([]) - file ('macs/mergedLibrary/*') from ch_mlib_peak_qc_mqc.collect().ifEmpty([]) - file ('macs/mergedLibrary/consensus/*') from ch_mlib_macs_consensus_counts_mqc.collect().ifEmpty([]) - file ('macs/mergedLibrary/consensus/*') from ch_mlib_macs_consensus_deseq_mqc.collect().ifEmpty([]) - file ('preseq/*') from ch_mlib_preseq_mqc.collect().ifEmpty([]) - file ('deeptools/*') from ch_mlib_plotprofile_mqc.collect().ifEmpty([]) - file ('deeptools/*') from ch_mlib_plotfingerprint_mqc.collect().ifEmpty([]) - - file ('alignment/mergedReplicate/*') from ch_mrep_bam_flagstat_mqc.collect{it[1]}.ifEmpty([]) - file ('alignment/mergedReplicate/*') from ch_mrep_bam_stats_mqc.collect().ifEmpty([]) - file ('alignment/mergedReplicate/*') from ch_mrep_bam_metrics_mqc.collect().ifEmpty([]) - file ('macs/mergedReplicate/*') from ch_mrep_macs_mqc.collect().ifEmpty([]) - file ('macs/mergedReplicate/*') from ch_mrep_peak_qc_mqc.collect().ifEmpty([]) - file ('macs/mergedReplicate/consensus/*') from ch_mrep_macs_consensus_counts_mqc.collect().ifEmpty([]) - file ('macs/mergedReplicate/consensus/*') from ch_mrep_macs_consensus_deseq_mqc.collect().ifEmpty([]) + path (multiqc_config) from ch_multiqc_config + path (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) + + path ('software_versions/*') from ch_software_versions_mqc.collect() + path workflow_summary from ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml') + + path ('fastqc/*') from ch_fastqc_reports_mqc.collect().ifEmpty([]) + path ('trimgalore/*') from ch_trimgalore_results_mqc.collect().ifEmpty([]) + path ('trimgalore/fastqc/*') from ch_trimgalore_fastqc_reports_mqc.collect().ifEmpty([]) + + path ('alignment/library/*') from ch_sort_bam_flagstat_mqc.collect() + + path ('alignment/mergedLibrary/*') from ch_mlib_bam_stats_mqc.collect() + path ('alignment/mergedLibrary/*') from ch_mlib_rm_orphan_flagstat_mqc.collect{it[1]} + path ('alignment/mergedLibrary/*') from ch_mlib_rm_orphan_stats_mqc.collect() + path ('alignment/mergedLibrary/picard_metrics/*') from ch_mlib_bam_metrics_mqc.collect() + path ('alignment/mergedLibrary/picard_metrics/*') from ch_mlib_collectmetrics_mqc.collect() 
+ path ('macs/mergedLibrary/*') from ch_mlib_macs_mqc.collect().ifEmpty([]) + path ('macs/mergedLibrary/*') from ch_mlib_peak_qc_mqc.collect().ifEmpty([]) + path ('macs/mergedLibrary/consensus/*') from ch_mlib_macs_consensus_counts_mqc.collect().ifEmpty([]) + path ('macs/mergedLibrary/consensus/*') from ch_mlib_macs_consensus_deseq_mqc.collect().ifEmpty([]) + path ('preseq/*') from ch_mlib_preseq_mqc.collect().ifEmpty([]) + path ('deeptools/*') from ch_mlib_plotprofile_mqc.collect().ifEmpty([]) + path ('deeptools/*') from ch_mlib_plotfingerprint_mqc.collect().ifEmpty([]) + + path ('alignment/mergedReplicate/*') from ch_mrep_bam_flagstat_mqc.collect{it[1]}.ifEmpty([]) + path ('alignment/mergedReplicate/*') from ch_mrep_bam_stats_mqc.collect().ifEmpty([]) + path ('alignment/mergedReplicate/*') from ch_mrep_bam_metrics_mqc.collect().ifEmpty([]) + path ('macs/mergedReplicate/*') from ch_mrep_macs_mqc.collect().ifEmpty([]) + path ('macs/mergedReplicate/*') from ch_mrep_peak_qc_mqc.collect().ifEmpty([]) + path ('macs/mergedReplicate/consensus/*') from ch_mrep_macs_consensus_counts_mqc.collect().ifEmpty([]) + path ('macs/mergedReplicate/consensus/*') from ch_mrep_macs_consensus_deseq_mqc.collect().ifEmpty([]) output: - file "*multiqc_report.html" into ch_multiqc_report - file "*_data" - file "multiqc_plots" + path '*multiqc_report.html' into ch_multiqc_report + path '*_data' script: rtitle = custom_runName ? "--title \"$custom_runName\"" : '' rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' """ - multiqc . -f $rtitle $rfilename --config $multiqc_config \\ - -m custom_content -m fastqc -m cutadapt -m samtools -m picard -m preseq -m featureCounts -m deeptools + multiqc . -f $rtitle $rfilename $custom_config_file """ } @@ -1945,20 +2066,21 @@ process MultiQC { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 11 - Output description HTML + * STEP 11: Output description HTML */ process output_documentation { - publishDir "${params.outdir}/Documentation", mode: 'copy' + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode input: - file output_docs from ch_output_docs + path output_docs from ch_output_docs + path images from ch_output_docs_images output: - file "results_description.html" + path 'results_description.html' script: """ - markdown_to_html.r $output_docs results_description.html + markdown_to_html.py $output_docs -o results_description.html """ } @@ -1991,7 +2113,6 @@ workflow.onComplete { if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - if (workflow.container) email_fields['summary']['Docker image'] = workflow.container email_fields['summary']['Nextflow Version'] = workflow.nextflow.version email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp @@ -2057,22 +2178,21 @@ workflow.onComplete { def output_tf = new File(output_d, "pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_purple = params.monochrome_logs ? 
'' : "\033[0;35m"; c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_red = params.monochrome_logs ? '' : "\033[0;31m"; + c_reset = params.monochrome_logs ? '' : "\033[0m"; if (workflow.stats.ignoredCount > 0 && workflow.success) { - log.info "${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}" - log.info "${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}" - log.info "${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}" + log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" + log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" + log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" } - if (workflow.success) { - log.info "${c_purple}[nf-core/atacseq]${c_green} Pipeline completed successfully${c_reset}" + log.info "-${c_purple}[nf-core/atacseq]${c_green} Pipeline completed successfully${c_reset}-" } else { checkHostname() - log.info "${c_purple}[nf-core/atacseq]${c_red} Pipeline completed with errors${c_reset}" + log.info "-${c_purple}[nf-core/atacseq]${c_red} Pipeline completed with errors${c_reset}-" } } @@ -2087,15 +2207,15 @@ workflow.onComplete { def nfcoreHeader() { // Log colors ANSI codes - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; + c_dim = params.monochrome_logs ? '' : "\033[2m"; + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_reset = params.monochrome_logs ? '' : "\033[0m"; c_white = params.monochrome_logs ? '' : "\033[0;37m"; + c_yellow = params.monochrome_logs ? 
'' : "\033[0;33m"; return """ -${c_dim}--------------------------------------------------${c_reset}- ${c_green},--.${c_black}/${c_green},-.${c_reset} diff --git a/nextflow.config b/nextflow.config old mode 100755 new mode 100644 index 0255adf4..5c08ccbf --- a/nextflow.config +++ b/nextflow.config @@ -9,6 +9,7 @@ params { // Options: Generic + input = './design.csv' single_end = false seq_center = false fragment_size = 0 @@ -29,6 +30,7 @@ params { save_trimmed = false // Options: Alignments + bwa_min_score = false keep_mito = false keep_dups = false keep_multi_map = false @@ -38,8 +40,16 @@ params { // Options: Peaks narrow_peak = false broad_cutoff = 0.1 + macs_fdr = false + macs_pvalue = false min_reps_consensus = 1 save_macs_pileup = false + skip_peak_qc = false + skip_peak_annotation = false + skip_consensus_peaks = false + + // Options: Differential analysis + deseq2_vst = false skip_diff_analysis = false // Options: QC @@ -52,12 +62,8 @@ params { skip_igv = false skip_multiqc = false - // Options: AWSBatch - awsqueue = false - awsregion = 'eu-west-1' - // Options: Config - multiqc_config = "$baseDir/assets/multiqc/multiqc_config.yaml" + multiqc_config = false bamtools_filter_pe_config = "$baseDir/assets/bamtools_filter_pe.json" bamtools_filter_se_config = "$baseDir/assets/bamtools_filter_se.json" @@ -71,6 +77,7 @@ params { // Options: Other help = false outdir = './results' + publish_dir_mode = 'copy' igenomes_base = 's3://ngi-igenomes/igenomes/' igenomes_ignore = false max_multiqc_email_size = 25.MB @@ -92,7 +99,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'nfcore/atacseq:1.1.0' +process.container = 'nfcore/atacseq:1.2.0' // Load base.config by default for all pipelines includeConfig 'conf/base.config' @@ -105,20 +112,24 @@ try { } profiles { - awsbatch { includeConfig 'conf/awsbatch.config' } conda { process.conda = "$baseDir/environment.yml" } debug { process.beforeScript = 'echo $HOSTNAME' } - docker { docker.enabled = true } - singularity { singularity.enabled = true - singularity.autoMounts = true } + docker { + docker.enabled = true + // Avoid this error: + // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. + // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 + // once this is established and works well, nextflow might implement this behavior as new default. + docker.runOptions = '-u \$(id -u):\$(id -g)' + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + } test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } -// Avoid this error: -// WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. -// Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351, once this is established and works well, nextflow might implement this behavior as new default. 
-docker.runOptions = '-u \$(id -u):\$(id -g)' - // Load igenomes.config if required if (!params.igenomes_ignore) { includeConfig 'conf/igenomes.config' @@ -127,9 +138,11 @@ if (!params.igenomes_ignore) { // Increase time available to build conda environment conda { createTimeout = "60 min" } -// Export this variable to prevent local Python libraries from conflicting with those in the container +// Export these variables to prevent local Python/R libraries from conflicting with those in the container env { PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" } // Capture exit codes from upstream processes when piping @@ -159,7 +172,7 @@ manifest { description = 'ATACSeq peak-calling and differential analysis pipeline.' mainScript = 'main.nf' nextflowVersion = '>=19.10.0' - version = '1.1.0' + version = '1.2.0' } // Function to ensure that resource requirements don't go beyond diff --git a/nextflow_schema.json b/nextflow_schema.json new file mode 100644 index 00000000..149930a5 --- /dev/null +++ b/nextflow_schema.json @@ -0,0 +1,565 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/atacseq/master/nextflow_schema.json", + "title": "nf-core/atacseq pipeline parameters", + "description": "ATACSeq peak-calling and differential analysis pipeline.", + "type": "object", + "properties": { + "Input/output options": { + "type": "object", + "properties": { + "input": { + "type": "string", + "description": "Path to comma-separated file containing information about the samples in the experiment.", + "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 4 columns, and a header row. See [usage docs](https://nf-co.re/atacseq/docs/usage#--input).", + "fa_icon": "fas fa-file-csv" + }, + "single_end": { + "type": "boolean", + "description": "Specifies that the input is single-end reads.", + "fa_icon": "fas fa-align-center", + "default": false, + "help_text": "By default, the pipeline expects paired-end data. If you have single-end data, specify this parameter on the command line when you launch the pipeline. It is not possible to run a mixture of single-end and paired-end files in one run." 
+ }, + "fragment_size": { + "type": "integer", + "default": 0, + "description": "Estimated fragment size used to extend single-end reads.", + "fa_icon": "fas fa-chart-area", + "help_text": "" + }, + "seq_center": { + "type": "string", + "description": "Sequencing center information to be added to read group of BAM files.", + "fa_icon": "fas fa-synagogue" + }, + "outdir": { + "type": "string", + "description": "Path to the output directory where the results will be saved.", + "default": "./results", + "fa_icon": "fas fa-folder-open", + "help_text": "" + }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "An email address to send a summary email to when the pipeline is completed.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + } + }, + "required": [ + "input" + ], + "fa_icon": "fas fa-terminal" + }, + "Reference genome options": { + "type": "object", + "properties": { + "genome": { + "type": "string", + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`." + }, + "fasta": { + "type": "string", + "description": "Path to Fasta reference file.", + "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.", + "fa_icon": "far fa-file-code" + }, + "gtf": { + "type": "string", + "description": "Path to GTF annotation file.", + "fa_icon": "fas fa-file-invoice", + "help_text": "This parameter is *mandatory* if `--genome` is not specified." + }, + "bwa_index": { + "type": "string", + "description": "Full path to directory containing BWA index including base name. i.e. `/path/to/index/genome.fa`.", + "fa_icon": "fas fa-bezier-curve", + "help_text": "" + }, + "gene_bed": { + "type": "string", + "description": "Path to BED file containing gene intervals. This will be created from the GTF file if not specified.", + "fa_icon": "fas fa-procedures", + "help_text": "" + }, + "tss_bed": { + "type": "string", + "description": "Path to BED file containing transcription start sites. This will be created from the gene BED file if not specified.", + "fa_icon": "fas fa-procedures", + "help_text": "" + }, + "macs_gsize": { + "type": "string", + "description": "Effective genome size parameter required by MACS2.", + "help_text": "[Effective genome size](https://github.com/taoliu/MACS#-g--gsize) parameter required by MACS2. If using an iGenomes reference these have been provided when `--genome` is set as *GRCh37*, *GRCh38*, *GRCm38*, *WBcel235*, *BDGP6*, *R64-1-1*, *EF2*, *hg38*, *hg19* and *mm10*. For other genomes, if this parameter is not specified then the MACS2 peak-calling and differential analysis will be skipped.", + "fa_icon": "fas fa-arrows-alt-h" + }, + "blacklist": { + "type": "string", + "description": "Path to blacklist regions in BED format, used for filtering alignments.", + "help_text": "If provided, alignments that overlap with the regions in this file will be filtered out (see [ENCODE blacklists](https://sites.google.com/site/anshulkundaje/projects/blacklists)). The file should be in BED format. 
Blacklisted regions for *GRCh37*, *GRCh38*, *GRCm38*, *hg19*, *hg38*, *mm10* are bundled with the pipeline in the [`blacklists`](../assets/blacklists/) directory, and as such will be automatically used if any of those genomes are specified with the `--genome` parameter.",
+                    "fa_icon": "fas fa-book-dead"
+                },
+                "mito_name": {
+                    "type": "string",
+                    "description": "Name of mitochondrial chromosome in reference assembly e.g. chrM.",
+                    "help_text": "Reads aligning to this contig are filtered out if a valid identifier is provided, otherwise this step is skipped. Where possible these have been provided in the [`igenomes.config`](../conf/igenomes.config).",
+                    "fa_icon": "fas fa-signature"
+                },
+                "save_reference": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "If generated by the pipeline save the BWA index in the results directory.",
+                    "help_text": "If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times.",
+                    "fa_icon": "fas fa-save"
+                },
+                "igenomes_base": {
+                    "type": "string",
+                    "description": "Directory / URL base for iGenomes references.",
+                    "default": "s3://ngi-igenomes/igenomes/",
+                    "fa_icon": "fas fa-cloud-download-alt",
+                    "hidden": true,
+                    "help_text": ""
+                },
+                "igenomes_ignore": {
+                    "type": "boolean",
+                    "description": "Do not load the iGenomes reference config.",
+                    "fa_icon": "fas fa-ban",
+                    "hidden": true,
+                    "default": false,
+                    "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`."
+                }
+            },
+            "fa_icon": "fas fa-dna"
+        },
+        "Adapter trimming options": {
+            "type": "object",
+            "properties": {
+                "clip_r1": {
+                    "type": "integer",
+                    "default": 0,
+                    "description": "Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads).",
+                    "fa_icon": "fas fa-cut",
+                    "help_text": ""
+                },
+                "clip_r2": {
+                    "type": "integer",
+                    "default": 0,
+                    "description": "Instructs Trim Galore to remove bp from the 5' end of read 2 (paired-end reads only).",
+                    "fa_icon": "fas fa-cut",
+                    "help_text": ""
+                },
+                "three_prime_clip_r1": {
+                    "type": "integer",
+                    "default": 0,
+                    "description": "Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed.",
+                    "fa_icon": "fas fa-cut"
+                },
+                "three_prime_clip_r2": {
+                    "type": "integer",
+                    "default": 0,
+                    "description": "Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed.",
+                    "fa_icon": "fas fa-cut"
+                },
+                "trim_nextseq": {
+                    "type": "integer",
+                    "default": 0,
+                    "description": "Instructs Trim Galore to apply the --nextseq=X option, to trim based on quality after removing poly-G tails.",
+                    "help_text": "This enables the Cutadapt `--nextseq-trim=3'CUTOFF` option via Trim Galore, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. This type of trimming is common on the NextSeq and NovaSeq platforms, where basecalls without any signal are called as high-quality G bases.",
+                    "fa_icon": "fas fa-cut"
+                },
+                "skip_trimming": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip the adapter trimming step.",
+                    "help_text": "Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data.",
+                    "fa_icon": "fas fa-forward"
+                },
+                "save_trimmed": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Save the trimmed FastQ files in the results directory.",
+                    "help_text": "By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.",
+                    "fa_icon": "fas fa-save"
+                }
+            },
+            "fa_icon": "fas fa-cut"
+        },
+        "Alignment options": {
+            "type": "object",
+            "properties": {
+                "keep_mito": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Reads mapping to mitochondrial contig are not filtered from alignments.",
+                    "fa_icon": "fas fa-cart-arrow-down"
+                },
+                "keep_dups": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Duplicate reads are not filtered from alignments.",
+                    "fa_icon": "fas fa-cart-arrow-down"
+                },
+                "keep_multi_map": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Reads mapping to multiple locations are not filtered from alignments.",
+                    "fa_icon": "fas fa-cart-arrow-down"
+                },
+                "bwa_min_score": {
+                    "type": "integer",
+                    "description": "Don\u2019t output BWA MEM alignments with score lower than this parameter.",
+                    "fa_icon": "fas fa-hand-paper"
+                },
+                "skip_merge_replicates": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Do not perform alignment merging and downstream analysis by merging replicates i.e. only do this by merging resequenced libraries.",
+                    "help_text": "An additional series of steps are performed by the pipeline for merging the replicates from the same experimental group. This is primarily to increase the sequencing depth in order to perform downstream analyses such as footprinting. Specifying this parameter means that these steps will not be performed.",
+                    "fa_icon": "fas fa-forward"
+                },
+                "save_align_intermeds": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Save the intermediate BAM files from the alignment step.",
+                    "help_text": "By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set this parameter to also save other intermediate BAM files.",
+                    "fa_icon": "fas fa-save"
+                },
+                "bamtools_filter_pe_config": {
+                    "type": "string",
+                    "default": "$baseDir/assets/bamtools_filter_pe.json",
+                    "hidden": true,
+                    "description": "BAMTools JSON file with custom filters for paired-end data.",
+                    "fa_icon": "fas fa-cog",
+                    "help_text": ""
+                },
+                "bamtools_filter_se_config": {
+                    "type": "string",
+                    "default": "$baseDir/assets/bamtools_filter_se.json",
+                    "hidden": true,
+                    "description": "BAMTools JSON file with custom filters for single-end data.",
+                    "fa_icon": "fas fa-cog",
+                    "help_text": ""
+                }
+            },
+            "fa_icon": "fas fa-map-signs"
+        },
+        "Peak calling options": {
+            "type": "object",
+            "properties": {
+                "narrow_peak": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Run MACS2 in narrowPeak mode.",
+                    "help_text": "MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.",
+                    "fa_icon": "fas fa-arrows-alt-h"
+                },
+                "broad_cutoff": {
+                    "type": "number",
+                    "default": 0.1,
+                    "description": "Specifies broad cutoff value for MACS2. Only used when --narrow_peak isn't specified.",
+                    "fa_icon": "fas fa-hand-scissors"
+                },
+                "macs_fdr": {
+                    "type": "number",
+                    "description": "Minimum FDR (q-value) cutoff for peak detection. --macs_fdr and --macs_pvalue are mutually exclusive.",
+                    "fa_icon": "fas fa-sort-amount-down"
+                },
+                "macs_pvalue": {
+                    "type": "number",
+                    "description": "p-value cutoff for peak detection. --macs_fdr and --macs_pvalue are mutually exclusive. If --macs_pvalue cutoff is set, q-value will not be calculated and reported as -1 in the final .xls file.",
+                    "fa_icon": "fas fa-sort-amount-down"
+                },
+                "min_reps_consensus": {
+                    "type": "integer",
+                    "default": 1,
+                    "description": "Number of biological replicates required from a given condition for a peak to contribute to a consensus peak.",
+                    "help_text": "If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a \"reproducible\" set of consensus peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded.",
+                    "fa_icon": "fas fa-sort-numeric-down"
+                },
+                "save_macs_pileup": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Instruct MACS2 to create bedGraph files normalised to signal per million reads.",
+                    "fa_icon": "fas fa-save",
+                    "help_text": ""
+                },
+                "skip_peak_qc": {
+                    "type": "boolean",
+                    "fa_icon": "fas fa-forward",
+                    "description": "Skip MACS2 peak QC plot generation.",
+                    "default": false
+                },
+                "skip_peak_annotation": {
+                    "type": "boolean",
+                    "fa_icon": "fas fa-forward",
+                    "description": "Skip annotation of MACS2 and consensus peaks with HOMER.",
+                    "default": false
+                },
+                "skip_consensus_peaks": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip consensus peak generation, annotation and counting.",
+                    "fa_icon": "fas fa-forward",
+                    "help_text": ""
+                }
+            },
+            "fa_icon": "fas fa-chart-area"
+        },
+        "Differential analysis options": {
+            "type": "object",
+            "properties": {
+                "deseq2_vst": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Use vst transformation instead of rlog with DESeq2.",
+                    "help_text": "See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization).",
+                    "fa_icon": "fas fa-dolly"
+                },
+                "skip_diff_analysis": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip differential accessibility analysis.",
+                    "fa_icon": "fas fa-forward",
+                    "help_text": ""
+                }
+            },
+            "fa_icon": "fas fa-not-equal"
+        },
+        "Process skipping options": {
+            "type": "object",
+            "properties": {
+                "skip_fastqc": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip FastQC.",
+                    "fa_icon": "fas fa-forward"
+                },
+                "skip_picard_metrics": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip Picard CollectMultipleMetrics.",
+                    "fa_icon": "fas fa-forward"
+                },
+                "skip_preseq": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip Preseq.",
+                    "fa_icon": "fas fa-forward"
+                },
+                "skip_plot_profile": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "Skip deepTools plotProfile.",
+                    "fa_icon": "fas fa-forward"
+                },
+                "skip_plot_fingerprint": {
+                    "type": "boolean",
+                    "default":
false, + "description": "Skip deepTools plotFingerprint.", + "fa_icon": "fas fa-forward" + }, + "skip_ataqv": { + "type": "boolean", + "default": false, + "description": "Skip Ataqv.", + "fa_icon": "fas fa-forward" + }, + "skip_igv": { + "type": "boolean", + "default": false, + "description": "Skip IGV.", + "fa_icon": "fas fa-forward" + }, + "skip_multiqc": { + "type": "boolean", + "default": false, + "description": "Skip MultiQC.", + "fa_icon": "fas fa-forward" + } + }, + "fa_icon": "fas fa-forward" + }, + "Institutional config options": { + "type": "object", + "properties": { + "custom_config_version": { + "type": "string", + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "custom_config_base": { + "type": "string", + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", + "fa_icon": "fas fa-users-cog" + }, + "hostnames": { + "type": "string", + "description": "Institutional configs hostname.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_description": { + "type": "string", + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_contact": { + "type": "string", + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_url": { + "type": "string", + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + } + }, + "fa_icon": "fas fa-university" + }, + "Max job request options": { + "type": "object", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. 
`--max_time '2.h'`"
+                }
+            },
+            "fa_icon": "fab fa-acquisitions-incorporated"
+        },
+        "Generic options": {
+            "type": "object",
+            "properties": {
+                "help": {
+                    "type": "boolean",
+                    "description": "Display help text.",
+                    "fa_icon": "fas fa-question-circle",
+                    "default": false,
+                    "hidden": true
+                },
+                "fingerprint_bins": {
+                    "type": "integer",
+                    "default": 500000,
+                    "description": "Number of genomic bins to use when calculating deepTools fingerprint plot.",
+                    "fa_icon": "fas fa-dumpster",
+                    "hidden": true
+                },
+                "publish_dir_mode": {
+                    "type": "string",
+                    "default": "copy",
+                    "description": "Method used to save pipeline results to output directory.",
+                    "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.",
+                    "fa_icon": "fas fa-copy",
+                    "enum": [
+                        "symlink",
+                        "rellink",
+                        "link",
+                        "copy",
+                        "copyNoFollow",
+                        "move"
+                    ],
+                    "hidden": true
+                },
+                "name": {
+                    "type": "string",
+                    "description": "Workflow name.",
+                    "fa_icon": "fas fa-address-card",
+                    "help_text": "A custom name for the pipeline run. Unlike the core nextflow `-name` option with one hyphen this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles.",
+                    "hidden": true
+                },
+                "email_on_fail": {
+                    "type": "string",
+                    "description": "Email address for completion summary, only when pipeline fails.",
+                    "fa_icon": "fas fa-exclamation-triangle",
+                    "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$",
+                    "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.",
+                    "hidden": true
+                },
+                "plaintext_email": {
+                    "type": "boolean",
+                    "description": "Send plain-text email instead of HTML.",
+                    "fa_icon": "fas fa-remove-format",
+                    "default": false,
+                    "help_text": "",
+                    "hidden": true
+                },
+                "max_multiqc_email_size": {
+                    "type": "string",
+                    "description": "File size limit when attaching MultiQC reports to summary emails.",
+                    "default": "25.MB",
+                    "fa_icon": "fas fa-file-upload",
+                    "help_text": "",
+                    "hidden": true
+                },
+                "monochrome_logs": {
+                    "type": "boolean",
+                    "description": "Do not use coloured log outputs.",
+                    "fa_icon": "fas fa-palette",
+                    "default": false,
+                    "help_text": "",
+                    "hidden": true
+                },
+                "multiqc_config": {
+                    "type": "string",
+                    "description": "Custom config file to supply to MultiQC.",
+                    "fa_icon": "fas fa-cog",
+                    "help_text": "",
+                    "hidden": true
+                },
+                "tracedir": {
+                    "type": "string",
+                    "description": "Directory to keep pipeline Nextflow logs and reports.",
+                    "default": "${params.outdir}/pipeline_info",
+                    "fa_icon": "fas fa-cogs",
+                    "help_text": "",
+                    "hidden": true
+                },
+                "clusterOptions": {
+                    "type": "string",
+                    "description": "Arguments passed to Nextflow clusterOptions.",
+                    "fa_icon": "fas fa-network-wired",
+                    "help_text": "",
+                    "hidden": true
+                }
+            },
+            "fa_icon": "fas fa-file-import"
+        }
+    }
+}
\ No newline at end of file
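Editorial addendum: the change applied most often in this diff is the switch from a hard-coded `mode: 'copy'` to `mode: params.publish_dir_mode`, with the accepted values validated by the new `publish_dir_mode` enum in `nextflow_schema.json`. Below is a minimal, self-contained sketch of that pattern in the pipeline's own DSL1 style; the process name and file names are invented for illustration and do not appear in the pipeline.

```groovy
// Minimal sketch of the publishDir pattern used across this release.
// EXAMPLE_QC and the touched file names are hypothetical.
params.outdir           = './results'
params.publish_dir_mode = 'copy'   // one of: symlink, rellink, link, copy, copyNoFollow, move

process EXAMPLE_QC {
    publishDir "${params.outdir}/example", mode: params.publish_dir_mode,
        saveAs: { filename ->
                      if (filename.endsWith('.igv.txt')) null            // never published
                      else if (filename.endsWith('.tsv')) "qc/$filename" // routed to qc/ subdir
                      else filename                                      // published as-is
                }

    output:
    path '*.{tsv,igv.txt,log}'

    script:
    """
    touch example.tsv example.igv.txt example.log
    """
}
```

Returning `null` from the `saveAs` closure keeps a file in the work directory without publishing it, which is how the pipeline withholds the per-process `*.igv.txt` fragments that are only consumed downstream by the IGV session step.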