First release review #161

Merged
merged 17 commits into from
Oct 1, 2024
4 changes: 4 additions & 0 deletions .github/CONTRIBUTING.md
Expand Up @@ -24,6 +24,10 @@ If you'd like to write some code for nf-core/multiplesequencealign, the standard

If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).

:::note
Extended documentation on adding specific module types to this pipeline is available at [extending](../docs/extending.md).
:::

## Tests

You have the option to test your changes locally by running the pipeline. For receiving warnings about process selectors and other `debug` information, it is recommended to use the debug profile. Execute all the tests with the following command:
5 changes: 4 additions & 1 deletion .nf-core.yml
Expand Up @@ -2,4 +2,7 @@ repository_type: pipeline
nf_core_version: "2.14.1"
lint:
multiqc_config: False
files_exist: conf/igenomes.config
files_exist:
- conf/igenomes.config
files_unchanged:
- .github/CONTRIBUTING.md
2 changes: 1 addition & 1 deletion CHANGELOG.md
Expand Up @@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 1.0.0 - Somorrostro
## [1.0.0 - Somorrostro](https://github.com/nf-core/multiplesequencealign/releases/tag/1.0.0)

Somorrostro is a beach in Barcelona.

4 changes: 4 additions & 0 deletions CITATIONS.md
Expand Up @@ -40,6 +40,10 @@

> Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002 Jul 15;30(14):3059-66. doi: 10.1093/nar/gkf436. PMID: 12136088; PMCID: PMC135756.

- [MAGUS](https://pubmed.ncbi.nlm.nih.gov/33252662/)

> Smirnov V, Warnow T. MAGUS: Multiple sequence Alignment using Graph clUStering. Bioinformatics. 2021 Jul 19;37(12):1666-1672. doi: 10.1093/bioinformatics/btaa992. PMID: 33252662; PMCID: PMC8289385.

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
28 changes: 18 additions & 10 deletions README.md
Expand Up @@ -21,9 +21,13 @@

**nf-core/multiplesequencealign** is a pipeline to deploy and systematically evaluate Multiple Sequence Alignment (MSA) methods.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/proteinfold/results).

![Alt text](docs/images/nf-core-msa_metro_map.png?raw=true "nf-core-msa metro map")

In a nutshell, the pipeline performs the following steps:
The pipeline performs the following steps:

1. **Input files summary**: (Optional) computation of summary statistics on the input files, such as the average sequence similarity across the input sequences, their length, plddt extraction if available.

Expand All @@ -34,8 +38,9 @@ In a nutshell, the pipeline performs the following steps:

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
:::note
If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
:::

#### 1. SAMPLESHEET

Expand All @@ -52,8 +57,9 @@ toxin,toxin.fa,toxin-ref.fa,toxin_structures,toxin_template.txt

Each row represents a set of sequences (in this case the seatoxin and toxin protein families) to be aligned and the associated (if available) reference alignments and dependency files (this can be anything from protein structure or any other information you would want to use in your favourite MSA tool).

> [!NOTE]
> The only required input is the id column and either fasta or dependencies.
:::note
The only required input is the id column and either fasta or dependencies.
:::
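
The rule in the note above can be checked before launching a run. The following is a minimal sketch in Python, not part of the pipeline (which does its own validation). It assumes the samplesheet is a CSV with a header row and that the columns are named `id`, `fasta`, `reference`, `dependencies` and `template`; the name `reference`, the column order, the helper `check_samplesheet_rows` and the rows `structures_only` and `broken` are assumptions made for illustration.

```python
import csv
import io


def check_samplesheet_rows(text):
    """Return (line_number, message) pairs for rows that break the rule.

    Rule taken from the note: `id` is required, and at least one of
    `fasta` or `dependencies` must be filled in.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        row_id = (row.get("id") or "").strip()
        fasta = (row.get("fasta") or "").strip()
        dependencies = (row.get("dependencies") or "").strip()
        if not row_id:
            problems.append((line_number, "missing id"))
        if not fasta and not dependencies:
            problems.append((line_number, "needs fasta or dependencies"))
    return problems


# Hypothetical samplesheet: the header names are assumed, see above.
example = """id,fasta,reference,dependencies,template
toxin,toxin.fa,toxin-ref.fa,toxin_structures,toxin_template.txt
structures_only,,,my_structures,
broken,,,,
"""

print(check_samplesheet_rows(example))
```

Only the last row is reported, because it has an `id` but neither `fasta` nor `dependencies`.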

#### 2. TOOLSHEET

Expand All @@ -72,8 +78,9 @@ FAMSA, -gt upgma -medoidtree, FAMSA,
FAMSA,,REGRESSIVE,
```

> [!NOTE]
> The only required input is aligner.
:::note
The only required input is aligner.
:::

#### 3. RUN THE PIPELINE

Expand All @@ -87,9 +94,10 @@ nextflow run nf-core/multiplesequencealign \
--outdir outdir
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
:::warning
Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
:::

For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/multiplesequencealign/usage) and the [parameter documentation](https://nf-co.re/multiplesequencealign/parameters).

2 changes: 1 addition & 1 deletion assets/adaptivecard.json
Expand Up @@ -17,7 +17,7 @@
"size": "Large",
"weight": "Bolder",
"color": "<% if (success) { %>Good<% } else { %>Attention<%} %>",
"text": "nf-core/multiplesequencealign v${version} - ${runName}",
"text": "nf-core/multiplesequencealign ${version} - ${runName}",
"wrap": true
},
{
2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/multiplesequencealign/releases/tag/1.0.0" target="_blank">nf-core/multiplesequencealign</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/multiplesequencealign/0.1.0dev/docs/output" target="_blank">documentation</a>.
<a href="https://nf-co.re/multiplesequencealign/1.0.0/docs/output" target="_blank">documentation</a>.
report_section_order:
"nf-core-multiplesequencealign-methods-description":
order: -1000
2 changes: 2 additions & 0 deletions assets/schema_tools.json
Expand Up @@ -9,6 +9,7 @@
"properties": {
"tree": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "tree name cannot contain spaces",
"meta": ["tree"]
},
Expand All @@ -19,6 +20,7 @@
"aligner": {
"type": "string",
"meta": ["aligner"],
"pattern": "^\\S+$",
"errorMessage": "align name must be provided and cannot contain spaces"
},
"args_aligner": {
Expand Down
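
To see what the `pattern` added to `tree` and `aligner` in `assets/schema_tools.json` accepts, here is a small Python sketch. The JSON string `"^\\S+$"` is the regular expression `^\S+$`: one or more non-whitespace characters and nothing else. The sketch uses `re.fullmatch` with `\S+`, which expresses the same intent in Python while avoiding Python's special handling of `$` before a trailing newline. The helper name `is_valid_name` is hypothetical, and this is not how the pipeline performs validation. Whether surrounding whitespace is trimmed before the schema is applied is not checked here.

```python
import re

# Same intent as the schema pattern ^\S+$: only non-whitespace, at least one character.
NO_WHITESPACE = re.compile(r"\S+")


def is_valid_name(value):
    """Return True if `value` is non-empty and contains no whitespace."""
    return NO_WHITESPACE.fullmatch(value) is not None


for candidate in ["FAMSA", "REGRESSIVE", "FAM SA", " FAMSA", ""]:
    print(repr(candidate), is_valid_name(candidate))
```

The empty string is rejected as well, because `+` requires at least one character.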
2 changes: 2 additions & 0 deletions docs/extending.md
Expand Up @@ -9,6 +9,8 @@ This pipeline is extensible, allowing the incorporation of new methods for assem
- The [nf-test documentation](https://code.askimed.com/nf-test/docs/getting-started/)
- The [nf-core slack](https://nf-co.re/join), particularly the [multiplesequencealign channel](https://nfcore.slack.com/archives/C05LZ7EAYGK). Feel free to reach out!

Please also check the [contribution guidelines](../.github/CONTRIBUTING.md).

## Adding an aligner

These steps will guide you to include a new MSA tool into the pipeline. Once done, this will allow you to systematically deploy and benchmark your tool against all others included in the pipeline. You are also welcome to contribute back to the pipeline if you wish.
Binary file modified docs/images/nf-core-msa_metro_map.png
40 changes: 24 additions & 16 deletions docs/usage.md
Expand Up @@ -90,8 +90,9 @@ The provided structures (see samplesheet) are used to evaluate the quality of th
Finally, a summary table with all the computed statistics and evaluations is reported in MultiQC (skip by using `--skip_multiqc`).
Moreover, a Shiny app is generated with interactive summary plots (skip with `--skip_shiny`).

> [!WARNING]
> You will need to have [Shiny](https://shiny.posit.co/py/) installed to run it! See [output documentation](https://nf-co.re/multiplesequencealign/output) for more info.
:::warning
You will need to have [Shiny](https://shiny.posit.co/py/) installed to run it! See [output documentation](https://nf-co.re/multiplesequencealign/output) for more info.
:::

## Samplesheet input

Expand All @@ -114,8 +115,9 @@ Each row represents a set of sequences (in this case the seatoxin and toxin prot
| `dependencies` | Required (At least one of fasta or dependencies must be provided). Full path to the folder that contains the dependency files (e.g. protein structures) for the sequences to be aligned. Currently, it is used for structural aligners and structure-based evaluation steps. It can be left empty. |
| `template` | Optional. Files that define the mapping between the input sequence and the dependency files (e.g. protein structures) to be used. Used by 3D-Coffee. If not specified, they will be automatically generated assuming that the sequence name provided in the fasta is the same as the file name of the corresponding PDB file. E.g. if you set (default) the parameter templates_suffix to .pdb, then: ">MyProteinName" in the fasta file and "MyProteinName.pdb" for the corresponding protein structure. For more information on how to generate a template file manually, please look at the T-Coffee [documentation](https://tcoffee.readthedocs.io/en/latest/tcoffee_main_documentation.html). |

> [!NOTE]
> You can have some samples with dependencies and/or references and some without. The pipeline will run the modules requiring dependencies/references only on the samples for which you have provided the required information and the others will be just skipped.
:::note
You can have some samples with dependencies and/or references and some without. The pipeline will run the modules requiring dependencies/references only on the samples for which you have provided the required information and the others will be just skipped.
:::
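
The `template` row above says that, when no template is given, the sequence name in the fasta is expected to match the file name of the structure, with `templates_suffix` appended. The sketch below lists the file names implied by that convention and reports which ones are absent from a dependencies folder. It is an illustration only: the helper names are hypothetical, and taking the first whitespace-delimited token of the header as the sequence name is an assumption, not something the documentation states.

```python
from pathlib import Path


def expected_structure_names(fasta_text, templates_suffix=".pdb"):
    """Return the structure file names implied by the fasta headers."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                # Assumption: the sequence name is the first token of the header.
                names.append(header.split()[0] + templates_suffix)
    return names


def missing_structures(fasta_text, dependencies_dir, templates_suffix=".pdb"):
    """Return the expected structure files that are not in the folder."""
    folder = Path(dependencies_dir)
    return [
        name
        for name in expected_structure_names(fasta_text, templates_suffix)
        if not (folder / name).is_file()
    ]


print(expected_structure_names(">MyProteinName\nACDEFG\n"))
```

With the default suffix, the header `>MyProteinName` maps to `MyProteinName.pdb`, matching the example in the table.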

## Toolsheet input

Expand All @@ -132,11 +134,13 @@ FAMSA, -gt upgma -medoidtree, FAMSA,
FAMSA,,REGRESSIVE,
```

> [!NOTE]
> Each of the trees and aligners are available as standalones. You can leave `args_tree` and `args_aligner` empty if you are cool with the default settings of each method. Alternatively, you can leave `args_tree` empty to use the default guide tree with each aligner.
:::note
Each of the trees and aligners are available as standalones. You can leave `args_tree` and `args_aligner` empty if you are cool with the default settings of each method. Alternatively, you can leave `args_tree` empty to use the default guide tree with each aligner.
:::

> [!NOTE]
> use the exact spelling as listed above in [align](#3-align) and [guide trees](#2-guide-trees)!
:::note
Use the exact spelling as listed above in [align](#3-align) and [guide trees](#2-guide-trees)!
:::

`tree` is the tool used to build the tree (optional).

Expand Down Expand Up @@ -176,8 +180,9 @@ If you wish to repeatedly use the same parameters for multiple runs, rather than

Pipeline settings can be provided in a `yaml` or `json` file via `-params-file <file>`.

> [!WARNING]
> Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process >resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or >module arguments (args).
:::warning
Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
:::
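
As an illustration of the `json` variant, the sketch below renders a params file from a Python dictionary. Keys are the pipeline parameter names without the leading double hyphen. The parameters used (`outdir`, `skip_shiny`) are ones mentioned in this documentation; the helper name `render_params` and the file name `params.json` are hypothetical.

```python
import json


def render_params(params):
    """Serialise pipeline parameters as the text of a JSON params file."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"


# Keys mirror the CLI parameters: --outdir outdir --skip_shiny
params = {"outdir": "outdir", "skip_shiny": True}

# Save this text as e.g. params.json and pass it with `-params-file params.json`.
print(render_params(params))
```

Infrastructure settings such as process resources do not belong in this file; per the warning above, those go in a custom config passed with `-c`.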

The above pipeline run specified with a params file in yaml format:

Expand Down Expand Up @@ -214,22 +219,25 @@ This version number will be logged in reports when you run the pipeline, so that

To further assist in reproducibility, you can share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.

> [!TIP]
> If you wish to share such profile (such as upload as supplementary material for academic publications), make sure to NOT include cluster specific paths to files, >nor institutional specific profiles.
:::tip
If you wish to share such a profile (for example, as supplementary material for an academic publication), make sure NOT to include cluster-specific paths to files or institution-specific profiles.
:::

## Core Nextflow arguments

> [!NOTE]
> These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
:::tip
These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
:::
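
The hyphen convention is easy to get wrong when commands are assembled by a script. The sketch below builds a command line that keeps the two groups apart: Nextflow options get a single hyphen, pipeline parameters get a double hyphen. It is a hypothetical helper, not part of Nextflow or nf-core, and the option and parameter names used (`-profile`, `--outdir`, `--skip_multiqc`) are ones that appear in this documentation.

```python
import shlex


def build_command(pipeline, nextflow_options, pipeline_params):
    """Return the argv list for `nextflow run`, keeping the two flag styles apart."""
    argv = ["nextflow", "run", pipeline]
    # Core Nextflow options: single hyphen.
    for name, value in nextflow_options.items():
        argv += [f"-{name}", str(value)]
    # Pipeline parameters: double hyphen.
    for name, value in pipeline_params.items():
        if value is True:
            argv.append(f"--{name}")  # boolean switch, no value needed
        elif value is False:
            continue  # omitted so the pipeline default applies
        else:
            argv += [f"--{name}", str(value)]
    return argv


argv = build_command(
    "nf-core/multiplesequencealign",
    {"profile": "test,docker"},
    {"outdir": "outdir", "skip_multiqc": True},
)
print(shlex.join(argv))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting concerns; `shlex.join` is used only for display.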

### `-profile`

Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.

Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below.

> [!INFO]
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
:::info
We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
:::

The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).

2 changes: 1 addition & 1 deletion modules/nf-core/csvtk/join/csvtk-join.diff

This file was deleted.

6 changes: 1 addition & 5 deletions nextflow.config
Expand Up @@ -6,10 +6,6 @@
----------------------------------------------------------------------------------------
*/

plugins {
id 'nf-validation@0.3.1'
}

// Global default params, used in configs
params {

Expand Down Expand Up @@ -217,7 +213,7 @@ singularity.registry = 'quay.io'

// Nextflow plugins
plugins {
id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet
id 'nf-validation@1.1.4' // Validation of pipeline parameters and creation of an input channel from a sample sheet
}

