Commit

fast add
luisas committed Feb 3, 2025
1 parent 615a239 commit 4a5c98f
Showing 5 changed files with 72 additions and 155 deletions.
2 changes: 1 addition & 1 deletion .nf-core.yml
@@ -1,7 +1,7 @@
lint:
  files_exist:
    - conf/igenomes.config
    - conf/igenomes_ignored.config
    - .github/PULL_REQUEST_TEMPLATE.md
  files_unchanged:
    - .github/CONTRIBUTING.md
3 changes: 1 addition & 2 deletions README.md
@@ -73,10 +73,9 @@ nextflow run nf-core/multiplesequencealign \
## How to set up an easy run:

> [!NOTE]
> Many more use-case examples are available under [FAQs](https://nf-co.re/multiplesequencealign/usage/FAQs).
> Find some example input data [here](https://github.com/nf-core/test-datasets/tree/multiplesequencealign)

### CASE 1: One input dataset, one tool.

If you only have one dataset and want to align it using one specific MSA tool (e.g. FAMSA or FOLDMASON):
48 changes: 21 additions & 27 deletions docs/usage.md
@@ -31,7 +31,6 @@ work # Directory containing the nextflow working files
Many use-case examples are available under [FAQs](https://nf-co.re/multiplesequencealign/usage/FAQs).
:::


## Samplesheet input

The sample sheet defines the **input data** that the pipeline will process.
@@ -118,28 +117,25 @@ Currently available GUIDE TREE methods are: (Optional):
- [FAMSA](https://github.com/refresh-bio/FAMSA)
- [MAFFT](https://mafft.cbrc.jp/alignment/server/index.html)


Here are some specific guide tree settings.
Use the values in the columns `tree` and `args_tree`. The remaining columns are explanatory only.

| tree | args_tree | Distance Measure | Core Algorithm | Speed-up Heuristic |
| -------- | --------------------- | -------------------------------------- | ------------------------------- | ------------------ |
| MAFFT | | k-mer-based | UPGMA + single linkage combined | |
| MAFFT | --minimumlinkage | k-mer-based | single linkage | |
| MAFFT | --averagelinkage | k-mer-based | UPGMA | |
| MAFFT | --parttree | k-mer-based | single linkage + UPGMA combined | PartTree |
| MAFFT | --dpparttree | dynamic programming alignment-based | single linkage + UPGMA combined | PartTree |
| MAFFT | --fastaparttree | FASTA alignment-based | single linkage + UPGMA combined | PartTree |
| CLUSTALO | | sequence embedding + approx. alignment | UPGMA | bisecting K-means |
| FAMSA | | longest common subsequence-based | single linkage | |
| FAMSA | -gt upgma | longest common subsequence-based | UPGMA | |
| FAMSA | -gt nj | longest common subsequence-based | neighbour joining | |
| FAMSA | -parttree | longest common subsequence-based | single linkage | PartTree |
| FAMSA | -gt upgma -parttree | longest common subsequence-based | UPGMA | PartTree |
| FAMSA | -medoidtree | longest common subsequence-based | single linkage | MedoidTree |
| FAMSA | -gt upgma -medoidtree | longest common subsequence-based | UPGMA | MedoidTree |
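
As a minimal sketch of how rows from this table could end up in a toolsheet, the snippet below writes a small CSV that reuses three `tree`/`args_tree` combinations. The header `tree,args_tree,aligner,args_aligner` is an assumption: only `tree`, `args_tree` and `args_aligner` are named in this documentation, and the `aligner` column name and the file name are illustrative.

```bash
# Sketch: write a toolsheet that reuses three guide-tree rows from the table above.
# Assumption: the header is tree,args_tree,aligner,args_aligner. Only tree,
# args_tree and args_aligner are named in these docs; "aligner" is assumed.
cat > guide_tree_toolsheet.csv <<'EOF'
tree,args_tree,aligner,args_aligner
FAMSA,-gt upgma -medoidtree,FAMSA,
MAFFT,--parttree,FAMSA,
CLUSTALO,,FAMSA,
EOF

# Print only the tree and args_tree columns to compare them against the table.
cut -d, -f1,2 guide_tree_toolsheet.csv
```

An empty `args_tree` field corresponds to the rows in the table that leave that column blank, so the tool's defaults apply.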

## 3. Align

@@ -150,17 +146,16 @@ The available assembly methods are listed below (those that accept guide trees i
- [CLUSTALO](http://clustal.org/omega/#Documentation) (accepts guide tree)
- [FAMSA](https://github.com/refresh-bio/FAMSA) (accepts guide tree)
- [KALIGN](https://github.com/TimoLassmann/kalign)
- [LEARNMSA](https://github.com/Gaius-Augustus/learnMSA) \*Read note below
- [MAFFT](https://mafft.cbrc.jp/alignment/server/index.html)
- [MAGUS](https://github.com/vlasmirnov/MAGUS) (accepts guide tree)
- [MUSCLE5](https://drive5.com/muscle5/manual/)
- [TCOFFEE](https://tcoffee.readthedocs.io/en/latest/index.html) (accepts guide tree)
- [REGRESSIVE](https://tcoffee.readthedocs.io/en/latest/tcoffee_quickstart_regressive.html) (accepts guide tree)
- [UPP](https://github.com/smirarab/sepp) (accepts guide tree)


> [!NOTE]
> LearnMSA can (and should) run on GPUs. If you have GPUs available, please turn on GPU mode with `--use_gpu`. You might have to update your configuration file if you are running on a cluster with custom queue names. See the [CRG](https://github.com/nf-core/configs/blob/master/conf/pipeline/multiplesequencealign/crg.config) configuration for an example.
**sequence- and structure-based** (require both fasta and structures as input):

@@ -231,7 +226,6 @@ outdir: './results/'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).


### Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
36 changes: 14 additions & 22 deletions docs/usage/FAQs.md
@@ -2,24 +2,21 @@

## TODO: replace main.nf with nf-core/multiplesequencealign and test.fa with <<YOUR_FASTA.fa>> AND ADD LINK


### INPUTS

### USECASES


<details>
<summary> Where can I find some example input data? </summary>
Find some example input data <a href="https://github.com/nf-core/test-datasets/tree/multiplesequencealign">here</a>
</details>


<details>
<summary> I want to deploy one tool on one dataset. I am not interested in any evaluation, report etc. </summary>

You should use the easy_deploy profile!

This skips all evaluation and reporting steps and keeps the deployment to a minimum.

The following example runs FAMSA (with the arguments -refine_mode on) using a guide tree built with CLUSTALO.

@@ -50,53 +47,48 @@
--outdir results

You can omit --tree, --args_aligner and --args_tree (simply do not pass the flags); default values will be used.
Foldmason is just an example; you can pick any other structural aligner.

</details>


<details>
<summary> One dataset, multiple tools. </summary>
Use the <a href="https://nf-co.re/multiplesequencealign/usage/#toolsheet-input">toolsheet</a> to specify which tools to run.

nextflow run main.nf \
-profile easy_deploy,docker \
--seqs <YOUR_PDB_DIR> \
--tools <YOUR_TOOLSHEET> \
--outdir results

Your input dataset can be passed via --seqs or --pdbs_dir, as explained in the examples above.

</details>



<details>
<summary> Can I run the same tool multiple times with different arguments? </summary>

Absolutely yes! Create different rows in the toolsheet and add different arguments in the args_aligner column.
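
A rough sketch of such a toolsheet, running FAMSA twice on a CLUSTALO guide tree with different aligner arguments. The header `tree,args_tree,aligner,args_aligner`, the file name and the `-refine_mode off` value are assumptions for illustration; only `-refine_mode on` appears in the examples above.

```bash
# Sketch: one aligner, two rows, different values in the args_aligner column.
# Assumption: the header is tree,args_tree,aligner,args_aligner, and
# "-refine_mode off" is used purely as a second illustrative argument set.
cat > famsa_variants_toolsheet.csv <<'EOF'
tree,args_tree,aligner,args_aligner
CLUSTALO,,FAMSA,-refine_mode on
CLUSTALO,,FAMSA,-refine_mode off
EOF

# Count how many rows use each aligner (skipping the header line).
awk -F, 'NR > 1 { n[$3]++ } END { for (a in n) print a, n[a] }' famsa_variants_toolsheet.csv
```

Each row is treated as its own tool configuration, so the two FAMSA runs can be compared side by side.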

</details>


<details>
<summary> Can I run a structural evaluation on sequence-based aligners? </summary>

Yes, as long as you provide the structures, either via the samplesheet or via the --pdbs_dir flag.

You can also run proteinfold beforehand to generate the structures if you do not have them already.
<a href="https://nf-co.re/multiplesequencealign/usage/#toolsheet-input">Here</a> are instructions on how to do it.
# ADD LINK


</details>

<details>
<summary> What happens if I only have the PDBs and not the corresponding FASTA files? </summary>

No problem: you can provide the PDBs as input (either via the samplesheet using the optional_data column or via the flag --pdbs_dir).

Setting --skip_pdbconversion false ensures that the FASTA file is automatically extracted from the provided PDBs and subsequently used in the pipeline.

nextflow run main.nf \
-profile easy_deploy,docker \
@@ -105,5 +97,5 @@
--tree CLUSTALO \
--outdir results \
--skip_pdbconversion false

</details>