Bin QC Improvements #707
- Update modules
- Update integration in mag and with other tools (bin_summary, gtdb-tk)
- Update test
- Update schema
@nf-core-bot fix linting
Before you continue (sorry this is a bit late): I generally don't like to deprecate old versions of tools for a while, but rather keep them as alternative tools. In some cases people want to stick with the original version for compatibility with previous runs. Could you 'revert' (or reinstall) the old checkm module and wrap it in an if/else statement (but within the subworkflow :) )? @muabnezor did a similar thing when adding porechop_ABI here: #674
@jfy133 that makes sense, I will revert the CheckM removal. My bad for not asking first 😅.
Force-pushed from c63e084 to b1b6518.
Sorry, I didn't catch your last comment about including both tools in a single workflow. With that in mind, would it make sense to include BUSCO as well, and just make a "bin_qc" subworkflow?
Also, simplify bin_summary regarding bin qc
Yes that would be perfect! We need to subworkflow the sh*t out of this monster 😅 thank you!!!
Force-pushed from a4f42ef to da52285.
Force-pushed from 4007932 to 0eb167a.
It should be ready now. There is a last minor issue that should be solved by this PR: nf-core/modules#7119
Running the failed test one more time. The concoct test has failed a few times before too, right?
From the CI logs it seems to be a lack of space in the runner.
Oh, I realized something now. I totally forgot to skip Bin QC if |
Glad you worked it out! Unfortunately, since yesterday I have both kids home sick, so I will only have time to test again next week (I'm really really sorry for this, apparently it's a really bad year in my city's kindergartens and schools for viruses and stuff 😣)
A few more last comments, but otherwise all I'm doing for the rest of the day is running the tests!
One major thing missing now though: we've removed the standalone checkm test in the ci.yml, but this has not been replaced by an alternative test, so now we have no CheckM test... We should make sure all three are executed at least once across all our tests.
I suggest:
- test.config: run BUSCO (default, no change)
- test_adapterremoval.config: run checkm
- test_bbnorm.config: run checkm2
Does that make sense?
if (params.save_busco_db) {
    // publish files downloaded by Busco
    ch_downloads = BUSCO.out.busco_downloads
        .groupTuple()
        .map { _lin, downloads -> downloads[0] }
        .toSortedList()
        .flatten()
    BUSCO_SAVE_DOWNLOAD(ch_downloads)

    ch_versions = ch_versions.mix(BUSCO_SAVE_DOWNLOAD.out.versions.first())
}
Better description of second half of my old comment on this:
- This should go below the BUSCO module, given it takes the output of the BUSCO module.
- Do you see any reason why this cannot just be replaced with an extra publishDir entry in modules.conf for BUSCO, so that if --save_busco_db is set, the database files are saved?
Checked on your PR vs dev:
Sorry this has taken so long, but the tests are taking around 25-30 min each time 🙄, which doesn't help. Ah, I need to do one more manual check, with GTDBTk, which I'll have to do on a cluster tomorrow, as you've tweaked that slightly.
But I'm pretty sure this is ready once my final comments above are addressed :) , mostly the missing addition of the CheckMs to the tests.
Force-pushed from c111681 to 3a38a37.
Crap, BUSCO is borked everywhere, isn't it 😢
Yeah, it's already reported here: https://gitlab.com/ezlab/busco/-/issues/776 and it seems that the only solution is to update. Today I have been looking at the code to update and migrate to the nf-core BUSCO module. Do you mind if I integrate those changes in this PR, or do you prefer a new one? In the meantime, can you review the module update? nf-core/modules#7199
Self note: testing GTDBTk with the following commands:
Let's do a new one. It's breaking the small MEGAHIT fix PR, so if it's separate we can pull it into that one and this one at the same time.
Will have a look now! EDIT: fixed!
Old BUSCO files should be back now! And I'm waiting for my last GTDBTk-related check, but all the other runs are looking good!
That's good news. The BUSCO update PR ended up getting a bit large with the removal of all those local modules.
@dialvarezs My manual tests with GTDBTk work! Just need the test configs as in my comment above, but otherwise this is ready :D
Warning: Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.0.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
@jfy133 great!
Thank you very much @dialvarezs ! This is huge work!
There was a last minor bug when CheckM was not run for certain bins (specifically eukaryotic ones). I updated the condition to check if the CheckM bins are a subset of the depth bins, rather than requiring them to be equal.
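To illustrate the subset condition described in the comment above, here is a minimal Python sketch. It is not the pipeline's actual code: the function name `check_checkm_bins` and the example bin names are hypothetical, and it assumes both bin lists are available as plain collections of names. Depth bins with no CheckM result are tolerated, while a CheckM bin that does not appear in the depth table is still treated as an error.

```python
def check_checkm_bins(depth_bins, checkm_bins):
    """Validate that the CheckM bins are a subset of the depth bins.

    Returns the sorted depth bins that have no CheckM result (tolerated,
    e.g. bins for which CheckM was skipped). Raises ValueError only when
    CheckM reports a bin that is absent from the depth table.
    """
    depth = set(depth_bins)
    checkm = set(checkm_bins)

    # An equality check (depth == checkm) would fail whenever CheckM
    # skipped a bin; the subset check only rejects unknown CheckM bins.
    unknown = checkm - depth
    if unknown:
        raise ValueError(
            f"CheckM bins missing from depth table: {sorted(unknown)}"
        )
    return sorted(depth - checkm)


# Hypothetical example: bin.3.fa has depth data but no CheckM result.
depth_bins = ["bin.1.fa", "bin.2.fa", "bin.3.fa"]
checkm_bins = ["bin.1.fa", "bin.2.fa"]
print(check_checkm_bins(depth_bins, checkm_bins))  # ['bin.3.fa']
```

With an equality check the example above would be rejected, which matches the bug described for bins that CheckM does not process.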
This PR adds:
- BIN_QC subworkflow, integrating CheckM, CheckM2, BUSCO and GUNC

Closes #607.
PR checklist
- Code lints (nf-core pipelines lint).
- Test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- No unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).