dsl2: metagenomics uncollapsed paired end #1098

Open
wants to merge 19 commits into base: dev

Commits
ab9a367  partial fix for krakenuniq, error within kraken call for PE (thinks f… (ilight1542, Sep 13, 2024)
6924787  notes for fixes going forward (ilight1542, Oct 4, 2024)
376a639  added warning, and correct parsing for input into metagenomic screeni… (ilight1542, Oct 18, 2024)
d8c8c30  full implementation of paired end metagenomics krakenuniq (ilight1542, Oct 25, 2024)
b10d16b  updated warn and comments (ilight1542, Oct 25, 2024)
b8239c7  updated error catching, tested profiling with paired end inputs (ilight1542, Nov 15, 2024)
a917953  added tags for log, updated warns/errors (ilight1542, Nov 15, 2024)
49f36d2  samtools fastq map name with all when bamfiltering merging all files … (ilight1542, Jan 24, 2025)
c38886c  adjusted fastq generation for input into metagenomics - bugfix (ilight1542, Jan 24, 2025)
0bf439b  reduced unnecssary words (ilight1542, Jan 24, 2025)
b693741  Merge remote-tracking branch 'origin/dev' into (ilight1542, Jan 31, 2025)
ed3d93c  removed view and added useful module tag for runtime (ilight1542, Jan 31, 2025)
a946fca  adjustment needed for krakenuniq (ilight1542, Feb 21, 2025)
b2ef998  for manual tests (ilight1542, Feb 21, 2025)
5c77fa8  debugging view commands (ilight1542, Feb 21, 2025)
9ac4c42  adjusted tag for correct parsing SEvsPE krakenuniq (ilight1542, Feb 28, 2025)
c1bab1d  removed print statemtns (ilight1542, Feb 28, 2025)
f0cc46b  Merge remote-tracking branch 'origin/dev' into metagenomics-pairedend (ilight1542, Feb 28, 2025)
7c30833  adjusted multiple module imports into single module import for bamfil… (ilight1542, Mar 7, 2025)
174 changes: 153 additions & 21 deletions conf/modules.config
@@ -447,29 +447,14 @@ process {
]
}

- withName: SAMTOOLS_FASTQ_MAPPED {
+ withName: SAMTOOLS_FASTQ_METAGENOMICS {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = [
- params.metagenomics_input == 'all' ? '' : '-F 4',
+ params.metagenomics_input == 'mapped' ? '-F 4': '',
+ params.metagenomics_input == 'unmapped' ? '-f 4': '',
+ // 'all' is left then with NO -F or -f flag, therefore all reads get sent to fastq
].join(' ').trim()
- ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_mapped" }
- publishDir = [
- [
- // data
- path: { "${params.outdir}/read_filtering/fastq/data/" },
- mode: params.publish_dir_mode,
- pattern: '*.fastq.gz',
- enabled: params.bamfiltering_generatemappedfastq
- ]
- ]
- }
-
- withName: SAMTOOLS_FASTQ_UNMAPPED {
- tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
- ext.args = [
- '-f 4',
- ].join(' ').trim()
- ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_unmapped" }
+ ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_metagenomics_fastq_${params.metagenomics_input}" }
publishDir = [
[
// data
@@ -481,7 +466,7 @@ process {
]
}

- withName: 'CAT_FASTQ_UNMAPPED|CAT_FASTQ_MAPPED' {
+ withName: 'CAT_FASTQ_METAGENOMICS' {
tag = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
publishDir = [
@@ -926,6 +911,153 @@ process {
]
}

+ withName: BBMAP_BBDUK {
+ tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
+ ext.args = { "entropymask=f entropy=${params.metagenomics_complexity_entropy}" }
+ ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_complexity" }
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/complexity_filter/bbduk/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{fastq.gz,log}',
+ enabled: params.metagenomics_complexity_savefastq
+ ]
+ }
+
+ withName: MALT_RUN {
+ ext.args = [
+ "-m ${params.metagenomics_malt_mode}",
+ "-at ${params.metagenomics_malt_alignmentmode}",
+ "-top ${params.metagenomics_malt_toppercent}",
+ "-id ${params.metagenomics_malt_minpercentidentity}",
+ "-mq ${params.metagenomics_malt_maxqueries}",
+ "--memoryMode ${params.metagenomics_malt_memorymode}",
+ params.metagenomics_malt_minsupportmode == "percent" ? "-supp ${params.metagenomics_malt_minsupportpercent}" : "-sup ${params.metagenomics_malt_minsupportreads}",
+ params.metagenomics_malt_savereads ? "--alignments ./" : ""
+ ].join(' ').trim()
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/profiling/malt/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{rma6,log,sam.gz}'
+ ]
+ ext.prefix = { "${meta.label}_${meta.id}-run" }
+ }
+
+ withName: CAT_CAT_MALT {
+ ext.prefix = { "${meta.id}_runtime_log_concatenated.log" }
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/profiling/malt/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{log}'
+ ]
+ }
+
+ withName: KRAKEN2_KRAKEN2 {
+ tag = { "${meta.sample_id}|single_end_mode_${meta.single_end}" }
+ ext.args = [
+ params.metagenomics_kraken2_saveminimizers ? "--report-minimizer-data" : ""
+ ].join(' ').trim()
+ ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/profiling/kraken2/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{txt,fastq.gz}'
+ ]
+ }
+
+ withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
+ tag = { "single_end_mode_${meta.single_end}" }
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/profiling/krakenuniq/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{txt,fastq.gz}'
+ ]
+ ext.prefix = { "${meta.single_end}" }
+ }
+
+ withName: METAPHLAN_METAPHLAN {
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/profiling/metaphlan/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{biom,txt}'
+ ]
+ ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
+ }
+
+ withName: MALTEXTRACT {
+ ext.args = [
+ "-f ${params.metagenomics_maltextract_filter}",
+ "-a ${params.metagenomics_maltextract_toppercent}",
+ "--minPI ${params.metagenomics_maltextract_minpercentidentity}",
+ params.metagenomics_maltextract_destackingoff ? "--destackingOff" : "",
+ params.metagenomics_maltextract_downsamplingoff ? "--downSampOff" : "",
+ params.metagenomics_maltextract_duplicateremovaloff ? "--dupRemOff" : "",
+ params.metagenomics_maltextract_matches ? "--matches" : "",
+ params.metagenomics_maltextract_megansummary ? "--meganSummary" : "",
+ params.metagenomics_maltextract_usetopalignment ? "--useTopAlignment" : "",
+ { meta.strandedness } == "single" ? '--singleStranded' : '',
+ ].join(' ').trim()
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/postprocessing/maltextract/" },
+ mode: params.publish_dir_mode,
+ pattern: 'results',
+ saveAs: { "${meta.id}" }
+ ]
+ }
+
+ withName: MEGAN_RMA2INFO {
+ tag = {"${meta.id}"}
+ ext.args = "-c2c Taxonomy"
+ ext.prefix = { "${meta.id}" }
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/postprocessing/megan_summaries/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{txt.gz,megan}'
+ ]
+ }
+
+ withName: AMPS {
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/postprocessing/maltextract/" },
+ mode: params.publish_dir_mode,
+ pattern: 'results'
+ ]
+ errorStrategy = 'ignore' // required as it fails the run for low reads: https://github.com/rhuebler/HOPS/issues/9
+ }
+
+ withName: TAXPASTA_MERGE {
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/postprocessing/taxpasta/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{csv,tsv,ods,xlsx,arrow,parquet,biom}'
+ ]
+ ext.args = { "--profiler ${meta.profiler} --output ${meta.profiler}_taxpasta_table.tsv" }
+ }
+
+ withName: TAXPASTA_STANDARDISE {
+ publishDir = [
+ path: { "${params.outdir}/metagenomics/postprocessing/taxpasta/" },
+ mode: params.publish_dir_mode,
+ pattern: '*.{csv,tsv,ods,xlsx,arrow,parquet,biom}'
+ ]
+ ext.args = { "--profiler ${meta.profiler} --output ${meta.profiler}taxpasta_table.tsv" }
+ }
+
+ //
+ // QUALIMAP
+ //
+
+ withName: 'QUALIMAP_BAMQC_WITHBED|QUALIMAP_BAMQC_NOBED' {
+ tag = { "${meta.reference}|${meta.sample_id}" }
+ publishDir = [
+ path: { "${params.outdir}/mapstats/qualimap/${meta.reference}/" },
+ mode: params.publish_dir_mode,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+ ]
+ }
+
+ //
+ // DAMAGE CALCULATION
+ //
withName: DAMAGEPROFILER {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = [
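
As a rough illustration of the `ext.args` change for `SAMTOOLS_FASTQ_METAGENOMICS` in the first hunk of this file, the mapping from `params.metagenomics_input` to `samtools fastq` filter flags can be sketched in shell. The function name `metagenomics_fastq_flags` is hypothetical and not part of the pipeline; only the `-F 4` / `-f 4` flags and the three input values (`mapped`, `unmapped`, `all`) come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the pipeline) mirroring the ext.args selection.
# -F 4 drops reads with the "unmapped" flag set, so only mapped reads remain.
# -f 4 requires the "unmapped" flag, so only unmapped reads remain.
# 'all' passes no filter flag, so every read is written to FASTQ.
metagenomics_fastq_flags() {
    case "$1" in
        mapped)   echo "-F 4" ;;
        unmapped) echo "-f 4" ;;
        all)      echo "" ;;
        *)        echo "unknown metagenomics_input: $1" >&2; return 1 ;;
    esac
}

# Possible use (input.bam is a placeholder):
#   samtools fastq $(metagenomics_fastq_flags unmapped) input.bam
```

Keeping `all` as an empty flag set is what lets a single process replace the former separate mapped and unmapped FASTQ processes.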
12 changes: 11 additions & 1 deletion docs/development/manual_tests.md
@@ -721,7 +721,7 @@ HOP001 ERR8958750 0 4 paired double half /workspace/eager/testing/test_data/ERR8
HOP001 ERR8958751 0 2 paired double half /workspace/eager/testing/test_data/ERR8958751_1.fastq.gz_reduced.fastq.gz /workspace/eager/testing/test_data/ERR8958751_2.fastq.gz_reduced.fastq.gz NA NA
HOP001 ERR8958752 0 2 paired double half /workspace/eager/testing/test_data/ERR8958752_1.fastq.gz_reduced.fastq.gz /workspace/eager/testing/test_data/ERR8958752_2.fastq.gz_reduced.fastq.gz NA NA
HOP001 ERR8958753 0 2 paired double half /workspace/eager/testing/test_data/ERR8958753_1.fastq.gz_reduced.fastq.gz /workspace/eager/testing/test_data/ERR8958753_2.fastq.gz_reduced.fastq.gz NA NA
- HOP001 ERR8958754 0 2 paired double none /workspace/eager/testing/test_data/ERR8958754_1.fastq.gz_reduced.fastq.gz /workspace/eager/testing/test_data/ERR8958754_2.fastq.gz_reduced.fastq.gz NA NA" | sed 's/ /\t/g' > test.tsv
+ HOP001 ERR8958754 0 2 paired double none /workspace/eager/testing/test_data/ERR8958754_1.fastq.gz_reduced.fastq.gz /workspace/eager/testing/test_data/ERR8958754_2.fastq.gz_reduced.fastq.gz NA NA" | sed 's/NA/ /g' | sed 's/ /\t/g' > test.tsv

nextflow run ../main.nf -profile docker \
--input test.tsv \
@@ -738,6 +738,16 @@ nextflow run ../main.nf -profile docker \
--metagenomics_malt_group_size 3
```

+ # kraken2
+
+ nextflow run main.nf -profile docker \
+ --input testing/test.tsv \
+ --outdir ./out \
+ --run_metagenomics \
+ --metagenomics_profiling_tool kraken2 \
+ --metagenomics_profiling_database /workspace/eager/testing/eager_test.tar.gz \
+ --preprocessing_skippairmerging
+
## Mapping statistics

### ENDOSPY
18 changes: 18 additions & 0 deletions docs/development/metagenomics_paired_end.md
@@ -0,0 +1,18 @@
+ ## Investigation notes for updating code to allow PE inputs into metagenomics profiling (e.g. for kraken, malt)
+
+ See
+ https://github.com/nf-core/eager/issues/945
+
+ The current issue is that the reads that go into mapping are not, by default, extracted as singletons and non-singletons, so we lose that information.
+ Downstream, the inputs into the krakenuniq module (even if split correctly with meta vars) then don't have the correct headers to parse the PE nature of the reads (they have all been concatenated anyway, and were only ORIGINALLY PE).
+
+ So this needs to be fixed further upstream (e.g. in bamfiltering.nf, likely with a new adjustment to the SAMTOOLS_FASTQ_UNMAPPED, SAMTOOLS_FASTQ_MAPPED, and SAMTOOLS_VIEW_BAM_FILTERING modules).
+
+ ISSUE FOUND: outputting PE reads from bamfiltering.nf (fastq_mapped & fastq_unmapped) is OK, but when overlap merging is not done, cat_fastq merges the "other" reads into the _1 file and the singletons into the _2 file, so the read pairing is broken.
Member
That sounds like a hackathon-thing to do

Author
Let me double-check this behavior and add more context on what we might need to do in a hackathon... this was more for my own reference notes, and it was not edited prior to opening the PR.

+ """
+ cat input1/JK2782_JK2782_TGGCCGATCAACGA_Mammoth_MT_Krause_unmapped_other.fastq.gz input3/JK2782_JK2782_TGGCCGATCAACGA_Mammoth_MT_Krause_unmapped_1.fastq.gz > JK2782_JK2782_TGGCCGATCAACGA_Mammoth_MT_Krause_1.merged.fastq.gz
+ cat input2/JK2782_JK2782_TGGCCGATCAACGA_Mammoth_MT_Krause_unmapped_singleton.fastq.gz input4/JK2782_JK2782_TGGCCGATCAACGA_Mammoth_MT_Krause_unmapped_2.fastq.gz > JK2782_JK2782_TGGCCGATCAACGA_Mammoth_MT_Krause_2.merged.fastq.gz
+ """
+
+ A decision is needed on what behavior is wanted for unmapped singleton and "other" reads, and then the call to cat_fastq should likely be removed for PE reads.
+ Possibly just split to also have the singletons parsed separately?
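
The `cat` calls quoted in the notes above append the `_other` reads to the `_1` file and the `_singleton` reads to the `_2` file, so the two merged files can end up holding different numbers of reads. A minimal shell sketch for spotting that, assuming gzipped FASTQ with four lines per record; the helper names `count_reads` and `pairs_in_sync` are hypothetical and not part of the pipeline. Equal counts are only a necessary condition for intact pairing, not proof of it.

```shell
#!/usr/bin/env bash
# count_reads FILE: number of FASTQ records in a gzipped file.
# Assumes exactly 4 lines per record (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# pairs_in_sync R1 R2: succeeds only if both files hold the same number of reads.
pairs_in_sync() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Possible use (file names are placeholders):
#   pairs_in_sync sample_1.merged.fastq.gz sample_2.merged.fastq.gz \
#       || echo "R1/R2 read counts differ; pairing is broken" >&2
```

A stricter check would also compare read names record by record, which this sketch deliberately leaves out.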
2 changes: 1 addition & 1 deletion modules.json
@@ -187,7 +187,7 @@
},
"krakenuniq/preloadedkrakenuniq": {
"branch": "master",
- "git_sha": "a6eb17f65b3ee5761c25c075a6166c9f76733cee",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"malt/run": {