Updates for QC, tranches and optional EXIT-RIF gvcf dataset #92

abhi18av · 2022-03-21T17:00:50Z

DRAFT PR: do not merge yet.

Updates

Move the FASTQC/MULTIQC checks to the QUALITY_CHECK_WF stage to catch data corruption earlier 👉 fdb959b
Update the readme to point to the website 👉 66cc26d
Update the containers to reuse the same conda.yml file 👉 03fa87e
Add the optional GVCF file from the reference EXIT-RIF dataset 👉 aa6ed01

NOTE: This adds a requirement to run git lfs install, since the file is not downloaded properly without it; regular Git repositories can't hold large files without git lfs. For now, I've sourced the file via HTTP, but it can also be downloaded as part of this repo if the git-lfs conda package is installed.

🚧 Use the tranches file for computing the best set of annotations (MOVED TO A DIFFERENT PR 👉 Tranches optimization #95)
Tweak for the pipeline logic regression due to the updated CSV format 👉 95d450e
Remove the dead code for TB_PROFILER_LOAD_LIBRARY (initially needed for previous version of tb-profiler) 👉 82cd641
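
Regarding the git lfs note above: when a repository is cloned without git lfs, an LFS-tracked file is left behind as a small text pointer instead of the real data. A minimal sketch of a check for that case, assuming the standard pointer format (first line `version https://git-lfs.github.com/spec/v1`); the function name is hypothetical and not part of the pipeline:

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the LFS pointer specification.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an un-fetched Git LFS pointer.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    # Pointer files are small text stubs (the spec keeps them below 1024 bytes),
    # so anything larger is treated as real content.
    if path.stat().st_size >= 1024:
        return False
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

If this returns True for the GVCF file, the real content was never fetched, and the file should be pulled with git lfs or sourced over HTTP instead.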

Updated tasks after the meeting on 22-03-2022

Confirm whether the gzip file is corrupted within the QC_CHECK workflow; confirm, using the samples sent via Lennert, whether FASTQC catches the corruption

Results of running gzip -t $fq -v directly (fails for the bad file ERR779852_1.fastq.gz 🔴):

(fastqc-env) PS /home/abhinav/projects/xbs-nf-dataset/bad-fastq-file-check> foreach ($fq in $listOfFastqs) {
>> gzip -t $fq -v
>> }
ERR751371_1.fastq.gz:
 OK
ERR751371_2.fastq.gz:
 OK
ERR779852_1.fastq.gz:

gzip: ERR779852_1.fastq.gz: unexpected end of file
ERR779852_2.fastq.gz:
 OK

Corresponding results of fastqc $fq (fails ✅ for the bad file ERR779852_1.fastq.gz and passes for all others):


Approx 85% complete for ERR779852_1.fastq.gz
Approx 90% complete for ERR779852_1.fastq.gz
Approx 95% complete for ERR779852_1.fastq.gz
Failed to process file ERR779852_1.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry.  Your file is probably truncated
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
    at java.base/java.lang.Thread.run(Thread.java:834)
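
The gzip -t check above can also be approximated from Python, for example inside a helper script. This is a minimal sketch using only the standard library, not the check the pipeline actually runs; the function names are hypothetical. It decompresses each file to the end and treats an early end of stream or a checksum failure as corruption:

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses cleanly.

    Hypothetical helper; roughly mirrors what `gzip -t` reports.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so a truncated stream or CRC mismatch surfaces.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early (truncated file).
        # OSError covers gzip.BadGzipFile (bad header or failed CRC check).
        return False
    return True


def check_fastqs(paths):
    """Map each path to True (intact) or False (corrupted)."""
    return {str(path): gzip_is_intact(path) for path in paths}
```

Calling `check_fastqs` on the four files listed above should flag only ERR779852_1.fastq.gz, assuming that file is truncated as the gzip and FastQC messages suggest.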

Test without the optional EXIT-RIF GVCF file 👉 Update the parameters and mechanism to use optional input files #94

Updated results_dirs #47, added a couple of notes, and changed ```-a``` for LoFreq filtering to a more standard value.

fixed overzealous substitutions that often resulted in altered sample names

Also output a realigned bam file.

* add a viewer to prepare_cohort_workflow
* add another view
* reuse the TEST workflow
* fix braces mismatch
* test only till call workflow
* disable other flows
* test output of gatk_combine_gvcf as well
* print the computed value of optional file
* add optional logs and debug info
* tweak the output log
* remove print and log from combine process and reenable resistance analysis
* test directly against the parameter
* change the identifier for optional file
* Completely remove the optional exit-rif file
* update default parameter value to check if issue is due to overrides
* remove minimal NF version req
* add test profile
* update the gitignore file
* explicitly set the absent files as []
* add a dummy file
* add dummy files
* increase the test surface
* WORKING - using dummy files
* use the staged file within gatk combine process
* WORKING after integration
* simplify the usage of exit rif dataset
* use the staged file in the process
* simplify the user-interface for resistance_db parameter
* enable entire workflow again
* update the generation of file names
* update the comments
* update the maxForks for tbprofile-profile-lofreq to manage parallel who dataset on a cluster
* update the file name to source it from local folder

Co-authored-by: biosharp-ou <biosharp.ou@outlook.com>

abhi18av · 2022-03-28T19:25:01Z

@TimHHH, for some reason the changes introduced in GATK_VARIANTS_TO_TABLE (7f62d81) are causing the SNPSITES process to fail with: Warning: No SNPs were detected so there is nothing to output.

Full error message below

Error executing process > 'MERGE_WF:PHYLOGENY_ANALYSIS__EXCOMPLEX:SNPSITES (joint)'

Caused by:
  Process `MERGE_WF:PHYLOGENY_ANALYSIS__EXCOMPLEX:SNPSITES (joint)` terminated with an error exit status (1)

Command executed:

  snp-sites -o joint.variable.ExDR.ExComplex.fa joint.ExDR.ExComplex.fa

Command exit status:
  1

Command output:
  (empty)

Command error:
  Warning: No SNPs were detected so there is nothing to output.

Work dir:
  /home/biosharpou/xbs-nf-runs/xbs-nf/work/1f/dd422d5f82f6428aa557b3a7cf27d5

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

TimHHH · 2022-03-29T14:52:22Z

@abhi18av I am not seeing any issue with the updated sed code, at least when I run it manually on some standard datasets. Could you check whether the input VCF for these processes has any content? E.g., zcat joint.filtered_SNP.ExDR.IncComplex.vcf.gz | grep -v "#" should give some lines of data. Alternatively, could you try again with a larger dataset (including EXIT-RIF)?
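
The zcat | grep check suggested above can be scripted as well. A minimal sketch, assuming the VCF is gzip- or bgzip-compressed; the function name is hypothetical. It counts only lines that do not start with `#`, which is slightly stricter than `grep -v "#"`, since that drops any line containing a `#` anywhere:

```python
import gzip


def count_vcf_records(path, limit=None):
    """Count data (non-header) lines in a gzip-compressed VCF.

    Hypothetical helper; stops early once `limit` records are seen.
    """
    count = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Header and meta lines start with '#'; skip blank lines too.
            if line.startswith("#") or not line.strip():
                continue
            count += 1
            if limit is not None and count >= limit:
                break
    return count
```

A result of 0 for joint.filtered_SNP.ExDR.IncComplex.vcf.gz would mean the input has no variant records, which would be consistent with snp-sites reporting that no SNPs were detected.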

abhi18av added 4 commits March 21, 2022 17:52

Add the notes for tranches

951fde0

Merge branch 'master' into develop

ceb4ab1

accommodate the optional reference exit rif gvcf dataset

aa6ed01

derive the location of GVCF file

fe06d05

abhi18av changed the title ~~Updates for QC, tranches and optional EXIT-RIF samples~~ [DRAFT] Updates for QC, tranches and optional EXIT-RIF samples Mar 21, 2022

abhi18av added 11 commits March 21, 2022 19:49

Allow users to run only the QC_CHECK workflow to confirm sample quality

fdb959b

Update the readme to point to the website

66cc26d

update conda command and put the test workflow in the end

2ae9c00

use lfs for exit-rif dataset

f5a8da8

add exported and normal env files

641dfad

create a minimal conda env script

03fa87e

update the docker file

bb9a61c

update the container version

34aa675

updated the Dockerfiles

9b76c2a

update the docker container version

ad02075

fix docker file for container2

fc5ac6f

abhi18av changed the title ~~[DRAFT] Updates for QC, tranches and optional EXIT-RIF samples~~ [DRAFT] Updates for QC, tranches and optional EXIT-RIF gvcf dataset Mar 21, 2022

abhi18av added 8 commits March 21, 2022 22:16

fix conditional check

ddc8bff

add log to check execution path

6c807f2

adapt to the new quanttb_qc script

95d450e

fix typo in output channel

4c0d292

Remove the extra view check

bf85a14

Remove the tbload library process

82cd641

tweak the conda env files for general conda-forge channel

b1b13c8

update the optimized config

73db15f

source the exit-rif gvcf from HTTP

90bfc9a

abhi18av linked an issue Mar 21, 2022 that may be closed by this pull request

Retry the GATK_VARIANT_RECALIBRATOR with reduced gaussians on failure #73

Open

abhi18av removed a link to an issue Mar 21, 2022

Retry the GATK_VARIANT_RECALIBRATOR with reduced gaussians on failure #73

Open

abhi18av added 3 commits March 22, 2022 00:39

revert to project-local setups

9f4d47a

update for private repo

fb37a78

update the description and readme

f98b67c

abhi18av self-assigned this Mar 22, 2022

TimHHH added 4 commits March 28, 2022 13:57

Updated results_dirs

c001859

Updated results_dirs #47, added a couple of notes, and changed ```-a``` for LoFreq filtering to a more standard value.

Update variants_to_table.nf

7f62d81

fixed overzealous substitutions that often resulted in altered sample names

Update haplotype_caller.nf

01af16e

Also output a realigned bam file.

Update haplotype_caller__minor_variants.nf

ade591a

Also output a realigned bam file.

abhi18av linked an issue Mar 28, 2022 that may be closed by this pull request

Define the publish dirs for all processes #47

Closed

abhi18av and others added 2 commits March 28, 2022 19:45

update the file pattern

e6c34f9

abhi18av changed the title ~~[DRAFT] Updates for QC, tranches and optional EXIT-RIF gvcf dataset~~ Updates for QC, tranches and optional EXIT-RIF gvcf dataset Mar 29, 2022

abhi18av merged commit fea3bc9 into master Mar 29, 2022
