Updates for QC, tranches and optional EXIT-RIF gvcf dataset #92
Updated `results_dirs` #47, added a couple of notes, and changed `-a` for LoFreq filtering to a more standard value.
Fixed overzealous substitutions that often altered sample names.
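The exact `sed` expressions are not shown in this PR, so the following is only a hypothetical sketch of the general fix. It assumes the substitution renames the sample column of a single-sample VCF. Instead of a global substitution that can also hit substrings elsewhere in the file, it restricts the edit to the `#CHROM` header line and replaces only its last tab-separated field. The function name `rename_vcf_sample` is illustrative, and `\t` inside the regex assumes GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename the sample column of a single-sample VCF.
# Only the '#CHROM' header line is edited, and only its last field is replaced,
# so sample-like substrings in meta lines or records are left untouched.
# Assumes GNU sed ('\t' in regex and replacement) and a new name without '/' or '&'.
rename_vcf_sample() {
  local new_name="$1" vcf="$2"
  sed -E "/^#CHROM/ s/\t[^\t]+\$/\t${new_name}/" "$vcf"
}
```

Usage: `rename_vcf_sample ERR779852 input.vcf > renamed.vcf` (file names are placeholders).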
Also output a realigned BAM file.
* Add a viewer to `prepare_cohort_workflow`
* Add another view
* Reuse the TEST workflow
* Fix braces mismatch
* Test only up to the call workflow
* Disable other flows
* Test the output of `gatk_combine_gvcf` as well
* Print the computed value of the optional file
* Add optional logs and debug info
* Tweak the output log
* Remove print and log from the combine process and re-enable resistance analysis
* Test directly against the parameter
* Change the identifier for the optional file
* Completely remove the optional EXIT-RIF file
* Update the default parameter value to check whether the issue is due to overrides
* Remove the minimal NF version requirement
* Add a test profile
* Update the `.gitignore` file
* Explicitly set the absent files as `[]`
* Add a dummy file
* Add dummy files
* Increase the test surface
* WORKING - using dummy files
* Use the staged file within the GATK combine process
* WORKING after integration
* Simplify the usage of the EXIT-RIF dataset
* Use the staged file in the process
* Simplify the user interface for the `resistance_db` parameter
* Enable the entire workflow again
* Update the generation of file names
* Update the comments
* Update the `maxForks` for `tbprofile-profile-lofreq` to manage the parallel WHO dataset on a cluster
* Update the file name to source it from the local folder

Co-authored-by: biosharp-ou <biosharp.ou@outlook.com>
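The commits above make the EXIT-RIF GVCF optional by staging a dummy file when it is absent and then using the staged file inside the GATK combine process. Below is a minimal shell sketch of how such a process script could decide whether to pass the extra GVCF to `gatk CombineGVCFs`. It assumes a hypothetical placeholder name `NO_FILE` for the dummy and a hypothetical output name. It only builds and prints the argument list and does not invoke GATK.

```shell
#!/usr/bin/env bash
# Sketch: build arguments for 'gatk CombineGVCFs', adding the optional
# EXIT-RIF GVCF only when a real file (not the dummy placeholder) was staged.
# 'NO_FILE' and 'combined.g.vcf.gz' are hypothetical names, not from the pipeline.
build_combine_args() {
  local reference="$1" exit_rif_gvcf="$2"
  shift 2
  local args=(-R "$reference" -O combined.g.vcf.gz)
  local gvcf
  # Every per-sample GVCF is passed with its own --variant flag
  for gvcf in "$@"; do
    args+=(--variant "$gvcf")
  done
  # Skip the optional dataset if the staged file is the dummy placeholder
  if [[ "$(basename "$exit_rif_gvcf")" != NO_FILE* ]]; then
    args+=(--variant "$exit_rif_gvcf")
  fi
  printf '%s\n' "${args[@]}"
}
```

Usage: `mapfile -t args < <(build_combine_args ref.fa NO_FILE s1.g.vcf.gz s2.g.vcf.gz); gatk CombineGVCFs "${args[@]}"`. Keeping the decision inside the script means the workflow can always declare the input, and only the command line changes.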
@TimHHH, for some reason the changes introduced in … Full error message below.
@abhi18av, I am not seeing any issue with the updated `sed` code, at least when I run it manually on some standard datasets. Could you check whether the input VCF for these processes has any content? e.g. …
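One quick way to answer that question is to count the non-header lines in the VCF. Here is a minimal sketch that handles both plain and gzip/bgzip-compressed files. The function name `count_vcf_records` is illustrative.

```shell
#!/usr/bin/env bash
# Count variant records (non-header lines) in a plain or gzip-compressed VCF.
# A result of 0 suggests the upstream process produced no calls.
count_vcf_records() {
  local vcf="$1"
  if [[ "$vcf" == *.gz ]]; then
    gzip -dc -- "$vcf"   # also works for bgzip output, which is gzip-compatible
  else
    cat -- "$vcf"
  fi | grep -v '^#' | wc -l | tr -d ' '
}
```

Usage: `count_vcf_records sample.vcf.gz` prints the number of records.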
DRAFT PR: do not merge yet.
Updates
* `QUALITY_CHECK_WF` stage to catch data corruption earlier 👉 fdb959b
* `conda.yml` file 👉 03fa87e
* `EXIT-RIF` dataset 👉 aa6ed01

NOTE: This added the requirement for `git lfs install`, since the file is not downloaded properly without it. Plain Git repositories on GitHub can't hold large files without `git lfs`. For now, I've sourced that file via HTTP, but it can also be downloaded as part of this repo if the `git-lfs` conda package is installed.

Updated tasks after the meeting on 22-03-2022
* Confirm whether the gzip file is corrupted within the `QC_CHECK` workflow; confirm with the samples sent via Lennert whether FastQC catches this.
* The results of a direct `gzip -t $fq -v` (bad quality ERR779852_1.fastq.gz 🔴)
* Corresponding results of `fastqc $fq` (fails ✅ for the bad quality ERR779852_1.fastq.gz and passes for all others)
* Test without the optional EXIT-RIF GVCF file 👉 Update the parameters and mechanism to use optional input files #94