Releases: moka-guys/automate_demultiplex
v45.2.0
This update incorporates the following changes:
- Update mokapipe workflow ID and pipeline ID
- Update ED VCP3 panel of normals Rdata file
- Test emails redirected to mokaguys
- Run the docker images as the mokaguys user instead of root. This means that the files created are also owned by mokaguys, making them easier to delete with the workstation cleaner, which is currently failing because of user permission issues
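The last change above can be illustrated with a minimal sketch of running a container as a chosen user rather than root. It relies on Docker's `--user` option; the helper name `docker_run_as_user`, the image name and the mount path are hypothetical and are not taken from the repository.

```python
import os
from typing import List, Optional


def docker_run_as_user(
    image: str,
    args: List[str],
    mounts: Optional[List[str]] = None,
    uid: Optional[int] = None,
    gid: Optional[int] = None,
) -> List[str]:
    """Build a `docker run` command that executes as the given user, not root.

    Files written to mounted volumes are then owned by that user on the host.
    """
    # Default to the user running this script (assumption: a Unix host)
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    cmd = ["docker", "run", "--rm", "--user", f"{uid}:{gid}"]
    for mount in mounts or []:
        cmd += ["-v", mount]  # host_path:container_path
    return cmd + [image] + args


# Hypothetical usage: image name and paths are placeholders
example_cmd = docker_run_as_user(
    "example/image:latest", ["--version"], ["/path/to/runfolder:/mnt/run"]
)
```

The resulting list can be passed to `subprocess.run`; building it as a list avoids shell quoting problems.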
v45.1.0
This update incorporates the following changes:
- Update upload agent version to v1.5.33 in line with the version used for archer archiving, as per DNAnexus recommendation
- Add functionality to process OncoDEEP runs (using the live OncoDEEP pan number), including oncodeep_upload v1.0.0 app
- Address jinja2 dependabot security alert – update jinja2 to v3.1.4
- Update naming of ‘master’ branch to ‘main’ - ‘main’ is the current widely accepted nomenclature for the primary branch
- Update oncology ops email used as the recipient for the pipeline started emails
- Move copying of the samplesheet from setoff_workflows module to demultiplexing module
- Add auth string variable assignment to top of all commands files – simplifies creation / writing of commands to file
- Remove TSO low throughput pan number (Pan4969)
- Update duty csv version to v1.5.0
- Update samplesheet validator to v1.3.0
- Fix white space addition in SQL emails (mail clients don’t always respect <p> HTML tags)
- Update pipeline IDs (Custom Panels – ID in use was for a previous version of the pipeline, Archer – MultiQC was not listed in previous ID, TSO – previous ID had no information listed about the app versions, OncoDEEP – new pipeline)
- Set logger mode to 'w' so logging overwrites old logs
- Add functionality to process development runs without UMIs (demultiplex, then upload the runfolders including BCLs, and set off fastqc / multiqc)
- Add new development pan number for development runs with UMIs (Pan5227), and functionality to handle these (creates flag files to prevent demultiplexing and runfolder upload)
- Add UMI dev runfolder to config
- Addition to toolbox of RunfolderSamples class (derived from CollectRunfolderClass class in setoff_workflows) containing runfolder properties derived from the samplesheet, and SampleObject class originally in setoff_workflows
- Alter behaviour to prevent firing of md5sum absence warnings (as there is a period of time after sequencing finishes before the file appears)
- Fix so that SampleSheet is checked prior to the run finishing sequencing
- Fix issue where setoff_workflows falls over for TSO runs as the samplesheet has not been copied over to the runfolder from the samplesheets directory
- Fix issue where setoff workflows tries to automatically upload runs with the development pan number
- Make setoff_workflows more modular: (Creation of classes per run type for collating commands generation, to make the logic easier to follow, Move dx run command functions to new build_dx_commands script, Move PipelineEmails to pipeline_emails script)
- Remove Upload MultiQC as a dependency for duty_csv – addresses issue where duty_csv fails due to the cyber attack meaning that MultiQC reports cannot be uploaded to the genomics server i.e. upload multiqc fails
- Sort fastqs before validation so it is easier to see how far through we are
- Remove requirement to specify command line auth token for upload runfolder script – the script will only ever be run from the workstation, where the auth key file resides in the same location, so this makes it more user friendly
- Improve md5sum checking behaviour. Processing continues only if all of the following are satisfied:
  - The runfolder requires no checksum checking (i.e. is a MiSeq), or the runfolder requires checksum checking and the checksum file exists
  - Sequencing is complete
  - The sequencer requires no checksum checking (is a MiSeq), or the integrity check passes (the checksum file exists and either has been checked before and contains the checksum success message, or has not been checked before, in which case the check is carried out)
- Comment out addition of upload_multiqc command to dx run commands script
- Update requirements file to ensure it captures all dependencies
- Change missing fastqs logger message to warning
- Incorporate workstation cleaner, with the addition of checking whether the runfolder is a development runfolder to prevent any deletion of dev runs
- Fix LRPCR run not being identified as custom panels runtype by duty_csv, by adding the sample_prefix to the project name suffix
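The md5sum checking behaviour described in this release can be sketched as a small decision function. This is an illustration of the stated rules only, not the repository's implementation; `RunfolderState`, `integrity_ok` and `ready_to_process` are hypothetical names, and `run_check` stands in for whatever performs the actual checksum comparison.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunfolderState:
    """Hypothetical snapshot of the facts the checks rely on."""
    requires_checksum: bool        # False for a MiSeq
    checksum_file_exists: bool
    sequencing_complete: bool
    previously_checked: bool
    success_message_present: bool  # only meaningful if previously checked


def integrity_ok(state: RunfolderState, run_check: Callable[[], bool]) -> bool:
    """Return True if no check is needed or the integrity check passes."""
    if not state.requires_checksum:
        return True
    if not state.checksum_file_exists:
        return False
    if state.previously_checked:
        # Reuse the recorded outcome instead of re-running the check
        return state.success_message_present
    return run_check()


def ready_to_process(state: RunfolderState, run_check: Callable[[], bool]) -> bool:
    """Apply the three criteria in order; all must hold for processing to continue."""
    checksum_precondition = (not state.requires_checksum) or state.checksum_file_exists
    return (
        checksum_precondition
        and state.sequencing_complete
        and integrity_ok(state, run_check)
    )
```

Because a missing checksum file simply returns `False` rather than raising or warning, the function can be polled repeatedly while the file has yet to appear after sequencing finishes.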
v45.0.0
Major overhaul:
- Refactor
- Prevent git_tag() masking derivative commit
- Remove email sending redundancy
- Reduce complexity of AdLogger and add logging to all modules
- Move scripts into own modules
- Improve naming of variables
- Apps and workflows specified by ID instead of strings – strings obtained by dx describe where necessary
- Remove obsolete scripts (/scripts subdir)
- Update for compatibility with Python 3
- Add license
- Add Pytest test suite and test data for tests
- Add GitHub actions testing for Pytest test suite and flake8 formatting
- Remove obsolete WES trio pan number - Pan3174
- Addition of SensitiveFormatter to logging
- Addition of more extensive logging
- Change paths and naming of logfiles
- Addition of a toolbox module containing functions that are used across multiple modules (includes new RunfolderObject() class which stores all runfolder attributes)
- Improved documentation – docstrings, addition of readmes to each module
- Improve readability of configuration file – use of dictionaries and per-module config classes
- Move panel config to a separate file and improved layout
- Incorporation of an email template and CSS style
- Addition of typing
- Incorporation of seglh-naming library via samplesheet_validator library
- Moving cluster density calculation from setoff workflows to demultiplex script
- Addition of script-level logfile that records decisions for which runs to process, and runfolder-level log files that record runfolder-level logs
- Remove obsolete function excluding MiSeq created fastqs
- Addition of backup runfolder script to the repository and ability to run as either a module import or on the command line
- Addition of job name string to allow specifying names for test folders
- Test folders are named 003_ in DNAnexus and shared with all binfx users with admin access
- Log messages run in test mode contain a TEST_MODE flag, and in Pytest mode contain a PYTEST_TESTS flag
- Standardise flags used in logging so that ERROR is the only thing that we are looking for to pick up
- Incorporate the correct dockerised bcl2fastq build
- Update duty_csv to v1.3.0, add qiagen_upload v1.0.0 app for TSO runs
- Split config up and move log messages into a log config file
- Removal of congenica_upload script to simplify generation of these commands
- Setoff_workflows script now checks that the expected fastqs are present against the samples in the SampleSheet, and that the expected samples in the SampleSheet are present in the BaseCalls dir, and that the undetermined fastqs are present. If expected sample fastqs are missing, it logs an error and excludes those fastqs from the run processing, sending out an error alert
- Move upload_runfolder logs to logs directory from DNAnexus_upload_started.txt file
- Add support for dev runs in demultiplexing.py: the SampleSheet validator identifies a dev run by the presence of the dev pan number in the SampleSheet, the bcl2fastqlog file is added once the run is finished to prevent further processing by the scripts, and a warning message is sent out, which is picked up by rapid7 to alert that the run needs manual processing
- Add command line support for dev runs – if runfolder name is provided on the command line and is a dev run, SampleSheet checks are bypassed
- Remove bcl2fastq log checking function – this is not required as the success or failure of bcl2fastq can be assessed by the script using the returncode
- Update runfolder name to append runtype to end of runfolder name for custom panels and WES runs (makes it easier to see what run it is e.g. in case of LRPCR)
- Grant seglh_read org access to all uploaded projects
- Add class and package diagrams
- Specify v2 instances for PIPE workflow for BWA, Picard, GATK, filter_vcf, Sambamba
- Increase FH GATK instance type to mem3_ssd1_v2_x16
- Enable setoff workflows script to handle missing fastqs
- Validate fastqs after demultiplexing using gzip --test. If any invalid fastqs exist, remove the bcl2fastq log file so that demultiplexing is re-run
- Add sample names being processed to samples being processed email
- Introduce bash variables to store project name and ID in dx run scripts
- Add tagging to uploaded files to allow for correct counting, and error message when expected number of uploaded files does not match actual number of uploaded files
- Addition of a sleep command to each Qiagen upload command
- Remove bcl2fastq log upon demultiplex fastq validation fail to allow for re-attempt at demultiplexing
- Update CNV calling inputs for R134 (additional genes), R79 and R90 (fix single exon issue). Update readcount bed files and panel of normal files for VCP1 and VCP3
- Add settings.json
- Add demultiplex success and fail messages
- Addition of a samplesheet check flag file that prevents re-checking a samplesheet that has already been checked by the script but failed the checks
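The fastq validation step in this release (check each fastq, and remove the bcl2fastq log on failure so that demultiplexing is re-attempted) can be sketched as below. Note the substitution: the release notes describe calling `gzip --test`, whereas this sketch reads each file to the end with Python's standard `gzip` module, which detects the same truncation and corruption errors without shelling out. The function names and the log path argument are hypothetical.

```python
import gzip
import zlib
from pathlib import Path
from typing import Iterable, List


def fastq_is_valid(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzipped file decompresses cleanly to the end."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard in chunks so large fastqs are not held in memory
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Covers a bad header, a truncated stream and corrupt compressed data
        return False
    return True


def validate_fastqs(fastq_paths: Iterable[str], bcl2fastq_log: str) -> List[str]:
    """Return the invalid fastqs; if there are any, delete the bcl2fastq log.

    Assumption: the absence of the log file is what triggers a re-run of
    demultiplexing, as the release notes describe.
    """
    invalid = [str(path) for path in fastq_paths if not fastq_is_valid(str(path))]
    if invalid:
        Path(bcl2fastq_log).unlink(missing_ok=True)  # requires Python 3.8+
    return invalid
```

Returning the list of invalid files, rather than a bare boolean, lets the caller name the affected fastqs in its error log.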
v44.8.2
Updated Exomedepth apps and normal_readcount files
VCP3 exomedepth changes
- VCP3 CNV calling BED files
- fixed small error in upload script
v44.8.1
Bug fix: add a new line when creating the congenica run commands file
v44.8.0
v44.8.0 incorporates the following:
- The script now splits the TSO samplesheets and runs the pipeline multiple times, once per resulting split samplesheet
- Updated TSO app (v1.6.0) (AUD1352), which contains changes to support the app being run multiple times for the same run, and to output files in a useful way for downstream processes
- TSO post-run processing commands are now written to a separate bash script. This is because the --wait flag cannot be used to delay the running of the commands for the downstream apps when running multiple instances of the TSO app to process a single run
- Updated duty CSV app (v1.2.0) (AUD1349) that has been updated to function with the added exome depth PDF output and the altered TSO output format
- Add new pan numbers: Pan5186 and Pan5185 - APC Associated Polyposis, Pan5180 - development run (stops warning messages)
- Amend scripts so that samplesheet checks do not run for runs containing samples with the development pan number
- Incorporate new exome depth app which performs CNV calling using ExomeDepth - currently only running for VCP1 and VCP2 samples (not VCP3)
- Remove no longer required pan numbers: Pan4127 (VCP2 Viapath R209 colorectal) and Pan4818 (VCP2 STG R209 colorectal), as R209 has been removed from the test directory; and Pan4044 (STG VCP1), Pan4042 (STG VCP2 BRCA), Pan4049 (STG VCP2 CrCa) and Pan4043 (STG VCP3), which were generic pan numbers we used to use for StG but have since been separated into individual pan numbers
v44.7.0
v44.7.0 incorporates the following changes:
- Update TSO500 coverage BED file to Pan5130
- Fix TSO coverage report output folders to output to a folder per Pan number
- Incorporate new MultiQC dnanexus app v1.18.0 to all pipelines (contains new MultiQC plugin for coverage that adds a coverage table for all samples with sambamba chanjo gene_level coverage files)
- Update TSO500 dependency so that MultiQC depends on sambamba chanjo jobs (except NTC samples)
- Switch to a dockerised version of bcl2fastq2
- Fix email function - add correct email server username back into the script
- Update incorrect RPKM VCP3 pan number (Pan3974 should be Pan4362)
- Remove obsolete MokaCAN pipeline
- Add new Pan numbers for R444.1 and R444.2
v44.6.0
This update incorporates the following changes:
- Re-add peddy to multiqc depends list
- Fix order of app dependency for TSO and Custom Panels pipelines (including addition of an extra dependency list for custom panels to stop multiqc depending on RPKM)
- Add updated duty_csv app version
- Remove non-required panel argument to fastqc command creation function
- Exclude NTC sambamba job from depends list for TSO samples
- Only add to depends_list if JOBID exists from the command for TSO fastqc, sompy and sambamba
- Add extra log command for writing dx run commands to file
- Add support for R430 test indication (prostate panel) on VCP2
- Update VCP2 variant calling, coverage and RPKM bed files
- Add --priority flag to dx run commands that previously didn't have it
- Specify dnanexus v2 instance types for peddy, multiqc, upload multiqc, RPKM and congenica upload commands
- Increase timeout time on R134 runs from 6 to 12 hours
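The depends-list changes in this release (only add a job when a job ID was actually returned, and leave out NTC sambamba jobs) can be sketched as follows. This is a hedged illustration: it assumes the `--depends-on` option of `dx run`, and the function name, the sample-name-to-job-ID mapping and the rule that NTC samples are recognised by an "NTC" prefix are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def build_depends_args(
    job_ids: Dict[str, Optional[str]],
    exclude_prefixes: Tuple[str, ...] = ("NTC",),
) -> str:
    """Return `--depends-on` arguments for every usable job ID.

    job_ids maps a sample name to the job ID captured from its dx run
    command, or to None / "" if the command produced no job ID.
    """
    args = []
    for sample, job_id in job_ids.items():
        if not job_id:
            continue  # command failed or returned nothing: do not depend on it
        if sample.startswith(exclude_prefixes):
            continue  # assumption: NTC samples are identified by name prefix
        args += ["--depends-on", job_id]
    return " ".join(args)
```

Skipping empty job IDs keeps a single failed upstream command from producing a malformed dependency argument in the downstream command.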
v44.5.0
- The addition of the duty_csv app to the end of all workflows. This consists of the addition of a new function to create the dx run command in the same way as for multiqc, and addition of config variables containing the inputs
- Alteration to the way the TSO500 pipeline is set off so that it no longer requires use of the output parser app. This facilitates easier updating of the pipeline, and brings it closer to the set up of the other pipelines, with set off using the dx run command bash script which contains all run commands, as opposed to the commands being split between the workstation and DNAnexus
- Addition of the --wait flag to the tso docker run command, to delay downstream tasks until all output files have been created
- Update the version of fastqc used for ArcherDX and TSO samples
- Update ADX and TSO pipeline IDs
v44.4.0 - minor release
This release includes the following changes:
- Update MokaPIPE workflow from version 2.17 to 2.18. This includes update of FastQC v1.3 → v1.4 (update fastqc version from v0.11.3 to v0.11.9 and dockerise), update of Picard v1.1 → v1.2 (updated versions of samtools and picard, and made removal of chr in interval file optional), update of Filter_vcf_with_bedfile v1.0 → v1.1 (add skip flag), and update of polyedge v1.0.0 → v1.1.0 (now outputs pdf, html and csv; remove MSH2 variant hard coding)
- Update TSO500 app to v1.5.1
- Update Multiqc app to v1.17.0
- Add new Pan numbers for TSO500 and ArcherDx to support dry lab work
- Increased instance size for the MultiQC app