
v0.5.0 - XA to XBC

@github-actions github-actions released this 03 Oct 22:02

v0.5.0

Please check out the v0.5.0 Testing Summary Package for a comprehensive report.

Notes

This is a minor release that includes the following changes:

  1. Detection of all recombinants in Nextclade dataset 2022-09-27: XA to XBC.
  2. Create any number of custom sc2rf modes with CLI arguments.

Resources

  • Issue #96: Create newick phylogeny of pango lineage parent child relationships, to get accurate sublineages including aliases.
  • Issue #118: Fix missing pango-designation issues for XAY and XBA.

Datasets

  • Issue #25: Reduce positive controls to one sequence per clade. Add new positive controls XAL, XAP, XAS, XAU, and XAZ.
  • Issue #92: Reduce negative controls to one sequence per clade. Add negative control for 22D (Omicron) / BA.2.75.
  • Issue #155: Add new profile and dataset controls-gisaid. Only a list of strains is provided, as GISAID policy prohibits public sharing of sequences and metadata.

Profile Creation

  • Issue #77: Report slurm command for --hpc profiles in scripts/create_profiles.sh.
  • Issue #153: Fix bug where build parameters metadata and sequences were not implemented.

Nextclade

  • Issue #81: Upgrade Nextclade datasets to 2022-09-27.
  • Issue #91: Upgrade Nextclade to v2.5.0.

sc2rf

  • Issue #78: Add new parameter max_breakpoint_len to sc2rf_recombinants to mark samples with too much uncertainty in the breakpoint interval as false positives.

  • Issue #79: Add new parameter min_consec_allele to sc2rf_recombinants to ignore recombinant regions with fewer than this number of consecutive alleles (both diagnostic SNPs and diagnostic reference alleles).

  • Issue #80: Migrate sc2rf from a submodule to a subdirectory (including LICENSE!). This simplifies the updating process and avoids errors where the submodule became out of sync with the main pipeline.

  • Issue #83: Improve error handling in sc2rf_recombinants when the input stats files are empty.

  • Issue #89: Reduce the default value of the parameter min_len in sc2rf_recombinants from 1000 to 500. This is to handle XAP and XAJ.

  • Issue #90: Auto-pass select nextclade lineages through sc2rf: XN, XP, XAR, XAS, and XAZ. This requires differentiating the nextclade inputs as separate parameters --nextclade and --nextclade-no-recom.

    • XN, XP, and XAR have extremely small recombinant regions at the terminal ends of the genome. Depending on sequencing coverage, sc2rf may not reliably detect these lineages.

    • The newly designated XAS and XAZ pose a challenge for recombinant detection using diagnostic alleles. The first region of XAS could be either BA.5 or BA.4 based on substitutions, but is most likely BA.5 based on deletions. Since the region contains no diagnostic alleles to discriminate BA.5 vs. BA.4, breakpoints cannot be detected by sc2rf.

    • Similarly for XAZ, the BA.2 segments do not contain any BA.2 diagnostic alleles, but instead are all reversions from BA.5 alleles. The BA.2 parent was discovered by deep, manual investigation in the corresponding pango-designation issue. Since the BA.2 regions contain no diagnostic alleles for BA.2, breakpoints cannot be detected by sc2rf.

  • Issue #95: Generalize sc2rf_recombinants to take any number of ansi and csv input files. This allows greater flexibility in the command-line arguments passed to sc2rf, which are no longer locked into the hardcoded primary and secondary parameter sets.

  • Issue #96: Include sub-lineage proportions in the parents_lineage_confidence. This reduces underestimation of the confidence of a parental lineage.

  • Issue #150: Fix bug where sc2rf would write empty output csv files if no recombinants were found.

  • Issue #151: Fix bug where samples that failed to align were missing from the linelists.

  • Issue #158: Reduce sc2rf param --max-intermission-length from 3 to 2 to be consistent with Issue #79.

  • Issue #161: Implement selection method to pick best results from various sc2rf modes.

  • Issue #162: Upgrade sc2rf/virus_properties.json.

  • Issue #163: Use LAPIS nextcladePangoLineage instead of pangoLineage. Also disable default filter max_breakpoint_len for XAN.

  • Issue #164: Fix bug where false positives would appear in the filtered sc2rf ansi output (recombinants.ansi.txt).

  • The optional lapis parameter for sc2rf_recombinants has been removed. Querying LAPIS for parental lineages is no longer experimental and is now an essential component (cannot be disabled).

  • The mandatory mutation_threshold parameter for sc2rf has been removed. Instead, --mutation-threshold can be set independently in each of the sc2rf modes.
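
The region filters described in Issues #78, #79, and #89 can be illustrated with a rough sketch. This is not the code in sc2rf_recombinants: the Region class, the function names, and the default of 3 for min_consec_allele are assumptions made for illustration. Only the parameter names and the min_len default of 500 come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int    # first genomic coordinate of the region
    end: int      # last genomic coordinate of the region (inclusive)
    parent: str   # parental lineage or clade assigned to the region
    alleles: int  # consecutive diagnostic alleles supporting the region


def filter_regions(regions, min_len=500, min_consec_allele=3):
    """Drop regions that are too short or supported by too few alleles.

    min_consec_allele=3 is a hypothetical default, not taken from the pipeline.
    """
    kept = []
    for r in regions:
        length = r.end - r.start + 1
        if length >= min_len and r.alleles >= min_consec_allele:
            kept.append(r)
    return kept


def breakpoint_intervals(regions):
    """Return (start, end) gaps between neighbouring regions with different parents."""
    ordered = sorted(regions, key=lambda r: r.start)
    intervals = []
    for left, right in zip(ordered, ordered[1:]):
        if left.parent != right.parent:
            intervals.append((left.end + 1, right.start - 1))
    return intervals


def classify(regions, max_breakpoint_len, min_len=500, min_consec_allele=3):
    """Label a sample 'positive' or 'false_positive' after applying the filters."""
    kept = filter_regions(regions, min_len, min_consec_allele)
    intervals = breakpoint_intervals(kept)
    if not intervals:
        # No change of parent survives filtering, so nothing supports recombination.
        return "false_positive"
    for start, end in intervals:
        if end - start + 1 > max_breakpoint_len:
            # The breakpoint position is too uncertain.
            return "false_positive"
    return "positive"
```

With two made-up regions, `classify([Region(1, 10000, "BA.2", 20), Region(10500, 29903, "BA.5", 15)], max_breakpoint_len=1000)` keeps both regions and accepts the 499-base breakpoint interval, while a stricter `max_breakpoint_len=100` marks the same sample as a false positive.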

Linelist

  • Issue #157: Create new parameters min_lineage_size and min_private_muts to control lineage splitting into X*-like.

Plot

  • Issue #17: Create script to plot lineage assignment changes between versions using a Sankey diagram.
  • Issue #82: Change epiweek start from Monday to Sunday.
  • Issue #111: Fix breakpoint distribution axis that was empty for clade.
  • Issue #152: Fix file saving bug when largest lineage has / characters.
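
The epiweek change in Issue #82 amounts to binning each collection date by the Sunday that begins its week, rather than the Monday. A minimal sketch, assuming a helper named epiweek_start (hypothetical, not the name used in the plotting script):

```python
from datetime import date, timedelta


def epiweek_start(day: date) -> date:
    """Return the Sunday that begins the week containing `day`."""
    # date.weekday() counts Monday=0 ... Sunday=6, so shift by one
    # to get the number of days elapsed since the most recent Sunday.
    days_since_sunday = (day.weekday() + 1) % 7
    return day - timedelta(days=days_since_sunday)
```

For example, `epiweek_start(date(2022, 10, 5))` (a Wednesday) returns `date(2022, 10, 2)`, and a Sunday maps to itself.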

Report

  • Issue #88: Add pipeline and nextclade versions to PowerPoint slides as footer. This required adding --summary as a param to report.

Validate

  • Issue #56: Change rule validate from simply counting the number of positives to validating the fields lineage, breakpoints, parents_clade. This involves adding a new default parameter expected for rule validate in defaults/parameters.yaml.
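
The field-level validation described in Issue #56 could look roughly like the sketch below. The function name, the dictionary layout, and the sample values are hypothetical. Only the three field names (lineage, breakpoints, parents_clade) come from the notes above.

```python
FIELDS = ("lineage", "breakpoints", "parents_clade")


def validate(expected, observed, fields=FIELDS):
    """Compare observed rows against expected rows, keyed by strain.

    Returns a list of (strain, field, expected_value, observed_value)
    tuples, one per mismatch. An empty list means validation passed.
    """
    failures = []
    for strain, exp_row in expected.items():
        obs_row = observed.get(strain)
        if obs_row is None:
            # The strain is absent from the pipeline output entirely.
            failures.append((strain, "missing", None, None))
            continue
        for field in fields:
            if exp_row.get(field) != obs_row.get(field):
                failures.append(
                    (strain, field, exp_row.get(field), obs_row.get(field))
                )
    return failures


# Made-up values for illustration only.
expected = {
    "sample_1": {"lineage": "XA", "breakpoints": "100:200", "parents_clade": "A,B"},
}
observed = {
    "sample_1": {"lineage": "XA", "breakpoints": "100:250", "parents_clade": "A,B"},
}
failures = validate(expected, observed)
```

Here `failures` holds a single entry for the breakpoints field of sample_1, which is more informative than a simple count of positives.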

Designated Lineages

Proposed Lineages

Commits

  • b48ad6d7 docs: fix CHANGELOG pr
  • 04b17918 docs: update readme and changelog
  • 72dd5a8f docs: add testing summary package for v0.4.2 to v0.5.0
  • 558f7d79 resources: fix breakpoints for XAE #122
  • 91e5843b script: bugfix sc2rf ansi output for #164
  • 9bc13639 docs: update issues and validation table order
  • b63520e5 script: implement lineage check in dups for #117 #161
  • 901898da sc2rf updates for #158 #161 #162 #163
  • 96fa6af1 dataset: update controls-gisaid strain list and validation
  • 84466a10 workflow: new param dup_method for #161
  • 9ca0c71e script: implement duplicate reconciliation for #161
  • 112ea684 param: upgrade nextclade dataset for #159
  • 859b92c8 script: add more detail to validate table for failing samples
  • 5e285912 script: add param --min-link-size to compare_positives
  • bd01a5e4 workflow: added failed validate output to rule log
  • 8e5b90fb workflow: don't use metadata for sc2rf_recombinants when exclude_negatives is true
  • cdf45407 param: add new params min-lineage-size and min-private-muts for #157
  • bc04fddf workflow: update validation strains for #155
  • 6aa95221 param: fix typo of missing --mutation-threshold
  • 25df848c param: remove param mutation_threshold as universal param for sc2rf
  • See CHANGELOG.md for additional commits.