Releases · ktmeaton/ncov-recombinant

02 Mar 15:37

v0.7.0

400b11a

Latest

v0.7.0 - Recursive Recombinants

Notes

This is a minor release aimed towards a nextclade dataset upgrade from 2022-10-27 to 2023-01-09 which adds nomenclature for newly designated recombinants XBH - XBP. This release also adds initial support for the detection of "recursive recombination" including XBL and XBN which are recombinants of XBB.

A comprehensive test summary report can be downloaded directly with: ncov-recombinant_v0.6.1_v0.7.0.zip or viewed at the following link once the release is complete.

Documentation

Issue #24: Create documentation on Read The Docs

Dataset

Issue #210: Handle numeric strain names.

Resources

Issue #185: Simplify creation of the pango-lineage nomenclature phylogeny to use the lineage_notes.txt file and the pango_aliasor library.

sc2rf

Issue #195: Add bypass to intermission allele ratio for edge cases.
Issue #204: Add special handling for XBB sequenced with ARTIC v4.1 and dropout regions.
Issue #205: Add new column parents_conflict to indicate whether the reported lineages from covSPECTRUM conflict with the reported parental clades from sc2rf.
Issue #213: Add XBK to auto-pass lineages.
Issue #222: Add new parameter --gisaid-access-key to sc2rf and sc2rf_recombinants.
Issue #229: Fix bug where auto-pass lineages are missing when exclude_negatives is set to true.
Issue #231: Fix bug where 'null' lineages in covSPECTRUM caused error in sc2rf postprocess.
The order of operations in postprocessing.py was rearranged to provide more comprehensive details for auto-pass lineages.
Add XAN to auto-pass lineages.

Plot

Issue #209: Restrict the palette for rbd_level to the range of 0:12.
Issue #218: Fix bug concerning data fragmentation with large numbers of sequences.
Issue #221: Remove parameter --singletons in favor of --min-cluster-size to control cluster size in plots.
Issue #224: Fix bug where plot crashed with extremely large datasets.
Combine plot and plot_historical into one snakemake rule. Also add a custom pattern plot_NX (ex. plot_N10) to adjust the minimum cluster size.
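
As a rough illustration of the plot_NX pattern, the minimum cluster size could be read from the target name as sketched below. The regular expression and the function name `min_cluster_size_from_target` are assumptions made for this example and are not taken from the pipeline.

```python
import re
from typing import Optional

# Hypothetical pattern: "plot_N" followed by the minimum cluster size.
PLOT_TARGET = re.compile(r"^plot_N(\d+)$")


def min_cluster_size_from_target(target: str) -> Optional[int]:
    """Return the cluster size encoded in a target such as 'plot_N10'."""
    match = PLOT_TARGET.match(target)
    # Targets without the N<size> suffix carry no explicit size.
    return int(match.group(1)) if match else None


print(min_cluster_size_from_target("plot_N10"))
```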

Report

Combine report and report_historical into one snakemake rule.

Validate

Issue #225: Fix bug where false negatives passed validation because the status column wasn't checked.

Designated Lineages

Issue #217: XBB.1.5
Issue #196: XBF
Issue #206: XBG
Issue #196: XBH
Issue #199: XBJ
Issue #213: XBK
Issue #219: XBL
Issue #215: XBM
Issue #197: XBN

Proposed Lineages

Issue #203: proposed1305
Issue #208: proposed1340
Issue #212: proposed1425
Issue #214: proposed1440
Issue #216: proposed1444
Issue #220: proposed1576

Commits

c279f1e4 docs: add changelog for v0.7.0
2964b4a1 docs: update notes to include 1576 proposed issue
fdc874ab docs: add test summary package for v0.7.0
3f3d4438 docs: update docs v0.7.0
78696b36 script: add bug fix to sc2rf postprocess for #231
403777a0 script: lint plotting script
2a09c783 script: fix sc2rf postprocess bug in duplicate removal
d44d5f90 data: add XBP to controls-gisaid
4293439c profile: add controls-gisaid to virusseq builds
91d6fb89 defaults: update nextclade dataset to 2023-02-01
630b2cd5 resources: update
49e6f598 profile: add virusseq profile
7e586d1d script: add extra logic for auto-passing lineages
0ebe5e9c script: fix bug in report where it didn't check that plots existed
25b2f243 docs: update developers guide
914d933f defaults: add XBN to controls-gisaid and validation
8eaf08a9 data: restore controls-gisaid strain list
fa123009 script: defragment plot for 218
5f24f695 dataset: update controls-gisaid strain list
efc5aab7 defaults: update validation to fix XBH dropout
See CHANGELOG.md for additional commits.

08 Nov 20:45

github-actions

v0.6.1

83ee013

v0.6.1 - Network Stability and False Positives

v0.6.1

Notes

This is a minor bugfix release aimed towards resolving network connectivity errors and catching false positives.

sc2rf

Issue #195: Consider alleles outside of parental regions as intermissions (conflicts) to catch false positives.
Issue #201: Make the LAPIS query of covSPECTRUM optional, to help users with network connectivity issues. This can be set with the flag lapis: false in builds under the rule sc2rf_recombinants.
Issue #202: Document connection errors related to LAPIS and provide options for solutions.

Commits

83ee0139 docs: update changelog for v0.6.1
00fe2fc8 docs: update notes for v0.6.1
fa03ea96 workflow: fix bug where rbd_levels log was incorrectly named
a281b75c workflow: make lapis optional param for #201 #202
75684b55 docs: update docs
1085ce0e script: postprocess count alleles outside regions as intermissions for #195
c11770c1 param: add XAV to auto-pass for #104 #195

08 Nov 00:03

github-actions

v0.6.0

2506e90

v0.6.0 - Sublineages and Immunity

v0.6.0

Notes

This is a major release that includes the following changes:

Detection of all recombinants in Nextclade dataset 2022-10-27: XA to XBE.
Implementation of recombinant sublineages (ex. XBB.1).
Implementation of immune-related statistics (rbd_level, immune_escape, ace2_binding) from nextclade, the Nextstrain team, and Jesse Bloom's group.

Dataset

Issue #168: Handling of NULL collection dates and NULL country is implemented.
controls was updated to include 1 strain from XBB for a total of 22 positive controls. The 28 negative controls were unchanged from v0.5.1.
controls-gisaid strain list was updated to include XA through to XBE for a total of 528 positive controls. This includes sublineages such as XBB.1 and XBB.1.2 which synchronizes with Nextclade Dataset 2022-10-19. The 187 negative controls were unchanged from v0.5.1.

Nextclade

Issue #176: Upgrade Nextclade dataset to tag 2022-10-27 and upgrade Nextclade to v2.8.0.
Issue #193: Use the nextclade dataset sars-cov-2-21L to calculate immune_escape and ace2_binding.

RBD Levels

Issue #193: Create new rule rbd_levels to calculate the number of key receptor binding domain (RBD) mutations.

Lineage Tree

Issue #185: Use nextclade dataset Auspice tree for lineage hierarchy. Previously, the phylogeny of lineages was constructed from the cov-lineages website YAML. Instead, we now use the tree provided with nextclade datasets, to better synchronize the lineage model with the output.

Rather than creating the output tree in resources/lineages.nwk, the lineage tree is now written to data/sars-cov-2_<DATE>/tree.nwk. Different builds might use different nextclade datasets, so the tree is a dataset-specific output.

sc2rf

Issue #179: Fix bug where sc2rf/recombinants.ansi.txt is truncated.
Issue #180: Fix recombinant sublineages (ex. XAY.1) missing their derived mutations in the cov-spectrum_query. Previously, the cov-spectrum_query mutations were only based on the parental alleles (before recombination). This led to sublineages (ex. XAY.1, XAY.2) all having the exact same query. Now, the cov-spectrum_query will include all substitutions shared between all sequences in the cluster_id.
Issue #187: Document bug that occurs if duplicate sequences are present, and the initial validation was skipped by not running scripts/create_profile.sh.
Issue #191 and Issue #192: Reduce false positives by ensuring that each mode of sc2rf has at least one additional parental population that serves as the alternative hypothesis.
Issue #195: Implement a filter on the ratio of intermissions to alleles. Sequences will be marked as false positives if the number of intermissions (i.e. alleles that conflict with the identified parental region) is greater than or equal to the number of alleles contributed by the minor parent. This ratio indicates that there is more evidence that conflicts with recombination than there is allele evidence that supports a recombinant origin.
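
The intermission ratio filter from Issue #195 can be sketched as a single comparison. This is a minimal illustration of the rule as worded above, not the actual sc2rf postprocessing code; the function and argument names are hypothetical.

```python
def fails_intermission_ratio(num_intermissions: int, minor_parent_alleles: int) -> bool:
    """Flag a sequence as a false positive under the intermission ratio rule.

    num_intermissions: alleles that conflict with the identified parental region.
    minor_parent_alleles: alleles contributed by the minor parent.
    """
    # Conflicting evidence that equals or outweighs the supporting evidence
    # marks the sequence as a false positive.
    return num_intermissions >= minor_parent_alleles


print(fails_intermission_ratio(num_intermissions=3, minor_parent_alleles=3))
print(fails_intermission_ratio(num_intermissions=1, minor_parent_alleles=6))
```

Note that ties count against the sequence, because the rule uses "greater than or equal to".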

Linelist

Issue #183: Recombinant sublineages. When nextclade calls a lineage (ex. XAY.1) which is a sublineage of a sc2rf lineage (XAY), we prioritize the nextclade assignment.
Issue #193: Add immune-related statistics: rbd_levels, rbd_substitutions, immune_escape, and ace2_binding.
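
The sublineage priority in Issue #183 can be illustrated with a simple prefix check. This sketch assumes that sublineages are written as the parent name plus a dotted suffix (ex. XAY.1 under XAY) and does not resolve pango aliases. The fallback to the sc2rf lineage when the check fails is also an assumption, and both function names are hypothetical.

```python
def is_sublineage(candidate: str, parent: str) -> bool:
    """True if candidate is a dotted descendant of parent (ex. XAY.1 of XAY)."""
    return candidate.startswith(parent + ".")


def pick_lineage(nextclade_lineage: str, sc2rf_lineage: str) -> str:
    """Prefer the nextclade call when it is a sublineage of the sc2rf call."""
    if is_sublineage(nextclade_lineage, sc2rf_lineage):
        return nextclade_lineage
    # Assumed fallback for this sketch: keep the sc2rf lineage otherwise.
    return sc2rf_lineage


print(pick_lineage("XAY.1", "XAY"))
```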

Plot

Issue #57: Include substitutions within breakpoint intervals for breakpoint plots. This is a product of Issue #180 which provides access to all substitutions.
Issue #112: Fix bug where breakpoints plot image was out of bounds.
Issue #188: Remove the breakpoints distribution axis (ex. breakpoints_clade.png) in favor of putting the legend at the top. This significantly reduces plotting issues (ex. Issue #112).
Issue #193: Create new plot rbd_level.

Validate

Designated Lineages

Issue #85: XAY, updated controls
Issue #178: XAY.1
Issue #172: XBB.1
Issue #175: XBB.1.1
Issue #184: XBB.1.2
Issue #173: XBB.2
Issue #174: XBB.3
Issue #181: XBC.1
Issue #182: XBC.2
Issue #171: XBD
Issue #177: XBE

Proposed Lineages

Issue #198: proposed1229
Issue #199: proposed1268
Issue #197: proposed1296

Commits

2506e907 docs: update changelog and add v0.6.0 testing summary package
0cc421e0 docs: update all contributors
cd9b6cbb resources: update issues
0fa2e3c1 docs: update readme
375c3a76 resources: add proposed lineages for #197 #198 #199
dad989e7 param: remove BQ.1 from sc2rf mode VOC as its too close to BA.5.3
d7cb005f docs: update issue template lineage-validation
1beac97e resources: add XBF to curated breakpoints for #196
fae7bfdb script: sc2rf implement intermission allele ratio for #195
89a41265 script: additional manual curation of lineage_tree
ebd3ce1f resources: update validation strains for controls-gisaid
d8bff572 script: add RBD Level slide to report
c1879c1d script: catch errors in rbd_level plotting with no recombinants
63545a08 script: fix bug in linelist with cluster_privates
c24a7179 resources: update issues
d32d557f docs: update development notes
7f825a41 script: manual fix for CK in lineage_tree
fdd6f66d workflow: implement rbd levels for #193
0058dd6e param: upgrade nextclade dataset to 2022-10-27 and reduce breakpoints of XA mode
fb062c32 env: upgrade nextclade to v2.8.0
See CHANGELOG.md for additional commits.

17 Oct 15:47

github-actions

v0.5.1

799904e

v0.5.1 - Hotfix

v0.5.1

This hotfix release fixes Issue #169 which was caused by an internal change in snakemake regarding dependencies. This was resolved by version controlling the tabulate package.

Notes

Workflow

Issue #169: AttributeError: 'str' object has no attribute 'name'

Resources

Issue #167: Alias key out of date, change source

Validate

Proposed Lineages

Issue #166: proposed1138
Issue #165: proposed1139

Commits

799904eb docs: update CHANGELOG for v0.5.1
1f9cd623 docs: update docs for v0.5.1
43fc4d71 env: update tabulate channel for #169
3b9c3796 env: version control tabulate for #169
5f57bca2 script: update alias url for #167
31647371 docs: update readme hpc section

03 Oct 22:02

github-actions

v0.5.0

b48ad6d

v0.5.0 - XA to XBC

v0.5.0

Please check out the v0.5.0 Testing Summary Package for a comprehensive report.

Notes

This is a minor release that includes the following changes:

Detection of all recombinants in Nextclade dataset 2022-09-27: XA to XBC.
Create any number of custom sc2rf modes with CLI arguments.

Resources

Issue #96: Create newick phylogeny of pango lineage parent child relationships, to get accurate sublineages including aliases.
Issue #118: Fix missing pango-designation issues for XAY and XBA.

Datasets

Issue #25: Reduce positive controls to one sequence per clade. Add new positive controls XAL, XAP, XAS, XAU, and XAZ.
Issue #92: Reduce negative controls to one sequence per clade. Add negative control for 22D (Omicron) / BA.2.75.
Issue #155: Add new profile and dataset controls-gisaid. Only a list of strains is provided, as GISAID policy prohibits public sharing of sequences and metadata.

Profile Creation

Issue #77: Report slurm command for --hpc profiles in scripts/create_profiles.sh.
Issue #153: Fix bug where build parameters metadata and sequences were not implemented.

Nextclade

Issue #81: Upgrade Nextclade datasets to 2022-09-27
Issue #91: Upgrade Nextclade to v2.5.0

sc2rf

Issue #78: Add new parameter max_breakpoint_len to sc2rf_recombinants to mark samples with too much uncertainty in the breakpoint interval as false positives.
Issue #79: Add new parameter min_consec_allele to sc2rf_recombinants to ignore recombinant regions with less than this number of consecutive alleles (both diagnostic SNPs and diagnostic reference alleles).
Issue #80: Migrate sc2rf from a submodule to a subdirectory (including LICENSE!). This is to simplify the updating process and avoid errors where submodules became out of sync with the main pipeline.
Issue #83: Improve error handling in sc2rf_recombinants when the input stats files are empty.
Issue #89: Reduce the default value of the parameter min_len in sc2rf_recombinants from 1000 to 500. This is to handle XAP and XAJ.
Issue #90: Auto-pass select nextclade lineages through sc2rf: XN, XP, XAR, XAS, and XAZ. This requires differentiating the nextclade inputs as separate parameters --nextclade and --nextclade-no-recom.
- XN, XP, and XAR have extremely small recombinant regions at the terminal ends of the genome. Depending on sequencing coverage, sc2rf may not reliably detect these lineages.
- The newly designated XAS and XAZ pose a challenge for recombinant detection using diagnostic alleles. The first region of XAS could be either BA.5 or BA.4 based on substitutions, but is most likely BA.5 based on deletions. Since the region contains no diagnostic alleles to discriminate BA.5 vs. BA.4, breakpoints cannot be detected by sc2rf.
- Similarly for XAZ, the BA.2 segments do not contain any BA.2 diagnostic alleles, but instead are all reversions from BA.5 alleles. The BA.2 parent was discovered by deep, manual investigation in the corresponding pango-designation issue. Since the BA.2 regions contain no diagnostic alleles for BA.2, breakpoints cannot be detected by sc2rf.
Issue #95: Generalize sc2rf_recombinants to take any number of ansi and csv input files. This allows greater flexibility in command-line arguments to sc2rf, which are no longer locked into the hardcoded primary and secondary parameter sets.
Issue #96: Include sub-lineage proportions in the parents_lineage_confidence. This reduces underestimating the confidence of a parental lineage.
Issue #150: Fix bug where sc2rf would write empty output csv files if no recombinants were found.
Issue #151: Fix bug where samples that failed to align were missing from the linelists.
Issue #158: Reduce sc2rf param --max-intermission-length from 3 to 2 to be consistent with Issue #79.
Issue #161: Implement selection method to pick best results from various sc2rf modes.
Issue #162: Upgrade sc2rf/virus_properties.json.
Issue #163: Use LAPIS nextcladePangoLineage instead of pangoLineage. Also disable default filter max_breakpoint_len for XAN.
Issue #164: Fix bug where false positives would appear in the filtered sc2rf ansi output (recombinants.ansi.txt).
The optional lapis parameter for sc2rf_recombinants has been removed. Querying LAPIS for parental lineages is no longer experimental and is now an essential component (cannot be disabled).
The mandatory mutation_threshold parameter for sc2rf has been removed. Instead, --mutation-threshold can be set independently in each of the sc2rf modes.
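
To illustrate the max_breakpoint_len filter from Issue #78, the sketch below measures each breakpoint interval and flags a sample when any interval is longer than the threshold. It assumes intervals are written as start:end and joined by commas, and that the length is simply end minus start. The function names and the threshold values are illustrative, not the pipeline defaults.

```python
def breakpoint_len(breakpoint: str) -> int:
    """Length of one breakpoint interval written as 'start:end'."""
    start, end = (int(coord) for coord in breakpoint.split(":"))
    return end - start


def exceeds_max_breakpoint_len(breakpoints: str, max_breakpoint_len: int) -> bool:
    """True if any interval in a comma-separated list is too uncertain."""
    intervals = [bp for bp in breakpoints.split(",") if bp]
    return any(breakpoint_len(bp) > max_breakpoint_len for bp in intervals)


# 17411:21617 spans 4206 positions, so it fails a threshold of 3000.
print(exceeds_max_breakpoint_len("17411:21617", max_breakpoint_len=3000))
```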

Linelist

Issue #157: Create new parameters min_lineage_size and min_private_muts to control lineage splitting into X*-like.

Plot

Issue #17: Create script to plot lineage assignment changes between versions using a Sankey diagram.
Issue #82: Change epiweek start from Monday to Sunday.
Issue #111: Fix breakpoint distribution axis that was empty for clade.
Issue #152: Fix file saving bug when largest lineage has / characters.
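
The change of epiweek start in Issue #82 can be illustrated with the standard library alone. This is a sketch of the general idea (finding the first day of the week under either convention), not the plotting code itself; the function name `week_start` is hypothetical.

```python
from datetime import date, timedelta


def week_start(day: date, first_weekday: str = "sunday") -> date:
    """Return the first day of the week that contains `day`."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    if first_weekday == "sunday":
        offset = (day.weekday() + 1) % 7
    else:
        offset = day.weekday()
    return day - timedelta(days=offset)


sunday = date(2022, 10, 2)
print(week_start(sunday, "sunday"))  # the Sunday itself
print(week_start(sunday, "monday"))  # the preceding Monday
```

With a Sunday start, a sample collected on a Sunday opens a new week rather than closing the previous one.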

Report

Issue #88: Add pipeline and nextclade versions to powerpoint slides as footer. This required adding --summary as param to report.

Validate

Issue #56: Change rule validate from simply counting the number of positives to validating the fields lineage, breakpoints, parents_clade. This involves adding a new default parameter expected for rule validate in defaults/parameters.yaml.
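
A field-by-field check in the spirit of Issue #56 might look like the sketch below. The dictionary layout and the function name `mismatched_fields` are assumptions for illustration; the real expected values live in defaults/parameters.yaml, and the example values here are borrowed from the lineage definition examples in the v0.4.0 notes.

```python
VALIDATED_FIELDS = ("lineage", "breakpoints", "parents_clade")


def mismatched_fields(observed: dict, expected: dict) -> list:
    """Return the validated fields whose observed value differs from expected."""
    return [
        field
        for field in VALIDATED_FIELDS
        if observed.get(field) != expected.get(field)
    ]


expected = {
    "lineage": "XM",
    "breakpoints": "17411:21617",
    "parents_clade": "Omicron/21K,Omicron/21L",
}
# Same lineage and parents, but a different breakpoint interval.
observed = dict(expected, breakpoints="17411:20000")
print(mismatched_fields(observed, expected))
```

A strain passes only when the returned list is empty, which is stricter than counting positives.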

Designated Lineages

Issue #149: XA
Issue #148: XB
Issue #147: XC
Issue #146: XD
Issue #145: XE
Issue #144: XF
Issue #143: XG
Issue #141: XH
Issue #142: XJ
Issue #140: XK
Issue #139: XL
Issue #138: XM
Issue #137: XN
Issue #136: XP
Issue #135: XQ
Issue #134: XR
Issue #133: XS
Issue #132: XT
Issue #131: XU
Issue #130: XV
Issue #129: XW
Issue #128: XY
Issue #127: XZ
Issue #126: XAA
Issue #125: XAB
Issue #124: XAC
Issue #123: XAD
[Issue #122](https://github.com...

16 Aug 19:40

github-actions

v0.4.2

8953ef0

v0.4.2 - Bugfix and Enhancement

v0.4.2

Notes

This is a minor bug fix and enhancement release with the following changes:

Linelist

Issue #70: Fix missing sc2rf version from recombinant_classifier_dataset
Issue #74: Correctly identify XN-like and XP-like. Previously, these were just assigned XN/XP regardless of whether the estimated breakpoints conflicted with the curated ones.
Issue #76: Mark undesignated lineages with no matching sc2rf lineage as unpublished.

Plot

Issue #71: Only truncate cluster_id while plotting, not in table generation.
Issue #72: For all plots, truncate the legend labels to a set number of characters. The exceptions are parent labels (clade, lineage), because the full label is informative.
Issue #73, #75: For all plots except breakpoints, lineages will be defined by the column recombinant_lineage_curated. Previously it was defined by the combination of recombinant_lineage_curated and cluster_id, which made cluttered plots that were too difficult to interpret.
New parameter --lineage-col was added to scripts/plot_breakpoints.py to have more control on whether we want to plot the raw lineage (lineage) or the curated lineage (recombinant_lineage_curated).

Commits

8953ef03 docs: add CHANGELOG for v0.4.2
7ec5ccc6 docs: add notes for v0.4.2
1b3b1f1d script: restore column name to recombinant_classifer_dataset
901caf98 script: restore recombinant_lineage_curated of -like lineages
d6be9611 script: change internal delim of classifier for #70
cdb4a78a script: fix recombinant_classifier missing sc2rf for #70
bf7a4e57 script: mark undesignated lineages with no matching sc2rf lineage as unpublished for #76
46f6d754 workflow: update linelists and plotting for #74 and #75
c03dd3be script: don't split largest by cluster id for #73
e9802e79 script: majority of plots will not split by cluster_id for #73
bafb38fb script: fix cluster ID truncation for issue #71
ab712593 resources: curate and test breakpoints for proposed895

12 Aug 18:45

github-actions

v0.4.1

8865069

v0.4.1 - Bugfix

v0.4.1

Notes

This is a minor bug fix release with the following changes:

Issue #63: Remove usher and protobuf from the conda environment.
Issue #68: Remove ncov as a submodule.
Issue #69: Remove 22C and 22D from sc2rf/mapping.csv and sc2rf/virus_properties.json, as these interfere with breakpoint detection for XAN.

Commits

88650696 docs: add CHANGELOG for v0.4.1
00a2eec3 docs: add notes for v0.4.1
d74a81d3 sc2rf: revert 22C and 22D clade addition
7b662940 env: remove usher for issue #63
adf92399 submodule: remove ncov for issue #68
0790aa04 docs: update CHANGELOG for v0.4.0

11 Aug 22:24

github-actions

v0.4.0

c027027

v0.4.0 - BA.5 and UShER Removal

v0.4.0

Notes

General

v0.4.0 has been trained and validated on the latest generation of SARS-CoV-2 Omicron clades (ex. 22A/BA.4 and 22B/BA.5). Recombinant sequences involving BA.4 and BA.5 can now be detected, unlike in v0.3.0 where they were not included in the sc2rf models.

v0.4.0 is also a major update to how sequences are categorized into lineages/clusters. A recombinant lineage is now defined as a group of sequences with a unique combination of:

Lineage assignment (ex. XM)
Parental clades (ex. Omicron/21K,Omicron/21L)
Breakpoints (ex. 17411:21617)
NEW: Parental lineages (ex. BA.1.1,BA.2.12.1)

Novel recombinants (i.e. undesignated) can be identified by a lineage assignment that does not start with X* (ex. BA.1.1) or with a lineage assignment that contains -like (ex. XM-like). A cluster of sequences may be flagged as -like if one of the following criteria apply:

The lineage assignment by Nextclade conflicts with the published breakpoints for a designated lineage (resources/breakpoints.tsv).
- Ex. An XE assigned sample has breakpoint 11538:12879, which conflicts with the published XE breakpoint (ex. 8394:12879). This will be renamed XE-like.
The cluster has 10 or more sequences, which share at least 3 private mutations in common.
- Ex. A large cluster of sequences (N=50) are assigned XM. However, these 50 samples share 5 private mutations T2470C,C4586T,C9857T,C12085T,C26577G which do not appear in true XM sequences. These will be renamed XM-like. Upon further review of the reported matching pango-designation issues (460,757,781,472,798), we find this cluster to be a match to proposed798.
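
The two -like criteria above can be sketched as a single predicate. This is only an illustration under stated assumptions: a breakpoint conflict is treated as a plain string mismatch against the curated value, the thresholds of 10 sequences and 3 shared private mutations are taken from the text, and the function and argument names are hypothetical.

```python
def is_like_cluster(
    observed_breakpoints: str,
    curated_breakpoints: str,
    cluster_size: int,
    shared_private_muts: list,
    min_cluster_size: int = 10,
    min_shared_privates: int = 3,
) -> bool:
    """Decide whether a cluster should be renamed to '<lineage>-like'."""
    # Criterion 1: the breakpoints conflict with the curated ones.
    conflict = observed_breakpoints != curated_breakpoints
    # Criterion 2: a large cluster sharing several private mutations.
    novel = (
        cluster_size >= min_cluster_size
        and len(shared_private_muts) >= min_shared_privates
    )
    return conflict or novel


# The XE example: 11538:12879 conflicts with the published 8394:12879.
print(is_like_cluster("11538:12879", "8394:12879", 1, []))
```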

The ability to identify parental lineages and private mutations is largely due to improvements in the newly released nextclade datasets, which have increased recombinant lineage accuracy. As novel recombinants can now be identified without the use of the custom UShER annotations (ex. proposed771), all UShER rules and output have been removed. This significantly improves runtime, and reduces the need to drop non-recombinant samples for performance. The result is more comparable output between different dataset sizes (4 samples vs. 400,000 samples).

Note! Default parameters have been updated! Please regenerate your profiles/builds with:
scripts/create_profile.sh --data data/custom

Datasets

Issue #49: The tutorial lineages were changed from XM, proposed467, and miscBA1BA2Post17k to XD, XH, and XAN. The previous tutorial sequences had genome quality issues.
Issue #51: Add XAN to the controls dataset. This is a BA.2/BA.5 recombinant.
Issue #62: Add XAK to the controls dataset. This is a BA.2/BA.1 VUM recombinant monitored by the ECDC.

Nextclade

Issue #46: nextclade is now run twice. Once with the regular sars-cov-2 dataset and once with the sars-cov-2-no-recomb dataset. The sars-cov-2-no-recomb dataset is used to get the nucleotide substitutions before recombination occurred. These are identified by taking the substitutions column, and excluding the substitutions found in privateNucMutations.unlabeledSubstitutions. The pre-recombination substitutions allow us to identify the parental lineages by querying cov-spectrum.
Issue #48: Make the exclude_clades completely optional. Otherwise an error would be raised if the user didn't specify any.
Issue #50: Upgrade from v1.11.0 to v2.3.0. Also upgrade the default dataset tags to 2022-07-26T12:00:00Z which had significant bug fixes.
Issue #51: Relax the recombinant criteria, by flagging sequences with ANY labelled private mutations as a potential recombinant for further downstream analysis. This was specifically for BA.5 recombinants (ex. XAN) as no other columns from the nextclade output indicated this could be a recombinant.
Restrict nextclade output to fasta,tsv (alignment and QC table). This saves on file storage, as the other default output is not used.
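
The pre-recombination substitutions described in Issue #46 amount to a set difference between two nextclade columns. The sketch below assumes both columns arrive as comma-separated strings; the function name and the example values are illustrative.

```python
def pre_recombination_subs(substitutions: str, unlabeled_private: str) -> list:
    """Substitutions that remain after removing unlabeled private ones.

    substitutions: the nextclade `substitutions` column.
    unlabeled_private: the `privateNucMutations.unlabeledSubstitutions` column.
    """
    private = {sub for sub in unlabeled_private.split(",") if sub}
    # Preserve the original (genomic) order of the substitutions column.
    return [sub for sub in substitutions.split(",") if sub and sub not in private]


print(pre_recombination_subs("C241T,T2470C,C3037T", "T2470C"))
```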

sc2rf

Issue #51: sc2rf is now run twice. First, to detect recombination between clades (ex. Delta/21J & Omicron/21K). Second, to detect recombination within Omicron (ex. Omicron/BA.2/21L & Omicron/BA.5/22B). It was not possible to define universal parameters for sc2rf that worked for both distantly related clades, and the closely related Omicron lineages.
Issue #51: Rename parameter clades to primary_clades and add new parameter secondary_clades for detecting BA.5.
Issue #53: Identify the parental lineages by splitting up the observed mutations (from nextclade) into regions by breakpoint. Then query the list of mutations in https://cov-spectrum.org and report the lineage with the highest prevalence.
Tested out --enable-deletions again, which caused issues for XD. This confirms that using deletions is still ineffective for defining breakpoints.
Add B.1.631 and B.1.634 to sc2rf/mapping.tsv and as potential clades in the default parameters. These are parents for XB.
Add B.1.438.1 to sc2rf/mapping.tsv and as a potential clade in the default parameters. This is a parent for proposed808.
Require a recombinant region to have at least one substitution unique to the parent (i.e. diagnostic). This reduces false positives.
Remove the debugging mode, as it produced overly verbose output. It is more efficient to rerun manually with custom parameters tailored to the kind of debugging required.
Change parent clade nomenclature from Omicron/21K to the more comprehensive Omicron/BA.1/21K. This makes it clear which lineage is involved, since it's not always obvious how Nextclade clades map to pango lineages.
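
The region splitting described in Issue #53 can be sketched as follows. The assumptions are that substitutions are single-nucleotide changes written like C241T, that breakpoints are start:end intervals, and that substitutions falling inside an interval are left unassigned. The function name is hypothetical, and the cov-spectrum query itself is not shown.

```python
def split_by_breakpoints(substitutions: list, breakpoints: list) -> list:
    """Group substitutions into parental regions delimited by breakpoints."""
    intervals = sorted(
        tuple(int(coord) for coord in bp.split(":")) for bp in breakpoints
    )
    # N breakpoint intervals delimit N + 1 parental regions.
    regions = [[] for _ in range(len(intervals) + 1)]
    for sub in substitutions:
        position = int(sub[1:-1])  # "C241T" -> 241
        # Assumption: positions inside an interval belong to neither parent.
        if any(start <= position <= end for start, end in intervals):
            continue
        # The region index is the number of intervals lying before the position.
        index = sum(position > end for _, end in intervals)
        regions[index].append(sub)
    return regions


subs = ["C241T", "C10000T", "A18000G", "C22000T"]
print(split_by_breakpoints(subs, ["17411:21617"]))
```

Each resulting list would then form one query for the parental lineage of that region.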

UShER

Issue #63: All UShER rules and output have been removed. First, the latest releases of nextclade datasets (tag 2022-07-26T12:00:00Z) have dramatically improved lineage assignment accuracy for recombinants. Second, removing UShER improves the runtime and simplicity of the workflow, as UShER adds significantly to runtime.

Linelist

Issue #30: Fixed the bug where distinct recombinant lineages would occasionally be grouped into one cluster_id. The fix follows from the new definition of recombinant lineages (see the General section), which now includes parental lineages and has sufficient resolving power.
Issue #46: Added new column parents_subs, which are the substitutions found in the parental lineages before recombination occurred using the sars-cov-2-no-recomb nextclade dataset. Also added new columns: parents_lineage, parents_lineage_confidence, based on querying cov-spectrum for the substitutions found in parents_subs.
Issue #53: Added new column cov-spectrum_query which includes the substitutions that are shared by ALL sequences of the recombinant lineage.
Added new column cluster_privates which includes the private substitutions shared by ALL sequences of the recombinant lineage.
Renamed parents column to parents_clade, to differentiate it from the new column parents_lineage.
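
Columns such as cov-spectrum_query and cluster_privates hold substitutions shared by ALL sequences of a lineage, which is a set intersection. The sketch below assumes each sequence is given as a list of single-nucleotide substitutions; the function name is hypothetical.

```python
def shared_substitutions(per_sequence_subs: list) -> list:
    """Substitutions present in every sequence, sorted by genomic position."""
    sets = [set(subs) for subs in per_sequence_subs]
    if not sets:
        return []
    shared = set.intersection(*sets)
    # Sort by the numeric position embedded in names like "C241T".
    return sorted(shared, key=lambda sub: int(sub[1:-1]))


cluster = [
    ["C241T", "T2470C", "C4586T"],
    ["C241T", "T2470C"],
    ["T2470C", "C241T", "C9857T"],
]
print(shared_substitutions(cluster))
```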

Plot

Issue #4, Issue #57: Plot distributions of each parent separately, rather than stacking on one axis. Also plot the substitutions as ticks on the breakpoints figure.

[Image comparison: v0.3.0 vs. v0.4.0]

Issue #46: Plot breakpoints separately by clade and lineage. In addition, distinct clusters within the same recombinant lineage are noted by including their cluster ID as a suffix. As an example, please see XM (USA) and XM (England) below, where the lineage is the same (XM) but the breakpoints differ, as do the parental lineages (BA.2 vs BA.2.12.1). These clusters are distinct because XM (England) lacks substitutions occurring around position 20000.

| ...

21 Jun 17:12

github-actions

v0.3.0

2f8b498

v0.3.0 - No Recombinants, No Problems

v0.3.0

Notes

Major Changes

By default, all sequences will go through all steps of the pipeline. This prevents pipeline errors when no recombinant sequences are detected. See the FAQ for info on changing this setting.
Default parameters have been updated! Please regenerate your profiles/builds with:
```
scripts/create_profile.sh --data data/custom
```
Rule outputs are now in sub-directories for a cleaner results directory.
The in-text report (report.pptx) statistics are no longer cumulative counts of all sequences. Instead, they will match the reporting period in the accompanying plots.

Bug Fixes

Improve subtree collapse efficiency (#35).
Improve subtree aesthetics and filters (#20).
Fix issues rendering as float (#29).
Explicitly control the dimensions of plots for powerpoint embedding.
Remove hard-coded extra_cols (#26).
Fix mismatch in lineages plot and description (#21).
Downstream steps no longer fail if there are no recombinant sequences (#7).

Output

Output new _historical plots and slides for plotting all data over time.
Output new file parents.tsv to summarize recombinant sequences by parent.
Order the colors/legend of the stacked bar plots by number of sequences.
Include lineage and cluster id in filepaths of largest plots and tables.
Rename the linelist output:
- linelist.tsv
- positives.tsv
- negatives.tsv
- false_positives.tsv
- lineages.tsv
- parents.tsv
The report.xlsx now includes the following tables:
- lineages
- parents
- linelist
- positives
- negatives
- false_positives
- summary
- issues

Data

Create new controls datasets:
- controls-negatives
- controls-positives
- controls
Add versions to genbank_accessions for controls.

Programs

Upgrade UShER to v0.5.4 (possibly this was done in a prior ver).
Remove taxonium and chronumental from the conda env.

Parameters

Add parameters to control whether negatives and false_positives should be excluded:
- exclude_negatives: false
- false_positives: false
Add new optional param max_placements to rule linelist.
Remove --show-private-mutations from debug_args of rule sc2rf.
Add optional param --sc2rf-dir to sc2rf to enable execution outside of sc2rf dir.
Add params --output-csv and --output-ansi to the wrapper scripts/sc2rf.sh.
Remove params nextclade_ref and custom_ref from rule nextclade.
Change --breakpoints 0-10 in sc2rf.

Workflow

Add new rule usher_columns to augment the base usher metadata.
Add new script parents.py, plots, and report slide to summarize recombinant sequences by parent.
Make rules plot and report more dynamic with regards to plots creation.
Exclude the reference genome from alignment until faToVcf.
Include the log path and expected outputs in the message for each rule.
Use sub-functions to better control optional parameters.
Make sure all rules write to a log if possible (#34).
Convert all rule inputs to snakemake rule variables.
Create and document a create_profile.sh script.
Implement the --low-memory mode parameter within the script usher_metadata.sh.

Continuous Integration

Re-rename tutorial action to pipeline, and add different jobs for different profiles:
- Tutorial
- Controls (Positive)
- Controls (Negative)
- Controls (All)

Pull Requests

pull/40 v0.3.0 stability update part 2
pull/8 Add XS and XQ to controls.
pull/19 docs: add lenaschimmel as a contributor for code
pull/12 Tutorial dataset and map panel for Auspice subtrees
pull/11 Add a tutorial profile
pull/14 Plots and PowerPoints
pull/15 New rule: parents
pull/39 v0.3.0 stability update

Commits

2f8b498a docs: update changelog for v0.3.0
0486d3be docs: add updating section to readme for issue #33
e8eda400 resources: updates issues with curate breakpoints
12e3700f bug: catch empty dataframe in plot
d1ccca2a workflow: first successful high-throughput run
cd741a10 workflow: add new rules plot_historical and report_historical
c2cc1380 env: remove openpyxl from environment
7dc7c039 workflow: remove rule report_redact #31
9ca5f822 script: rearrange merge file order in summary
aa28eb9f workflow: create new rule report_redact for #31
4748815d env: add openpyxl to environment for excel parsing in python
0060904a script: template duplicate labelling in usher_collapse for later
a82359a7 data: add accession versions to controls metadata
af7341aa data: add accession versions to controls metadata
d860a4c8 workflow: add new rule usher_columns to augment the base usher metadata
2511673d improve subtree collapse effiency (#35) and output aesthetics (#20)
1e81be3b bug: remove non-existant param --log in rule usher_metadata
02198b4c script: add logging to usher_collapse
d40d3d78 ci: don't run pipeline just for images changes
b880d9c8 docs: update powerpoint image to proper ver
See CHANGELOG.md for additional commits.

24 May 20:55

github-actions

v0.2.1

c2369c7

v0.2.1 - Plots and Powerpoints

v0.2.1

Notes

Params

New optional param motifs for rule sc2rf_recombinants.
New param weeks for new rule plot.
Removed prev_linelist param.

Output

Switch from a pdf report to powerpoint slides for better automation.
Create summary plots.
Split report rule into linelist and report.
Output svg plots.

Workflow

New rule plot.
Changed growth calculation from a comparison to the previous week to a score of sequences per day.
Assign a cluster_id according to the first sequence observed in the recombinant lineage.
Define a recombinant lineage as a group of sequences that share the same:
- Lineage assignment
- Parents
- Breakpoints or phylogenetic placement (subtree)
For some sequences, the breakpoints are inaccurate and shifted slightly due to ambiguous bases. These sequences can be assigned to their corresponding cluster because they belong to the same subtree.
For some lineages, global prevalence has exceeded 500 sequences (which is the subtree size used). Sequences of these lineages are split into different subtrees. However, they can be assigned to the correct cluster/lineage, because they have the same breakpoints.
Confirmed not to use deletions to define recombinants and breakpoints (differs from published)?
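
The cluster_id assignment described above can be sketched by grouping sequences on their shared attributes and naming each group after its earliest member. This illustration covers only the lineage, parents, and breakpoints key; the subtree-based grouping is omitted. It assumes "first observed" means the earliest ISO-formatted date, and the record layout, strain names, and function name are hypothetical.

```python
from collections import defaultdict


def assign_cluster_ids(records: list) -> dict:
    """Map each strain to the strain name of the first sequence in its group."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["lineage"], rec["parents"], rec["breakpoints"])
        groups[key].append(rec)

    cluster_ids = {}
    for members in groups.values():
        # ISO dates (YYYY-MM-DD) sort chronologically as strings;
        # ties are broken by strain name for a stable result.
        first = min(members, key=lambda rec: (rec["date"], rec["strain"]))
        for rec in members:
            cluster_ids[rec["strain"]] = first["strain"]
    return cluster_ids


records = [
    {"strain": "s2", "date": "2022-03-10", "lineage": "XM",
     "parents": "Omicron/21K,Omicron/21L", "breakpoints": "17411:21617"},
    {"strain": "s1", "date": "2022-03-01", "lineage": "XM",
     "parents": "Omicron/21K,Omicron/21L", "breakpoints": "17411:21617"},
]
print(assign_cluster_ids(records))
```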

Programs

Move sc2rf_recombinants.py to postprocess.py in ktmeaton fork of sc2rf.
Add false positives filtering to sc2rf_recombinants based on parents and breakpoints.

Docs

Add section Configuration to README.md.

Pull Requests

pull/14 Plots and PowerPoints

Commits

c2369c75 update CHANGELOG after README overhaul
9c8a774e update autologs to exclude first blank line in notes
2a8a7af5 overhaul README
9c2bd2f5 change asterisks to dashes
46d4ec81 update autologs to allow more complex notes content
a01a903c split docs into dev and todo
23e8d715 change color palette for plotting
785b8a19 add optional param motifs for sc2rf_recombinants
d1c1559e restore pptx template to regular view
6adc5d32 add seaborn to environment
35a04471 add changelog to report pptx
99e98aa7 add epiweeks to environment
1644b1fc add pptx report
1ab93aff (broken) start plotting
094530f0 swithc sc2rf to a postprocess script
02193d6e try generalizing sc2rf post-processing
