updated README

HKU-BAL · Nov 29, 2024 · c71a635 · c71a635
1 parent 46a9ae4
commit c71a635
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/README.md b/README.md
@@ -52,7 +52,7 @@ The latest performance figures as of Oct. 10th, 2024 (ClairS-TO v0.3.0) is avail
 
 *v0.3.1 (Nov. 29, 2024)* : Added `ssrs` model for PacBio Revio (`hifi_revio_ssrs`) and Illumina (`ilmn_ssrs`) platforms.
 
-*v0.3.0 (Oct. 11, 2024)* : This version is a major update. The new features and benchmarks are explained in a technical note titled [“Improving the performance of ClairS and ClairS-TO with new real cancer cell-line datasets and PoN”](https://github.com/HKU-BAL/ClairS/blob/main/docs/Improving_the_performance_of_ClairS_and_ClairS-TO_with_new_real_cancer_cell-line_datasets_and_PoN.pdf). A summary of changes: 1. Starting from this version, ClairS-TO will provide two model types. `ssrs` is a model trained initially with synthetic samples and then real samples augmented (e.g., `ont_r10_dorado_sup_5khz_ssrs`), `ss` is a model trained from synthetic samples (e.g., `ont_r10_dorado_sup_5khz_ss`). The `ssrs` model provides better performance and fits most usage scenarios. `ss` model can be used when missing a cancer-type in model training is a concern. In v0.3.0, four real cancer cell-line datasets (HCC1937, HCC1954, H1437, and H2009) covering two cancer types (breast cancer, lung cancer) published by [Park et al.](https://www.biorxiv.org/content/10.1101/2024.08.16.608331v1) were used for `ssrs` model training. 2. Added using [CoLoRSdb](https://zenodo.org/records/13145123) (Consortium of Long Read Sequencing Database) as a PoN for tagging non-somatic variant. The idea was inspired by [Park et al., 2024](https://www.biorxiv.org/content/10.1101/2024.08.16.608331v1). The F1-score improved by ~10-20% for both SNV and Indel by using CoLoRSdb. 3. Added tagging indels at sequence with low entropy as `LowSeqEntropy`. 4. Added the `--indel_min_af` option and adjusted the default minimum allelic fraction requirement to 0.1 for Indels in ONT platform. 5. Removed limiting Indel calling to only confident and necessary regions (whole genome - GIAB stratification v3.3 all difficult regions + CMRG v1.0 regions). The practice was started with in v0.1.0, and is deemed unnecessary and removed in v0.4.0. User can use `--calling_indels_only_in_these_regions` option to specify Indel calling regions.
+*v0.3.0 (Oct. 11, 2024)* : This version is a major update. The new features and benchmarks are explained in a technical note titled [“Improving the performance of ClairS and ClairS-TO with new real cancer cell-line datasets and PoN”](https://github.com/HKU-BAL/ClairS/blob/main/docs/Improving_the_performance_of_ClairS_and_ClairS-TO_with_new_real_cancer_cell-line_datasets_and_PoN.pdf). A summary of changes: 1. Starting from this version, ClairS-TO will provide two model types. `ssrs` is a model trained initially with synthetic samples and then real samples augmented (e.g., `ont_r10_dorado_sup_5khz_ssrs`), `ss` is a model trained from synthetic samples (e.g., `ont_r10_dorado_sup_5khz_ss`). The `ssrs` model provides better performance and fits most usage scenarios. `ss` model can be used when missing a cancer-type in model training is a concern. In v0.3.0, four real cancer cell-line datasets (HCC1937, HCC1954, H1437, and H2009) covering two cancer types (breast cancer, lung cancer) published by [Park et al.](https://www.biorxiv.org/content/10.1101/2024.08.16.608331v1) were used for `ssrs` model training. 2. Added using [CoLoRSdb](https://zenodo.org/records/13145123) (Consortium of Long Read Sequencing Database) as a PoN for tagging non-somatic variant. The idea was inspired by [Park et al., 2024](https://www.biorxiv.org/content/10.1101/2024.08.16.608331v1). The F1-score improved by ~10-20% for both SNV and Indel by using CoLoRSdb. 3. Added tagging indels at sequence with low entropy as `LowSeqEntropy`. 4. Added the `--indel_min_af` option and adjusted the default minimum allelic fraction requirement to 0.1 for Indels in ONT platform. 5. Removed limiting Indel calling to only confident and necessary regions (whole genome - GIAB stratification v3.3 all difficult regions + CMRG v1.0 regions). The practice was started in v0.1.0, and is deemed unnecessary and removed in v0.3.0. User can use `--calling_indels_only_in_these_regions` option to specify Indel calling regions.
 
 *v0.2.0 (Jul. 12, 2024)*: 1. Added a module called `verdict` to statistically classify a called variant into either a germline, somatic, or subclonal somatic variant based on the copy number alterations (CNA) profile and tumor purity estimation. To disable, use `--disable_verdict` option. Please check out more technical details about Verdict [here](docs/verdict.md).