Improve the stability of the knee point identification. #117

Merged 6 commits on Dec 7, 2024

LTLA commented Dec 7, 2024

Closes #115.

LTLA added 3 commits December 4, 2024 10:27
The new algorithm is based on maximizing the distance from a line
between the plateau and the inflection point, which avoids problems with
the instability of the empirical second derivative, even with smoothing.
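
The idea of picking the point farthest from a line can be illustrated with a minimal sketch. This is not the DropletUtils implementation: `farthest_from_chord` is a hypothetical helper, the curve is synthetic, and the first and last points simply stand in for the plateau and the inflection point.

```r
# Hypothetical helper (not part of DropletUtils): index of the point on a
# curve that lies farthest from the straight line joining its endpoints.
farthest_from_chord <- function(x, y) {
    n <- length(x)
    dx <- x[n] - x[1]
    dy <- y[n] - y[1]
    # Perpendicular distance of every point from the chord.
    dist <- abs(dy * (x - x[1]) - dx * (y - y[1])) / sqrt(dx^2 + dy^2)
    which.max(dist)
}

# Synthetic log-log curve: a flat plateau followed by a steep drop.
# The first point stands in for the plateau, the last for the inflection.
log_rank <- seq(0, 4, by = 0.5)
log_total <- ifelse(log_rank <= 3, 4, 4 - 3 * (log_rank - 3))

knee_index <- farthest_from_chord(log_rank, log_total)
log_rank[knee_index]
```

On this toy curve the selected point is the corner at `log_rank = 3`. Only the curve values are used, with no numerical derivatives, which is why such an approach is less sensitive to noise than one based on an empirical second derivative.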

LTLA commented Dec 7, 2024

Looks good for the suite of test datasets in DropletTestFiles.

library(DropletTestFiles)
library(DropletUtils)

# Select the raw HDF5 count matrices from the test file listing.
X <- listTestFiles()
keep <- grep("Raw HDF5", X$description)

png("results.png", width=6, height=10, units="in", res=120)
par(mfrow=c(3,2))
for (i in keep) {
    path <- getTestFile(X$rdatapath[i], prefix=FALSE)
    se <- read10xCounts(path, type="HDF5")
    current <- barcodeRanks(assay(se))

    # Plot total count against barcode rank on log-log axes.
    o <- order(current$rank)
    plot(current$rank[o], current$total[o], log="xy", type="l", main=X$file.dataset[i])

    # Dashed line: knee point; dotted line: inflection point.
    abline(h=metadata(current)$knee, lty=2)
    abline(h=metadata(current)$inflection, lty=3)
}
dev.off()

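[Image: results.png, barcode rank plots for each test dataset, with the knee marked by a dashed line and the inflection by a dotted line]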

Session information
R Under development (unstable) (2024-10-30 r87277)
Platform: aarch64-apple-darwin22.6.0
Running under: macOS Ventura 13.7

Matrix products: default
BLAS:   /Users/luna/Software/R/trunk/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/trunk/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] DropletUtils_1.25.2         SingleCellExperiment_1.29.1
 [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
 [5] GenomicRanges_1.59.1        GenomeInfoDb_1.43.2
 [7] IRanges_2.41.1              S4Vectors_0.45.2
 [9] BiocGenerics_0.53.3         generics_0.1.3
[11] MatrixGenerics_1.19.0       matrixStats_1.4.1
[13] DropletTestFiles_1.17.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1          dplyr_1.1.4
 [3] blob_1.2.4                filelock_1.0.3
 [5] R.utils_2.12.3            Biostrings_2.75.1
 [7] fastmap_1.2.0             BiocFileCache_2.15.0
 [9] mime_0.12                 lifecycle_1.0.4
[11] statmod_1.5.0             KEGGREST_1.47.0
[13] RSQLite_2.3.9             magrittr_2.0.3
[15] compiler_4.5.0            rlang_1.1.4
[17] tools_4.5.0               utf8_1.2.4
[19] yaml_2.3.10               S4Arrays_1.7.1
[21] dqrng_0.4.1               bit_4.5.0.1
[23] curl_6.0.1                DelayedArray_0.33.3
[25] abind_1.4-8               BiocParallel_1.41.0
[27] HDF5Array_1.35.2          withr_3.0.2
[29] purrr_1.0.2               R.oo_1.27.0
[31] grid_4.5.0                fansi_1.0.6
[33] ExperimentHub_2.15.0      beachmat_2.23.3
[35] Rhdf5lib_1.29.0           edgeR_4.5.1
[37] cli_3.6.3                 crayon_1.5.3
[39] httr_1.4.7                DelayedMatrixStats_1.29.0
[41] scuttle_1.17.0            DBI_1.2.3
[43] cachem_1.1.0              rhdf5_2.51.0
[45] zlibbioc_1.53.0           parallel_4.5.0
[47] AnnotationDbi_1.69.0      BiocManager_1.30.25
[49] XVector_0.47.0            vctrs_0.6.5
[51] Matrix_1.7-1              jsonlite_1.8.9
[53] bit64_4.5.2               locfit_1.5-9.10
[55] limma_3.63.2              glue_1.8.0
[57] codetools_0.2-20          BiocVersion_3.21.1
[59] UCSC.utils_1.3.0          tibble_3.2.1
[61] pillar_1.9.0              rappdirs_0.3.3
[63] rhdf5filters_1.19.0       GenomeInfoDbData_1.2.13
[65] R6_2.5.1                  dbplyr_2.5.0
[67] sparseMatrixStats_1.19.0  lattice_0.22-6
[69] AnnotationHub_3.15.0      png_0.1-8
[71] R.methodsS3_1.8.2         memoise_2.0.1
[73] Rcpp_1.0.13-1             SparseArray_1.7.2
[75] pkgconfig_2.0.3

LTLA merged commit 407cf2a into devel on Dec 7, 2024
1 check passed
LTLA deleted the improved-knee branch on December 7, 2024 09:30
Successfully merging this pull request may close these issues.

10X vs MGI single cell empty droplets identification