Silent Renaming Behavior for as.mo() Function #178

Open
ConnorChato opened this issue Jan 6, 2025 · 1 comment

@ConnorChato

I have a workflow that includes some automated match checking, so I'm running as.mo(keep_synonyms = TRUE) to really simplify that.

With a fairly full name, it works perfectly:

> as.mo("Rhizobium radiobacter")
ℹ The following microorganism was taxonomically renamed (use keep_synonyms = TRUE to leave uncorrected):
  Rhizobium radiobacter (Young et al., 2001)  ->  Agrobacterium radiobacter (Hordt et al., 2020)
Class 'mo'
[1] B_AGRBCT_RDBC

> as.mo("Rhizobium radiobacter", keep_synonyms = TRUE)
Class 'mo'
[1] B_RHZBM_RDBC

With a condensed name, it doesn't warn about any renaming or uncertainty...

> as.mo("RHIRAD")
Class 'mo'
[1] B_AGRBCT_RDBC

> as.mo("RHIRAD", keep_synonyms = TRUE)
Class 'mo'
[1] B_AGRBCT_RDBC

This felt pretty suspicious, since both have the same (low) matching score and I was expecting at least an uncertainty warning. I'm actually surprised they have the same score at all, since the Levenshtein distance between 'RHIRAD' and 'Rhizobium radiobacter' is lower than the distance between 'RHIRAD' and 'Agrobacterium radiobacter'.

> mo_matching_score('RHIRAD', 'Rhizobium radiobacter')
[1] 0.3333333
> mo_matching_score('RHIRAD', 'Agrobacterium radiobacter')
[1] 0.3333333
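
For reference, the plain edit distance can be checked with base R's adist() (from the utils package). This is only a sketch: it assumes an unweighted, case-insensitive Levenshtein distance, which may not be what mo_matching_score() uses internally, since the package may weight edits differently or normalise by name length.

```r
# Unweighted Levenshtein distance using base R (utils::adist).
# ignore.case = TRUE so the upper-case code is compared fairly with the names.
input <- "RHIRAD"
candidates <- c("Rhizobium radiobacter", "Agrobacterium radiobacter")

# adist() returns a 1 x 2 matrix here; drop() turns it into a plain vector.
distances <- drop(adist(input, candidates, ignore.case = TRUE))
names(distances) <- candidates
print(distances)
```

If the first distance comes out smaller than the second, that supports the expectation that the two names should not receive identical scores under a purely distance-based measure.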

Would you be able to verify whether anything is overriding the normal matching behavior here, or whether I am just missing an expected behavior?

Here's my sessionInfo() output in case it's useful:

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS/LAPACK: /opt/OpenBLAS/lib/libopenblasp-r0.3.13.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C         LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.9.3   fuzzyjoin_0.1.6   readxl_1.4.3      AMR_2.1.1.9122    dplyr_1.1.4       data.table_1.15.0 here_1.0.1       

loaded via a namespace (and not attached):
 [1] compiler_4.3.2    crayon_1.5.2      tidyselect_1.2.0  callr_3.7.3       readr_2.1.5       R6_2.5.1          generics_0.1.3   
 [8] curl_5.2.0        tibble_3.2.1      SparkR_3.5.0      desc_1.4.3        rprojroot_2.0.4   pillar_1.9.0      tzdb_0.4.0       
[15] rlang_1.1.3       utf8_1.2.4        pkgload_1.3.4     timechange_0.3.0  cli_3.6.2         withr_3.0.0       magrittr_2.0.3   
[22] ps_1.7.6          processx_3.8.3    rstudioapi_0.15.0 remotes_2.4.2.1   hms_1.1.3         lifecycle_1.0.4   vctrs_0.6.5      
[29] glue_1.7.0        cellranger_1.1.0  pkgbuild_1.4.3    fansi_1.0.6       tools_4.3.2       pkgconfig_2.0.3  

I also expanded the reference database a little with add_custom_microorganisms(), but nothing that would be close to these taxa.

Thanks,
Connor

@msberends
Owner

Many thanks for using our package, and for this wonderful notice!

I'm not sure whether this is intended behaviour. I'll dive into it and get back to you!
