Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the the order of magnitude algorithm from Cell Ranger to emptyDropsCellRanger #119

Open
wants to merge 3 commits into
base: devel
Choose a base branch
from

Conversation

an-altosian
Copy link

Hi @LTLA,

Happy New Year!! Here is Dongze using my Altos's GitHub account. Finally I got some cycles to address #88 (and #109 as a bonus ;P).

According to CellRanger's GitHub repo, I implemented the core functions for the order of magnitude algorithm to estimate the n.expected.cells parameter if it is set as NULL (the default). I have tested that the R implementation resulted in the same number as the python implementation in CellRanger if ignoring the effects of random seeds.

The only concern I have is that the current implementation uses population variance in this line (or here in Cell Ranger). Please feel free to change this to sample var (var(top_n_boot)) if you think it is more appropriate.

Best,
Dongze

@LTLA
Copy link
Collaborator

LTLA commented Jan 3, 2025

After reading this for half an hour, I have no idea what this is doing. Why is there a need for bootstrapping? Why can't the threshold be found exactly?

@an-altosian
Copy link
Author

an-altosian commented Jan 4, 2025

I guess the bootstrapping is saying those barcodes represent a tiny fraction of the population, so they want to use bootstrapping to get a "robust" threshold that is a "guaranteed approximation"?

Haha, I guess you found the reason why I did not copy and paste the function you wrote in #119 but spent a day translating their python functions into R. 😉

I have tested that the numbers I got using the function you wrote were only tens of cells' different than their implementation. It is totally fine if we just apply your succinct and straightforward function, but I guess there will be issues popping up requesting "Cell Ranger"'s algorithms.

@LTLA
Copy link
Collaborator

LTLA commented Jan 6, 2025

Hm. I'm caught between two concerns:

  1. Reproducing the CellRanger calls is desirable.
  2. But I don't want to maintain that code or keep it up to date with new CellRanger versions.

So how about this. We'll spin off the existing existing emptyDropsCellRanger function in a new Bioconductor package that you (and possibly other interested parties) can maintain. Then I don't have to worry about staying up to date with CellRanger, and you can achieve perfect parity at your own discretion.

Depending on the scope of this new package, I am also willing to offer the other 10X-specific functions, e.g., get10xMolInfoStats, read10xMolInfo. These functions also cause maintenance headaches for me whenever 10X decides to change the CellRanger format, so if someone is actively interested in staying aligned with CellRanger, they would be better suited to as maintainers. (In fact, I do have https://github.com/LTLA/crio, which was my plan for isolating these headaches in a separate package, but I ran out of motivation.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants