Question about patient split and generating the gene list (50 highly variable genes .json) #87

Open
chokevin8 opened this issue Jan 17, 2025 · 0 comments

Hello,

I am trying to use the HEST-Benchmark to benchmark my custom foundation models. I plan to use only COAD and READ (colorectal cancer), which according to the tutorial and Table A11 of the paper comes to four patients and eight samples in total (two patients and four samples each for COAD and READ). To avoid train-test data leakage, I plan to use patient-level K-fold cross-validation as you did in the paper, which here would be 4-fold cross-validation (one fold per patient).
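A minimal sketch of the patient-level split described above, assuming each sample maps to exactly one patient. The function name `leave_one_patient_out` and the sample and patient IDs are placeholders for illustration, not names from HEST. Patient labels are prefixed with the cohort here on the assumption that a label such as "Patient 1" may be reused across datasets.

```python
from collections import defaultdict


def leave_one_patient_out(sample_to_patient):
    """Yield (held_out_patient, train_samples, test_samples), one fold per patient."""
    by_patient = defaultdict(list)
    for sample, patient in sample_to_patient.items():
        by_patient[patient].append(sample)

    for held_out in sorted(by_patient):
        # All samples of the held-out patient go to the test set together,
        # so no patient contributes to both train and test.
        test = sorted(by_patient[held_out])
        train = sorted(
            sample
            for patient, samples in by_patient.items()
            if patient != held_out
            for sample in samples
        )
        yield held_out, train, test


# Placeholder IDs: four patients, eight samples (hypothetical, not HEST IDs).
sample_to_patient = {
    "S1": "COAD_A", "S2": "COAD_A",
    "S3": "COAD_B", "S4": "COAD_B",
    "S5": "READ_A", "S6": "READ_A",
    "S7": "READ_B", "S8": "READ_B",
}

folds = list(leave_one_patient_out(sample_to_patient))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Grouping by patient before splitting is what prevents leakage: a plain K-fold over samples could place two sections from the same patient on opposite sides of the split.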

I've tried to match the COAD and READ sample names to the HEST_v1_1_0.csv metadata file. According to the HEST-Benchmark Jupyter notebook tutorial, the COAD Xenium samples are TENX111, TENX147, TENX148, and TENX149, and the READ Visium samples are ZEN36, ZEN40, ZEN48, and ZEN49. For COAD, is it correct that TENX147, TENX148, and TENX149 are from "Patient 1" and TENX111 is from a second, different patient? I wanted to double-check this because READ was straightforward to filter (ZEN36 and ZEN40 are from Patient 7; ZEN48 and ZEN49 are from Patient 1), whereas filtering for COAD in the .csv returns many additional samples (a mix of Xenium and Visium) with missing patient info.
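A rough sketch of the lookup, selecting rows by exact sample ID rather than by cancer type so that other COAD entries in the file do not get pulled in. The column names `id`, `oncotree_code`, `st_technology`, and `patient` are assumptions about the metadata layout, and the inline table is toy data that only mirrors the READ mapping described above (`OTHER1` is a made-up row with no patient entry); it is not copied from HEST_v1_1_0.csv.

```python
import csv
import io
from collections import defaultdict


def patients_for_samples(csv_text, sample_ids, id_col="id", patient_col="patient"):
    """Group the requested sample IDs by patient; collect IDs with no patient entry."""
    wanted = set(sample_ids)
    by_patient = defaultdict(list)
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[id_col] not in wanted:
            continue  # ignore samples outside the benchmark list
        patient = (row.get(patient_col) or "").strip()
        if patient:
            by_patient[patient].append(row[id_col])
        else:
            missing.append(row[id_col])
    return dict(by_patient), missing


# Toy metadata (assumed column names, illustrative values only).
toy_csv = """id,oncotree_code,st_technology,patient
ZEN36,READ,Visium,Patient 7
ZEN40,READ,Visium,Patient 7
ZEN48,READ,Visium,Patient 1
ZEN49,READ,Visium,Patient 1
OTHER1,COAD,Visium,
"""

read_ids = ["ZEN36", "ZEN40", "ZEN48", "ZEN49"]
by_patient, missing = patients_for_samples(toy_csv, read_ids + ["OTHER1"])
print(by_patient)
print("no patient info:", missing)
```

With the real file, the same function could be run on the four COAD sample IDs to see which patient labels they carry and whether any are blank.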

Secondly, for the gene list (var_50genes.json), I was able to generate it from the eight samples above with this call:

genes = get_k_genes(adata_list=adata_list, k=50, criteria='var', save_dir=save_path, min_cells_pct=0.10)

Here adata_list holds the scanpy AnnData gene expression objects for all the COAD and READ samples. This seems like the right way to do it, but I'm still unsure.
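To make the question concrete, here is a toy sketch of what variance-based selection of the top k genes could look like. It is not the `get_k_genes` implementation; `top_k_variable_genes` is a hypothetical stand-in, and its reading of `min_cells_pct` (a gene must be non-zero in at least that fraction of spots in every sample) is an assumption. Each sample is represented as a plain dict from gene name to per-spot expression values instead of an AnnData object.

```python
from statistics import pvariance


def top_k_variable_genes(samples, k, min_cells_pct=0.10):
    """Return up to k genes shared by all samples, ranked by pooled variance.

    samples: list of dicts mapping gene name -> list of per-spot expression values.
    """
    # Only genes measured in every sample can serve as common prediction targets.
    common = set.intersection(*(set(sample) for sample in samples))

    def expressed_enough(gene):
        # Assumed meaning of min_cells_pct: non-zero in >= this fraction of spots,
        # checked separately in each sample.
        return all(
            sum(value > 0 for value in sample[gene]) >= min_cells_pct * len(sample[gene])
            for sample in samples
        )

    kept = [gene for gene in common if expressed_enough(gene)]
    pooled_var = {
        gene: pvariance([value for sample in samples for value in sample[gene]])
        for gene in kept
    }
    # Highest variance first; ties broken by gene name for a stable order.
    return sorted(kept, key=lambda gene: (-pooled_var[gene], gene))[:k]


# Toy expression values (illustrative only).
sample_a = {"G1": [0, 5, 10, 0], "G2": [1, 1, 1, 1], "G3": [0, 0, 0, 0], "G4": [2, 3, 2, 3]}
sample_b = {"G1": [8, 0, 2, 6], "G2": [1, 2, 1, 2], "G3": [0, 0, 0, 1]}

selected = top_k_variable_genes([sample_a, sample_b], k=2, min_cells_pct=0.5)
print(selected)
```

In this toy version the gene list depends on which samples are passed in, since both the shared-gene intersection and the pooled variance change with the sample set.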

I would appreciate it if you guys could help me out with this, thank you so much! @guillaumejaume @konst-int-i @pauldoucet
