Question about patient split and generating the gene list (50 highly variable genes .json) #87

chokevin8 · 2025-01-17T00:30:07Z

Hello,

I am trying to use HEST-Benchmark to benchmark my custom foundation models. I plan to use only COAD and READ (colorectal cancer), which according to the tutorial and Table A11 of the paper comprise four patients and eight samples in total (two patients and four samples each for COAD and READ). To avoid train-test data leakage, I plan to use patient-level k-fold cross-validation like you did in the paper; with four patients, this would be 4-fold cross-validation.

I've tried to match the COAD and READ sample names to the HEST_v1_1_0.csv metadata file. According to the HEST-Benchmark Jupyter notebook tutorial, the COAD Xenium samples are TENX111, TENX147, TENX148, and TENX149, and the READ Visium samples are ZEN36, ZEN40, ZEN48, and ZEN49. Is it correct that, for COAD, TENX147, TENX148, and TENX149 are from "Patient 1" and TENX111 is from a different, second patient? I want to double-check this because READ was straightforward to filter (ZEN36 and ZEN40 are from Patient 7; ZEN48 and ZEN49 are from Patient 1), whereas filtering for COAD in the .csv returns many additional samples (a mix of Xenium and Visium) with missing patient information.
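To make the intended split concrete, here is a minimal sketch of a leave-one-patient-out split over the eight samples. The READ grouping follows the metadata filtering described above; the COAD grouping is exactly the unconfirmed guess this question asks about, and the label "Patient 2" for TENX111 is a hypothetical placeholder. Because "Patient 1" appears under both READ and COAD, the sketch keys each group by (cohort, patient), on the assumption that these are different individuals.

```python
from collections import defaultdict

# Sample -> (cohort, patient). READ entries come from the metadata filtering
# described in this post. COAD entries are an UNCONFIRMED assumption, and
# "Patient 2" is a hypothetical placeholder label for TENX111.
SAMPLE_TO_PATIENT = {
    "ZEN36": ("READ", "Patient 7"),
    "ZEN40": ("READ", "Patient 7"),
    "ZEN48": ("READ", "Patient 1"),
    "ZEN49": ("READ", "Patient 1"),
    "TENX147": ("COAD", "Patient 1"),
    "TENX148": ("COAD", "Patient 1"),
    "TENX149": ("COAD", "Patient 1"),
    "TENX111": ("COAD", "Patient 2"),
}


def leave_one_patient_out(sample_to_patient):
    """Build one fold per patient; all samples of that patient form the test set."""
    groups = defaultdict(list)
    for sample, patient_key in sample_to_patient.items():
        groups[patient_key].append(sample)

    folds = []
    for held_out in sorted(groups):
        test = sorted(groups[held_out])
        train = sorted(
            sample
            for patient_key, samples in groups.items()
            if patient_key != held_out
            for sample in samples
        )
        folds.append({"patient": held_out, "train": train, "test": test})
    return folds


folds = leave_one_patient_out(SAMPLE_TO_PATIENT)
for fold in folds:
    print(fold["patient"], "test:", fold["test"], "train:", fold["train"])
```

With this grouping, no patient contributes samples to both the training and test sides of any fold. Note that the folds are unbalanced under the assumed COAD mapping: one fold tests on three samples and another on a single sample.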

Secondly, for the gene list (var_50genes.json), I generated it from the eight samples above by calling:

genes = get_k_genes(adata_list=adata_list, k=50, criteria='var', save_dir=save_path, min_cells_pct=0.10)

Here, adata_list holds the scanpy AnnData gene expression objects for all COAD and READ samples. This seems like the right way to do it, but I'm still unsure.
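For clarity, this is what I understand "top 50 genes by variance" to mean, written as a small pure-Python sketch on toy data. It is not HEST's implementation: the function name `top_k_variable_genes` and the input layout are hypothetical, and I am assuming that the `min_cells_pct` filter is applied per sample, that only genes surviving in every sample are kept, and that variance is computed over the pooled spots of all samples. The actual `get_k_genes` may differ on any of these points.

```python
from statistics import pvariance


def top_k_variable_genes(samples, k, min_cells_pct=0.10):
    """Hypothetical sketch, NOT the HEST implementation.

    samples: list of dicts mapping gene name -> list of per-spot expression values.
    Returns the k genes with the highest pooled variance among genes that are
    expressed (> 0) in at least min_cells_pct of spots in every sample.
    """
    common = None
    for sample in samples:
        kept = set()
        for gene, values in sample.items():
            expressed = sum(1 for v in values if v > 0)
            if expressed >= min_cells_pct * len(values):
                kept.add(gene)
        # Keep only genes that pass the filter in every sample seen so far.
        common = kept if common is None else common & kept
    common = common or set()

    # Variance of each surviving gene over the spots of all samples pooled together.
    pooled_var = {
        gene: pvariance([v for sample in samples for v in sample[gene]])
        for gene in common
    }
    # Sort by descending variance; break ties by gene name for determinism.
    ranked = sorted(pooled_var, key=lambda gene: (-pooled_var[gene], gene))
    return ranked[:k]


# Toy data: two "samples" with four spots each.
toy = [
    {"A": [0, 5, 10, 0], "B": [1, 1, 1, 1], "C": [0, 0, 0, 0]},
    {"A": [2, 8, 0, 4], "B": [1, 2, 1, 2], "C": [0, 0, 0, 9]},
]
top = top_k_variable_genes(toy, k=2)
print(top)  # ['A', 'B']
```

Gene C is dropped because it is not expressed in any spot of the first toy sample, so it fails the per-sample filter even though it varies in the second.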

I would appreciate it if you guys could help me out with this, thank you so much! @guillaumejaume @konst-int-i @pauldoucet
