Hello,
I am trying to use HEST-Benchmark to benchmark my custom foundation models. I'm planning to use only COAD and READ (colorectal cancer), which according to the tutorial and Table A11 of the paper is a total of four patients and eight samples (two patients and four samples each for COAD and READ). I plan to use patient-level k-fold cross-validation like you guys did in the paper to avoid train-test data leakage, so this would be 4-fold cross-validation (one fold per patient).
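To make the split concrete, here is a minimal sketch of the leave-one-patient-out folds I have in mind. It is not taken from the HEST code. The `SAMPLE_TO_PATIENT` mapping and the `leave_one_patient_out` helper are my own placeholders, the "Patient 2" label for TENX111 is hypothetical, and the COAD rows are exactly the part I am unsure about. Patients are keyed by (cohort, label) on the assumption that "Patient 1" in COAD and "Patient 1" in READ are different people.

```python
from collections import defaultdict

# Assumed sample -> (cohort, patient label) mapping. The READ rows follow the
# metadata CSV; the COAD rows are an unconfirmed guess ("Patient 2" is a
# hypothetical label for the second COAD patient).
SAMPLE_TO_PATIENT = {
    "TENX111": ("COAD", "Patient 2"),
    "TENX147": ("COAD", "Patient 1"),
    "TENX148": ("COAD", "Patient 1"),
    "TENX149": ("COAD", "Patient 1"),
    "ZEN36": ("READ", "Patient 7"),
    "ZEN40": ("READ", "Patient 7"),
    "ZEN48": ("READ", "Patient 1"),
    "ZEN49": ("READ", "Patient 1"),
}


def leave_one_patient_out(sample_to_patient):
    """Return one fold per patient; all samples of that patient form the test set."""
    groups = defaultdict(list)
    for sample, patient in sample_to_patient.items():
        groups[patient].append(sample)

    folds = []
    for held_out in sorted(groups):
        test = sorted(groups[held_out])
        train = sorted(
            sample
            for patient, samples in groups.items()
            if patient != held_out
            for sample in samples
        )
        folds.append({"patient": held_out, "train": train, "test": test})
    return folds


folds = leave_one_patient_out(SAMPLE_TO_PATIENT)
for fold in folds:
    print(fold["patient"], "-> test:", fold["test"])
```

With this mapping the folds are unbalanced: one fold tests on a single sample (TENX111) and another on three, so per-fold scores would not be directly comparable in size.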
I've tried to match the COAD and READ sample names to the HEST_v1_1_0.csv metadata file. According to the HEST-Benchmark Jupyter notebook tutorial, the COAD Xenium samples are TENX111, TENX147, TENX148, and TENX149, and the READ Visium samples are ZEN36, ZEN40, ZEN48, and ZEN49. Is it correct that TENX147, TENX148, and TENX149 are from "Patient 1" and TENX111 is from a second, different patient for COAD? I wanted to double-check this because, unlike READ (which was straightforward when filtering: ZEN36 and ZEN40 are from Patient 7, and ZEN48 and ZEN49 are from Patient 1), filtering for COAD in the .csv returns many additional samples (Xenium and Visium mixed) with missing patient info.
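For reference, a minimal sketch of the lookup, restricted to the eight benchmark IDs instead of filtering on the cancer code (which is what pulls in the extra COAD rows). The column names `id`, `oncotree_code`, and `patient` are my assumption about the CSV header, `benchmark_patients` is a hypothetical helper, and the rows in `toy_csv` are made-up stand-ins for illustration, not actual metadata values.

```python
import csv
import io

BENCHMARK_IDS = {
    "TENX111", "TENX147", "TENX148", "TENX149",  # COAD (Xenium)
    "ZEN36", "ZEN40", "ZEN48", "ZEN49",          # READ (Visium)
}


def benchmark_patients(csv_file, ids=BENCHMARK_IDS):
    """Map each benchmark sample id to (cancer code, patient), with None for a blank patient."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        if row["id"] in ids:
            lookup[row["id"]] = (row["oncotree_code"], row["patient"] or None)
    return lookup


# Toy stand-in for the metadata file (illustrative rows only).
toy_csv = io.StringIO(
    "id,oncotree_code,st_technology,patient\n"
    "ZEN36,READ,Visium,Patient 7\n"
    "TENX111,COAD,Xenium,\n"
    "OTHER1,COAD,Visium,\n"
)
result = benchmark_patients(toy_csv)
print(result)
```

On the real file this would be called as `with open("HEST_v1_1_0.csv", newline="") as f: benchmark_patients(f)`. Any sample that comes back with `None` as the patient is one whose patient assignment I cannot confirm from the CSV alone.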
Secondly, for the gene list (var_50genes.json), I used the eight samples above and generated it through
```python
genes = get_k_genes(
    adata_list=adata_list,
    k=50,
    criteria='var',
    save_dir=save_path,
    min_cells_pct=0.10
)
```
this function, which reads the gene expression from the scanpy AnnData objects of all COAD and READ samples. This seems like the right way to do it, but I'm still unsure.
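To spell out what I expect a variance-based top-k selection to do, here is a rough NumPy sketch. This is only my assumption about the general idea (keep genes shared by all samples, drop genes expressed in too few spots, rank by variance of log1p counts) and is not the actual `get_k_genes` implementation. `top_k_variable_genes` and its arguments are hypothetical names, and the inputs are plain arrays rather than AnnData objects.

```python
import numpy as np


def top_k_variable_genes(matrices, gene_names, k, min_cells_pct=0.10):
    """Pick the k most variable genes shared by all samples.

    matrices:   list of (spots x genes) count arrays, one per sample
    gene_names: list of gene-name lists, aligned with the columns of each array
    """
    # Keep only genes present in every sample.
    common = set(gene_names[0]).intersection(*gene_names[1:])

    # Drop genes detected in fewer than min_cells_pct of the spots of any sample.
    for X, names in zip(matrices, gene_names):
        detected_frac = (np.asarray(X) > 0).mean(axis=0)
        common -= {g for g, frac in zip(names, detected_frac) if frac < min_cells_pct}
    common = sorted(common)

    # Stack all spots from all samples, with columns ordered as in `common`.
    stacked = np.vstack([
        np.asarray(X)[:, [names.index(g) for g in common]]
        for X, names in zip(matrices, gene_names)
    ])

    # Rank genes by the variance of their log1p-transformed counts.
    variances = np.log1p(stacked).var(axis=0)
    order = np.argsort(-variances, kind="stable")[:k]
    return [common[i] for i in order]


# Tiny made-up example: gene "A" is never detected, gene "D" is missing from sample 1.
sample_1 = np.array([[0, 5, 1], [0, 0, 1], [0, 9, 1]])
sample_2 = np.array([[3, 1, 2], [0, 1, 2]])
selected = top_k_variable_genes(
    [sample_1, sample_2], [["A", "B", "C"], ["B", "C", "D"]], k=1
)
print(selected)
```

If the real function works differently (for example, a different normalization or filtering order), the resulting gene list could differ from what this sketch would give.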
I would appreciate it if you guys could help me out with this, thank you so much! @guillaumejaume @konst-int-i @pauldoucet