Commit
Merge pull request #62 from broadinstitute/AnneCarpenter-patch-1
Adding link to zenodo resources, updating definitions
afermg authored Dec 16, 2024
2 parents b44a49f + b2b931b commit 83a172a
Showing 1 changed file with 19 additions and 4 deletions.
23 changes: 19 additions & 4 deletions explanations/data_description.md
@@ -91,10 +91,25 @@ From 12 sources (data-generating centers):



4. **Assembled Subsets**

- Combined datasets in parquet format
- URLs available in [profile index](https://github.com/jump-cellpainting/datasets/blob/main/manifests/profile_index.csv)
4. **Index**
You can find the profile index at https://github.com/jump-cellpainting/datasets/blob/main/manifests/profile_index.csv

- Parquet tables in which profiles were preprocessed with varying optimized pipelines.
- In the "Interpretable" tables, processing stops at the point where each feature still maps to its original feature name (relating to size, shape, intensity, etc.).
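
As a rough illustration of how the profile index can be used, the sketch below turns an index-style CSV into a lookup from subset name to URL. The column names (`subset`, `url`), the subset names, and the `example.org` URLs are assumptions made up for this example; check the real `profile_index.csv` for its actual header and contents.

```python
import csv
import io

# Stand-in for the text of profile_index.csv. The header and rows are
# hypothetical placeholders, not the real index contents.
SAMPLE_INDEX = """subset,url
example_subset_a,https://example.org/a.parquet
example_subset_b,https://example.org/b.parquet
"""


def index_to_urls(csv_text):
    """Map each subset name to its URL (assumes 'subset' and 'url' columns)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["subset"]: row["url"] for row in reader}


urls = index_to_urls(SAMPLE_INDEX)
print(urls["example_subset_a"])
```

The resulting URL can then be passed to whichever parquet reader you prefer.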



6. **Processed JUMP reference tables (JUMP_rr tables)**
[This](https://zenodo.org/records/14046034) dataset provides multiple precomputed analysis tables to make JUMP data exploration accessible:

- 'X_features.parquet' contains a ranking of the features that distinguish a given perturbation from negative controls.
- 'X_gallery.parquet' is for visualization of the images with all channels collapsed into one.
- 'X_cosinesim...parquet' contains the pairwise cosine similarity of all perturbations within a given dataset (e.g., orf, crispr). This allows searching for the closest matches to each perturbation of interest or viewing all relationships in a heatmap.
- 'X...significance...parquet' contains the statistical significance of the phenotypic activity of a given sample (see broad.io/crispr_feature for a formal definition). It shows which perturbations yielded a phenotype distinguishable from negative controls.
- 'full' tables contain all the data points from the resulting analysis. Their non-full counterparts contain a subset of the most significant entries, meant for in-browser consumption and queries.
- Many of the above tables can be interactively viewed using [JUMPrr tools](https://github.com/broadinstitute/monorepo/tree/main/libs/jump_rr#quick-data-access).
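
To make the "closest matches" idea behind the cosine similarity tables concrete, here is a minimal sketch in plain Python. It does not read the Zenodo tables or reproduce how they were computed; the perturbation names and profile vectors are made up for illustration, and `closest_matches` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_matches(profiles, query, k=2):
    """Return the k perturbations most similar to `query`, best first."""
    scores = [
        (name, cosine_similarity(profiles[query], vector))
        for name, vector in profiles.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Made-up profiles: one feature vector per (hypothetical) perturbation.
profiles = {
    "pert_A": [1.0, 0.0, 2.0],
    "pert_B": [2.0, 0.0, 4.0],    # same direction as pert_A
    "pert_C": [0.0, 3.0, 0.0],    # orthogonal to pert_A
    "pert_D": [-1.0, 0.0, -2.0],  # opposite direction to pert_A
}

matches = closest_matches(profiles, "pert_A", k=2)
for name, score in matches:
    print(name, round(score, 3))
```

With the precomputed tables, the similarity step is already done, so a query would reduce to sorting the row for the perturbation of interest.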



## Data Access
