first commit with new neighborhood_connectivity function into _shap.py #66

Draft
wants to merge 9 commits into base: main

Conversation

@LukasHats LukasHats commented Jan 6, 2025

As we discussed here, @marcovarrone: I have not yet fully understood how the functions are organized in the package, but I thought it might make sense to put it into _shape.py.
The function can only be run after gr.connected_components. It takes the component labels from adata.obs and calculates, per image, how many cells from a neighborhood are inside a connected component. The library_key (e.g. image_ID) is required because the calculation is done per image. If users pass a condition, the conditions are plotted as hue, but that is optional. Users can also set show=False to get the dataframe (we can discuss whether to return the figure object rather than the df); however, the standard plotting function currently also returns the ax object and plots the graph. Happy to adjust any of that.
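A minimal sketch of the per-image calculation described above, not the code in this pull request. It assumes plain Python sequences stand in for the adata.obs columns, that a cell outside every component carries None as its component label, and that the score is expressed as a fraction. The function and argument names are illustrative.

```python
from collections import defaultdict


def neighborhood_connectivity(images, neighborhoods, components):
    """Fraction of cells per (image, neighborhood) inside a connected component.

    The three arguments are equal-length sequences with one entry per cell,
    standing in for the library_key, neighborhood and component columns of
    adata.obs. A component label of None marks a cell outside every
    component (assumption).
    """
    total = defaultdict(int)
    inside = defaultdict(int)
    for image, nhood, component in zip(images, neighborhoods, components):
        key = (image, nhood)
        total[key] += 1
        if component is not None:
            inside[key] += 1
    return {key: inside[key] / total[key] for key in total}


# Toy data: six cells spread over two images and two neighborhoods.
scores = neighborhood_connectivity(
    images=["img1", "img1", "img1", "img1", "img2", "img2"],
    neighborhoods=["A", "A", "B", "B", "A", "A"],
    components=[0, None, 1, 1, None, None],
)
```

Each (image, neighborhood) pair gets one value, which is why the result lends itself to a distribution plot across images with an optional condition as hue.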

It's also currently set to a violin plot, which makes sense if you have many images, but it's probably a bit odd if users have only one or a few images.

I don't know what else needs to be added to turn this into a function like cc.pl.neighborhood_connectivity, or whether you want to implement a test for it.

@marcovarrone
Collaborator

Thank you for the pull request @LukasHats!
The way it currently works is that for every shape metric there is a function in cellcharter.tl (e.g., cellcharter.tl.linearity, now renamed cellcharter.tl.linearity_metric) that computes the metric for every component and stores the values under a key in a dictionary called shape_component inside adata.uns. The way it should be implemented is that you have a cellcharter.tl.nhood_connectivity_metric function that, similarly to linearity_metric, curl_metric, etc., computes the metric for every component and adds the corresponding key to the shape_component dictionary.

After that, the user should use the cc.pl.shape_metrics function to plot the boxplots of the metric values. The shape_metrics function was quite convoluted and developed specifically for the purpose of the paper. With the commits that I added to the pull request I simplified it a bit, so now it should be more understandable.
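The storage convention described above can be sketched as follows. This is an illustration under stated assumptions, not the cellcharter implementation: a plain dict stands in for adata.uns, a list stands in for the component column of adata.obs, and the helper store_component_metric and the placeholder metric n_cells are hypothetical.

```python
from collections import Counter


def store_component_metric(uns, name, values):
    """Store one value per component under uns['shape_component'][name]."""
    uns.setdefault("shape_component", {})[name] = dict(values)
    return uns


# Stand-in for the component column of adata.obs: one label per cell.
component_per_cell = [2476, 2476, 2331, 1366, 1366, 1366]

# Placeholder metric (hypothetical): number of cells in each component.
n_cells = Counter(component_per_cell)

uns = {}  # stand-in for adata.uns
store_component_metric(uns, "n_cells", n_cells)
```

A plotting function can then read any metric by name from uns['shape_component'] and expect a mapping from component label to value.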

In summary, what needs to be done is:

  • By @LukasHats:

    • Transfer the metric computation from cc.pl.neighborhood_connectivity to cc.tl.nhood_connectivity_metric and store the result as in the other metric functions.
  • By me:

    • Add tests for the new version of the shape_metrics function and for nhood_connectivity_metric once it is complete
    • Update the CODEX tutorial

I hope everything is clear! If you have any doubts, feel free to let me know :)

@LukasHats
Author

LukasHats commented Jan 13, 2025

Perfect, I will do it this week @marcovarrone . Thanks a lot for rewriting the whole shape metric part for it!

Also now I understand how the shape metric used to work! Really like the new approach!

@LukasHats
Author

LukasHats commented Jan 16, 2025

@marcovarrone I am having trouble integrating the nhood_connectivity score in the same way as the other metrics work. Let's take the example of purity:

adata.uns['shape_component']['purity']

{2476: 0.6571428571428571,
 2331: 0.5389221556886228,
 1366: 0.7424242424242424,

So for each connected component you have a quantification of the metric.

However, the neighborhood_connectivity idea is different from that. It's rather a measure of how many cells in an image are located inside a connected component. Or, extended further: how many cells from a neighborhood are inside such a connected component, per image. So it's rather a meta-score, not something computed for each component.

So I cannot really deliver a metric per component, as it's done for purity etc. The .uns entry would look different from how .uns['shape_component'] works. I know this is a big problem, as the plotting function needs that format.
Maybe we could set the same score for every component; would that be an option? Just as an example:

adata.uns['shape_component']['nhood_connectivity']

{2476: 0.3,
 2331: 0.3,
 1366: 0.3,

Only if 2476, 2331 and 1366 are in the same image, of course. But I don't think this will give the results/score that I want to achieve with the connectivity score. Otherwise we would maybe need to create a new .uns entry and plotting function for the nhood_connectivity case.
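To make the concern concrete, here is a small sketch of broadcasting a per-image score onto components. The mapping component_to_image, the image names, the extra component 905 and the score values are all made up for illustration.

```python
def broadcast_image_score(component_to_image, image_score):
    """Assign every component the score of the image it belongs to."""
    return {
        component: image_score[image]
        for component, image in component_to_image.items()
    }


# Hypothetical mapping: components 2476, 2331 and 1366 share one image.
component_to_image = {2476: "img1", 2331: "img1", 1366: "img1", 905: "img2"}
image_score = {"img1": 0.3, "img2": 0.8}

per_component = broadcast_image_score(component_to_image, image_score)

# Count the distinct values seen among the components of each image.
distinct_per_image = {
    image: len({per_component[c] for c, i in component_to_image.items() if i == image})
    for image in image_score
}
```

Every image contributes exactly one distinct value, repeated once per component, so a box plot over components would weight images by their number of components rather than show a per-component spread.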

@marcovarrone
Collaborator

That's a very good point @LukasHats, I didn't realize that!

I see two possible solutions:

  • We find a way to combine the purity and neighborhood connectivity metrics. In this way, even if the neighborhood connectivity is the same for all components of the same domain, its combination with purity will still lead to a value that is unique for every component. This can make sense, since purity and neighborhood connectivity represent complementary views of the same thing. However, I don't know how to combine them in a sensible way other than taking the mean of the two values.
  • We change the structure of adata.uns['shape_component'] and add a sort of key, entity=domain or entity=component, so that a metric related to domains is plotted as a box plot, while a metric related to components is shown as a single bar plot. It's not the most elegant solution and it may take some effort to implement the plotting part.
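One way the second option could look, as a rough sketch only. The nested layout with "entity" and "values" keys, the helper plot_kind, and the nhood_connectivity numbers are assumptions for illustration. The entity-to-plot mapping is copied from the proposal as written and is kept in one place so it can be swapped if the intended pairing differs.

```python
# Assumed layout: each metric carries an "entity" tag next to its values.
shape_component = {
    "purity": {
        "entity": "component",
        "values": {2476: 0.6571428571428571, 2331: 0.5389221556886228},
    },
    "nhood_connectivity": {
        "entity": "domain",
        "values": {"A": 0.3, "B": 0.7},  # made-up numbers
    },
}

# Mapping taken from the proposal text; adjust if the pairing should differ.
PLOT_KIND_BY_ENTITY = {"domain": "box", "component": "bar"}


def plot_kind(shape_component, metric):
    """Return the plot type for a metric based on its 'entity' tag."""
    entity = shape_component[metric]["entity"]
    if entity not in PLOT_KIND_BY_ENTITY:
        raise ValueError(f"Unknown entity {entity!r} for metric {metric!r}")
    return PLOT_KIND_BY_ENTITY[entity]
```

With this layout the plotting function only needs one lookup per metric to decide how to draw it, at the cost of changing the format that existing metrics are stored in.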

What do you think?

2 participants