Hi,
I noticed that the reported likelihood is calculated using a subset of the cells, while the loss function (which is minimized while fitting the parameters) is calculated using a superset of the cells used for the likelihood.
The likelihood is defined here and is calculated by default with weighted = "upper".
The loss is defined here and is calculated by default with weighted = True, which leads to a superset of the cells being used compared to the likelihood.
This seems problematic, as parameter fitting could settle on parameters with a good loss but a worse likelihood than some other parameter set. The reported likelihood therefore does not fully reflect what is being optimized.
I managed to show that this actually happens in practice on a small subset of the pancreas data:
Running the above code prints the following:
The two runs find two different sets of parameters: one has the better likelihood but the worse loss.
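The same effect can be reproduced in a toy setting that is independent of the library. The sketch below is only an illustration under my own assumptions: the data, the masks `all_cells` and `upper_cells`, and the helper `mean_sq_residual` are all hypothetical, and a mean squared residual stands in for both the loss and the (negative) likelihood. It shows how a parameter that is optimal on the superset of cells can score worse on the subset than a competing parameter.

```python
import numpy as np

# Toy data, NOT the pancreas dataset: one observed value per "cell".
x = np.array([0.0, 0.0, 0.0, 0.0, 10.0, 10.0])

# Hypothetical cell selections: the loss sees every cell,
# the likelihood-like score sees only the cells with high values.
all_cells = np.ones(x.shape, dtype=bool)
upper_cells = x > 5.0


def mean_sq_residual(mu, mask):
    """Mean squared residual of a constant model `mu` on the selected cells."""
    return float(np.mean((x[mask] - mu) ** 2))


# Two candidate parameters for the constant model.
mu_a = float(x.mean())  # minimizes the residual over all cells
mu_b = 10.0             # fits only the upper cells

loss_a = mean_sq_residual(mu_a, all_cells)
loss_b = mean_sq_residual(mu_b, all_cells)
subset_a = mean_sq_residual(mu_a, upper_cells)
subset_b = mean_sq_residual(mu_b, upper_cells)

print(f"all cells:   a={loss_a:.2f}  b={loss_b:.2f}")
print(f"upper cells: a={subset_a:.2f}  b={subset_b:.2f}")
```

Here `mu_a` wins on the full set of cells while `mu_b` wins on the upper cells, so ranking parameter sets by one quantity while optimizing the other can give contradictory answers.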
What is the reason why the likelihood is not calculated on the same cells as the loss?
/Paula