[ENH] Implement Histogram Conditional Density Estimation #322

ShreeshaM07 · 2024-05-13T15:26:12Z

Describe the solution you'd like

Histogram estimation is not present in skpro. Implement it from scratch as a conditional density estimator: find the optimal bin width (h) so that the fitted histogram matches the data well, neither oversmoothing nor undersmoothing.

Additional context

Useful resources

Wasserman - All of Statistics (chapter 20.2)
https://www.youtube.com/watch?v=SUvPJ4URYGA
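For the unconditional case, bin-width selection could be sketched as follows. This is a minimal sketch, not skpro code: the function names `cv_score` and `fit_histogram` and the `max_bins` parameter are made up for illustration, and the score used is the leave-one-out cross-validation estimate of risk for equal-width histograms as found in standard nonparametric texts. The number of bins with the lowest score is taken as the best trade-off between oversmoothing and undersmoothing.

```python
import numpy as np


def cv_score(y, n_bins):
    """Leave-one-out cross-validation risk estimate for an equal-width histogram."""
    n = len(y)
    counts, edges = np.histogram(y, bins=n_bins)
    h = edges[1] - edges[0]  # bin width
    p_hat = counts / n       # fraction of the sample in each bin
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p_hat**2)


def fit_histogram(y, max_bins=50):
    """Choose the bin count minimising the CV score; return (heights, edges)."""
    y = np.asarray(y, dtype=float)
    candidates = range(1, max_bins + 1)  # hypothetical search grid
    best = min(candidates, key=lambda m: cv_score(y, m))
    # density=True rescales the counts so the histogram integrates to 1
    heights, edges = np.histogram(y, bins=best, density=True)
    return heights, edges
```

The bin edges here span the sample range, so the estimate is zero outside the observed minimum and maximum. A conditional version would additionally need to decide how the bins or their heights depend on the features.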

ShreeshaM07 · 2024-05-13T15:26:43Z

@fkiraly Do you recommend any other resources to refer to for implementing this?

Also, do I have to implement kernel density estimation with Gaussian and tophat kernels for this histogram estimator?

fkiraly · 2024-05-13T22:29:01Z

Sure!

Some classical ones:

Hyndman et al - Estimating and visualizing conditional densities
Efromovich - Conditional density estimation in a regression setting
Bashtannyk, Hyndman - Bandwidth selection for kernel conditional density estimation

This is mostly kernel based.

Also, what is tophat?

ShreeshaM07 · 2024-05-14T07:03:39Z

> Also, what is tophat?

It's a type of kernel in sklearn's KDE:
K(x; h) is proportional to 1 for |x| < h, and 0 otherwise.

It mostly resembles a bin itself.
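To make the resemblance concrete, here is a minimal pure-Python sketch of a top-hat kernel and the density estimate built from it. The function names are illustrative only; in sklearn the same kernel is selected with `KernelDensity(kernel="tophat")`. Each sample contributes a box of half-width `h` centred on itself, so the estimate behaves like a histogram whose bin slides along with the query point.

```python
def tophat_kernel(u, h):
    """Top-hat (box) kernel: constant on |u| < h, zero elsewhere, integrating to 1."""
    return 1.0 / (2.0 * h) if abs(u) < h else 0.0


def tophat_kde(x, samples, h):
    """Density estimate at x: the average of kernels centred on each sample."""
    return sum(tophat_kernel(x - s, h) for s in samples) / len(samples)


samples = [0.0, 1.0, 2.0]
# Only the sample at 0.0 lies within h = 0.5 of the query point 0.2.
density = tophat_kde(0.2, samples, h=0.5)
```

The difference from a histogram is that bin positions are not fixed in advance: the box is re-centred for every query point.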

fkiraly · 2024-05-14T08:24:46Z

Oh, I see, the "top-hat kernel", which is the same as a box kernel.
It corresponds to a uniform distribution.

Here are a few things I noticed:

For histogram CDE, we need a histogram distribution, which is not available in skpro yet. That might be a simpler implementation item, as it does not hinge on design decisions. Issue here: [ENH] histogram distribution #323
There is also a subtask of representing kernel mixtures, not all of which are distribution mixtures. There are a few design decisions to think about, so this is a more difficult issue. Issue opened here: [ENH] kernel mixture distribution #324
Given that the CDE literature is a bit dense and not entirely straightforward to get a grip on, is it perhaps easier to start with unconditional density estimation, e.g., by looking at the sklearn and similar estimators for DE and the "right interfaces"? That would be a bit of a pivot, but might help in arriving at the right design and implementation choices step by step.
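As a rough idea of what a histogram distribution involves, here is a minimal sketch of a piecewise-constant distribution with `pdf` and `cdf`. The class name `HistogramDist` and its constructor arguments are hypothetical and do not reflect skpro's distribution interface; the sketch only assumes numpy.

```python
import numpy as np


class HistogramDist:
    """Hypothetical piecewise-constant distribution; not skpro's actual API."""

    def __init__(self, bin_edges, bin_mass):
        self.bin_edges = np.asarray(bin_edges, dtype=float)
        mass = np.asarray(bin_mass, dtype=float)
        self.bin_mass = mass / mass.sum()  # normalise to total probability 1
        self.widths = np.diff(self.bin_edges)

    def pdf(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # bins are half-open [left, right); points outside all bins get density 0
        idx = np.searchsorted(self.bin_edges, x, side="right") - 1
        inside = (idx >= 0) & (idx < len(self.bin_mass))
        out = np.zeros_like(x)
        out[inside] = (self.bin_mass / self.widths)[idx[inside]]
        return out

    def cdf(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        cum = np.concatenate([[0.0], np.cumsum(self.bin_mass)])
        # the cdf of a piecewise-constant pdf is linear between bin edges
        return np.interp(x, self.bin_edges, cum, left=0.0, right=1.0)
```

For example, `HistogramDist([0, 1, 3], [1, 1])` puts half the mass on each of the two bins, so the density is 0.5 on the first bin and 0.25 on the second, which is twice as wide.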

ShreeshaM07 added the "feature request" label (New feature or request) on May 13, 2024
