[ENH] Implement Histogram Conditional Density Estimation #322

Open
ShreeshaM07 opened this issue May 13, 2024 · 4 comments
Labels
feature request New feature or request

Comments

@ShreeshaM07
Contributor

ShreeshaM07 commented May 13, 2024

Describe the solution you'd like

Histogram-based density estimation is not currently available in skpro. Implement it from scratch as a conditional density estimator: find the optimal bin width (h), so that the resulting histogram estimate fits the data without oversmoothing or undersmoothing.
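
For orientation, here is a minimal sketch of the unconditional building block: a histogram density estimate with a data-driven bin width. It assumes NumPy and uses the Freedman-Diaconis rule (`bins="fd"`) as one possible width selector; the issue does not prescribe a rule, and `histogram_density` is a hypothetical helper name, not part of skpro.

```python
import numpy as np


def histogram_density(samples, bins="fd"):
    """Return (edges, heights) of a histogram density estimate.

    bins="fd" asks NumPy for the Freedman-Diaconis bin width. This rule
    is an assumption for illustration, not something the issue prescribes.
    """
    samples = np.asarray(samples, dtype=float)
    # Bin edges chosen by the selected rule (or an explicit bin count).
    edges = np.histogram_bin_edges(samples, bins=bins)
    # density=True rescales counts so that the bar areas sum to 1.
    heights, _ = np.histogram(samples, bins=edges, density=True)
    return edges, heights


rng = np.random.default_rng(0)
y = rng.normal(size=500)
edges, heights = histogram_density(y)
widths = np.diff(edges)
total_mass = float(np.sum(heights * widths))  # should be 1 up to rounding
```

A conditional version would need the bin heights to depend on the features as well, for example by estimating a separate histogram per region of feature space; that design is left open here.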

Additional context

Useful resources

ShreeshaM07 added the feature request label May 13, 2024
@ShreeshaM07
Contributor Author

ShreeshaM07 commented May 13, 2024

@fkiraly Do you recommend any other resources for implementing this?

Also, do I need to implement kernel density estimation (e.g., with Gaussian or tophat kernels) for this histogram estimator?

@fkiraly
Collaborator

fkiraly commented May 13, 2024

Sure!

Some classical ones:

This is mostly kernel based.

Also, what is tophat?

@ShreeshaM07
Contributor Author

> Also, what is tophat?

It's a type of kernel in sklearn's KDE:
K(x; h) is proportional to 1 for |x| < h, and 0 otherwise.

It closely resembles a histogram bin.
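
To make the resemblance concrete, here is a small one-dimensional sketch of a tophat kernel density estimate written directly in NumPy. `tophat_kde` is a hypothetical helper for illustration; in scikit-learn the corresponding estimator is `KernelDensity(kernel="tophat", bandwidth=h)`.

```python
import numpy as np


def tophat_kde(query, samples, h):
    """Tophat (box) kernel density estimate at each query point.

    Each sample contributes a uniform density on (sample - h, sample + h),
    i.e. height 1 / (2 * h) inside the window and 0 outside.
    """
    query = np.asarray(query, dtype=float)[:, None]
    samples = np.asarray(samples, dtype=float)[None, :]
    # Boolean matrix: is sample j within distance h of query i?
    inside = np.abs(query - samples) < h
    # Average the per-sample uniform densities.
    return inside.sum(axis=1) / (samples.shape[1] * 2.0 * h)


samples = np.array([0.0, 0.2, 1.0])
density = tophat_kde([0.1, 2.0], samples, h=0.5)
```

The difference from a histogram is that the window of width 2h is centred on each query point rather than fixed on a grid of bin edges.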

@fkiraly
Collaborator

fkiraly commented May 14, 2024

Oh, I see, the "top-hat kernel", which is the same as a box kernel.
It corresponds to a uniform distribution.

Here are a few things I noticed:

  • For histogram CDE, we need a histogram distribution, and these are not available in skpro yet. That might be a simpler implementation item, as it does not hinge on design decisions. Issue here: [ENH] histogram distribution #323
  • There is also a subtask of representing kernel mixtures; not all of these are distribution mixtures. There are a few design decisions to think about, so this is a more difficult issue. Issue opened here: [ENH] kernel mixture distribution #324
  • Given that the CDE literature is a bit dense and not entirely straightforward to get a grip on, is it perhaps easier to start with unconditional density estimation, e.g., by looking at sklearn and similar estimators for DE and the "right interfaces"? That would be a bit of a pivot, but might help in arriving at the right design and implementation choices step by step.
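
As a rough idea of what the histogram distribution in the first point could look like, here is a minimal sketch of a piecewise-constant distribution defined by bin edges and bin masses. The class name `HistogramDist` and its methods are hypothetical and do not reflect the skpro distribution interface; the half-open bin convention is also an assumption.

```python
import numpy as np


class HistogramDist:
    """Hypothetical piecewise-constant distribution; not the skpro API."""

    def __init__(self, edges, masses):
        self.edges = np.asarray(edges, dtype=float)
        masses = np.asarray(masses, dtype=float)
        self.masses = masses / masses.sum()  # normalise to total mass 1
        # Density in each bin is its mass divided by its width.
        self.heights = self.masses / np.diff(self.edges)

    def pdf(self, x):
        x = np.asarray(x, dtype=float)
        # Index of the bin containing x; bins are half-open [left, right).
        idx = np.searchsorted(self.edges, x, side="right") - 1
        inside = (x >= self.edges[0]) & (x < self.edges[-1])
        idx = np.clip(idx, 0, len(self.heights) - 1)
        return np.where(inside, self.heights[idx], 0.0)

    def cdf(self, x):
        x = np.asarray(x, dtype=float)
        # The cdf is linear within each bin, so interpolate cumulative mass.
        cum = np.concatenate([[0.0], np.cumsum(self.masses)])
        return np.interp(x, self.edges, cum)


dist = HistogramDist(edges=[0.0, 1.0, 3.0], masses=[1.0, 1.0])
```

A histogram CDE could then return one such object per input row, with bin masses that depend on the features.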


2 participants