-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) #232
Conversation
…ilures are resolved"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Before I go into this in more detail, could you explain why we are removing QPD_U
?
Because un-bounded mode (J_QPD_extended_U) is not vectorized in latest cyclic-boosting package. It is not difficult to set boundaries from user in many cases, so I guess it no problem but it is better to ask Felix the reason for more detail. |
I see, thanks for the clarification. Unfortunately, we can't just remove objects that have been previously added, or user code will break when we do our next release - imagine someone doing So, instead, we could add a version bound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test failures are due to a simple reason - after the refactor, what you get in _cdf
etc is a np.ndarray
(I hoped that would make things simpler).
You can get parameters broadcast to self.shape from self._bc_params["upper"]
, in each method, so you also do not need to make the params arguments.
Yes, I noticed that later.
In this case, lower and upper parameters don't need to broadcast because these has same range with each samples. Hence, May I turn off broadcast_init? The reason I set the lower and upper parameters for each method such as mean was because I needed to set the range when performing differential calculations for pdf using findiff. On the other hand, there was no need to specify it when estimating a probability distribution using QPDs. This specification may have been difficult to understand due to its seeming contradiction. Therefore, lower and upper are unified as parameters of the class. |
I see! Though, you stlil broadcast In this case, it might be best if you set the |
Thank you for helpful information. In fact, these parameters are broadcasted in QPD class of cyclic boosting. So, I think that's operation does not need in skpro. |
I see - in this case, I'd still set the tag, because that also ensures |
should I help with the refactor? 2.3.0 will release soon, it would be nice if all distributions had moved over. Also, minor unrelated query - would you be interested to present the cyclic boosting package or algorithm in one of the |
I'm very sorry for being late. It took a long time to debug. If the test passes successfully, I think the tasks to be carried out in this PR will be completed. A request was made to #320 for a modification related to changing the default parameters. Since I have no experience in implementing the deprivation process, it would be helpful if you could confirm that I have not omitted any necessary corrections.
Thank you for your suggestion. This is a good opportunity to increase the presence of Cyclic Boosting, but unfortunately due to schedule constraints, it may be difficult for me to make any presentations. |
I can do a Cyclic Boosting presentation at your meetup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extellent, thanks.
Almost ready! I will wrap this up together with the deprecation instructions so it can go in the next release.
This PR reworks the family of QPD family of distributions for efficiency and to allow removal of the newly introduced dependency `findiff` in #232. The dependency `findiff` was introduced for approximation of `pdf`, but in fact it is unnecessary as the `pdf` can be analytically derived by applying the chain rule. True, it has to be applied three or four times, but it's still the chain rule... efficiency and accuracy gains are significant, and it helps us avoid computing numerical derivatives for all entries in a large matrix, together with the now unnecessary `findiff` dependency. Makes the following changes: * refactoring of the three QPD distributions tp use `skpro` machinery: * use of the `skpro` native parameter broadcasting system instead of ad-hoc broadcasting * use of the `skpro` native approximation for `mean`, `var`, instead of three copies of similar (and partially duplicative) approximation inside the distributions * refactoring between the three QPD distributions with the end of simplification * refactoring QPD parameter computation into a single, fully vectorized function, `_prep_qpd_vars` * clean room reimplementation of `cdf`, `ppf` of the three distributions based on the `cyclic_boosting` reference * new implementation of `pdf`, as derivative of `cdf` As side effects of the rework: * all parameters now broadcast in numpy-like fashion, including `alpha`, `lower`, `upper`, which previously was not possible * the distributions can be 2D with more than 1 column, which previously was not possible * `version` (the base distribution) can now be an arbitrary continuous `scipy` distribution * `pdf` is numerically exact * the distributions do not have soft dependencies anymore Regarding the relation to `cyclic_boosting`: * this is clean room reimplementation and credit is given, so I hope this is fine license-wise - @FelixWick? * this is the result of trying to remove the `findiff` dependency for computing the `pdf` from the `cdf` that was introduced in #232, as well as cleanup before release. I ended up simplifying a lot, ending up here. In this sense, the work of @setoguchi-naoki was crucial in arriving at this point. * I would have no issue at all with you moving the improved code into `cyclic_boosting`. We can even restore the dependency and maintain the distribution logic in `cyclic_boosting` if that were your preference, e.g., for ownership reasons.
#232 - Add deprication warning - Docstring
Reference Issues/PRs
Fexes #190 and #188
What does this implement/fix? Explain your changes.
Does your contribution introduce a new dependency? If yes, which one?
yes, I have extras dependency
findiff: https://findiff.readthedocs.io/en/latest/
What should a reviewer concentrate their feedback on?
Please check if there are any problems with test
Did you add any tests for the change?
No
Any other comments?
So sorry for my late contribution
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the
skpro
root directory (not theCONTRIBUTORS.md
). Common badges:code
- fixing a bug, or adding code logic.doc
- writing or improving documentation or docstrings.bug
- reporting or diagnosing a bug (get this pluscode
if you also fixed the bug in the PR).maintenance
- CI, test framework, release.See here for full badge reference
For new estimators
docs/source/api_reference/taskname.rst
, follow the pattern.Examples
section.python_dependencies
tag and ensureddependency isolation, see the estimator dependencies guide.