[ENH] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) #232

setoguchi-naoki · 2024-04-03T03:11:31Z

Reference Issues/PRs

Fixes #190 and #188

What does this implement/fix? Explain your changes.

Modified the QPD methods to follow the interface of the vectorized QPD
Bug fixes for the QPD tests
Replaced scipy.misc.derivative with findiff, because it will be removed in scipy 1.12.0
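As an illustration of what such a numerical derivative does when a `pdf` is approximated from a `cdf`, here is a minimal sketch using a plain central difference. It does not use the `findiff` API and is not the code from this PR; the helper name `numerical_pdf` and the step size `h` are assumptions made for the example.

```python
from statistics import NormalDist


def numerical_pdf(cdf, x, h=1e-5):
    """Approximate pdf(x) = d/dx cdf(x) with a second-order central difference.

    Hypothetical helper for illustration; h is an assumed step size.
    """
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)


# Standard normal as a test case, since its exact pdf is available for comparison.
dist = NormalDist(mu=0.0, sigma=1.0)
approx = numerical_pdf(dist.cdf, 0.3)
exact = dist.pdf(0.3)
print(abs(approx - exact))  # small, but not exactly zero
```

The approximation error depends on `h`: too large and the truncation error grows, too small and floating-point cancellation dominates.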

Does your contribution introduce a new dependency? If yes, which one?

Yes, one extra dependency:
findiff: https://findiff.readthedocs.io/en/latest/

What should a reviewer concentrate their feedback on?

Please check whether there are any problems with the tests

Did you add any tests for the change?

No

Any other comments?

So sorry for my late contribution

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
See here for full badge reference
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
dependency isolation, see the estimator dependencies guide.


fkiraly

Thanks!

Before I go into this in more detail, could you explain why we are removing QPD_U?

setoguchi-naoki · 2024-04-04T00:23:41Z

> Thanks!
>
> Before I go into this in more detail, could you explain why we are removing QPD_U?

Because the unbounded mode (J_QPD_extended_U) is not vectorized in the latest cyclic-boosting package. In many cases it is not difficult for the user to set boundaries, so I guess this is not a problem, but it would be better to ask Felix about the reasons in more detail.

fkiraly · 2024-04-04T14:33:21Z

I see, thanks for the clarification.

Unfortunately, we can't just remove objects that have been previously added, or user code will break when we do our next release - imagine someone doing import QPD_U, and it's no longer there without prior notice.

So, instead, we could add a version bound cyclic_boosting<1.4.0 in the python_dependencies tag of QPD_U.
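A minimal sketch of what such a version bound could look like follows. The class `QPDUStub` is a hypothetical stand-in, not the real `QPD_U`, and the helper `satisfies_upper_bound` is a deliberately simplified assumption (plain `X.Y.Z` versions only) rather than the dependency check skpro actually performs.

```python
class QPDUStub:
    """Hypothetical stand-in for QPD_U; only the tag dict is shown."""

    # The bound suggested in the comment above: keep using cyclic_boosting
    # releases that still ship the unbounded, non-vectorized variant.
    _tags = {"python_dependencies": "cyclic_boosting<1.4.0"}


def _as_tuple(version):
    """Turn 'X.Y.Z' into a tuple of ints for comparison (simplified)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_upper_bound(installed, requirement):
    """Check an installed version against a 'package<X.Y.Z' requirement."""
    _, bound = requirement.split("<")
    return _as_tuple(installed) < _as_tuple(bound)


requirement = QPDUStub._tags["python_dependencies"]
print(satisfies_upper_bound("1.3.1", requirement))  # True
print(satisfies_upper_bound("1.4.0", requirement))  # False
```

With a bound like this, the object stays importable, and only its dependency requirement changes.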

fkiraly

The test failures are due to a simple reason - after the refactor, what you get in _cdf etc is a np.ndarray (I hoped that would make things simpler).

You can get parameters broadcast to self.shape from self._bc_params["upper"], in each method, so you also do not need to make the params arguments.
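To make the access pattern concrete, here is a minimal sketch with a toy class. `ToyDistribution` is a hypothetical stand-in that fills `_bc_params` by hand with NumPy; it is not the skpro base class, and the uniform `cdf` is used only because it is short.

```python
import numpy as np


class ToyDistribution:
    """Hypothetical stand-in, not the skpro base class."""

    def __init__(self, lower, upper, shape):
        self.shape = shape
        # Assumption: parameters are stored already broadcast to self.shape.
        self._bc_params = {
            "lower": np.broadcast_to(np.asarray(lower, dtype=float), shape),
            "upper": np.broadcast_to(np.asarray(upper, dtype=float), shape),
        }

    def _cdf(self, x):
        # x arrives as an np.ndarray of shape self.shape; the parameters are
        # read from the broadcast dict instead of being passed as arguments.
        lower = self._bc_params["lower"]
        upper = self._bc_params["upper"]
        return np.clip((x - lower) / (upper - lower), 0.0, 1.0)


# A scalar lower bound and a (2, 1) upper bound both broadcast to shape (2, 3).
d = ToyDistribution(lower=0.0, upper=[[2.0], [4.0]], shape=(2, 3))
x = np.ones((2, 3))
result = d._cdf(x)
print(result.shape)
```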


setoguchi-naoki · 2024-05-01T01:55:54Z

@fkiraly

> The test failures are due to a simple reason - after the refactor, what you get in _cdf etc is a np.ndarray (I hoped that would make things simpler).

Yes, I noticed that later.

> You can get parameters broadcast to self.shape from self._bc_params["upper"], in each method, so you also do not need to make the params arguments.

In this case, the lower and upper parameters don't need to be broadcast, because they have the same range for every sample. Hence, may I turn off broadcast_init?

The reason I passed lower and upper as arguments to each method, such as mean, was that I needed to set the range when computing derivatives for the pdf using findiff. On the other hand, there was no need to specify them when estimating a probability distribution using QPDs. This apparent contradiction may have made the interface difficult to understand. Therefore, lower and upper are now unified as parameters of the class.

fkiraly · 2024-05-01T13:55:33Z

> In this case, lower and upper parameters don't need to broadcast because these has same range with each samples. Hence, May I turn off broadcast_init?

I see! Though, you still broadcast qv_low etc, no? So you broadcast some params while not others.

In this case, it might be best to set the broadcast_params tag: a list of str with the names of the params you broadcast. You can then access those internally and do not need to broadcast manually.

setoguchi-naoki · 2024-05-07T09:50:23Z

> I see! Though, you still broadcast qv_low etc, no? So you broadcast some params while not others.

Thank you for the helpful information. In fact, these parameters are broadcast in the QPD class of cyclic boosting, so I think that operation is not needed in skpro.

fkiraly · 2024-05-07T20:28:54Z

I see - in this case, I'd still set the tag, because that also ensures index and columns are handled, but you do not need to use the dict with the parameters.

fkiraly · 2024-05-10T20:05:10Z

Should I help with the refactor? 2.3.0 will be released soon, and it would be nice if all distributions had moved over by then.

Also, minor unrelated query - would you be interested to present the cyclic boosting package or algorithm in one of the sktime meet-ups? Fridays, 1pm. We can chat on the discord: https://discord.com/invite/54ACzaFsn7

setoguchi-naoki · 2024-05-12T16:00:14Z

I'm very sorry for being late. It took a long time to debug. If the tests pass, I think the tasks for this PR will be complete. I opened #320 for a modification related to changing the default parameters. Since I have no experience in implementing the deprecation process, it would be helpful if you could confirm that I have not omitted any necessary corrections.

> Also, minor unrelated query - would you be interested to present the cyclic boosting package or algorithm in one of the sktime meet-ups? Fridays, 1pm. We can chat on the discord: https://discord.com/invite/54ACzaFsn7

Thank you for your suggestion. This is a good opportunity to increase the presence of Cyclic Boosting, but unfortunately due to schedule constraints, it may be difficult for me to make any presentations.

FelixWick · 2024-05-12T17:53:53Z

I can do a Cyclic Boosting presentation at your meetup.

fkiraly

Excellent, thanks.

Almost ready! I will wrap this up together with the deprecation instructions so it can go in the next release.

@FelixWick

This PR reworks the QPD family of distributions for efficiency and to allow removal of the dependency `findiff` newly introduced in #232.

The dependency `findiff` was introduced for approximation of `pdf`, but in fact it is unnecessary, as the `pdf` can be derived analytically by applying the chain rule. True, it has to be applied three or four times, but it's still the chain rule... Efficiency and accuracy gains are significant, and it helps us avoid computing numerical derivatives for all entries in a large matrix, together with the now unnecessary `findiff` dependency.

Makes the following changes:

* refactoring of the three QPD distributions to use `skpro` machinery:
  * use of the `skpro` native parameter broadcasting system instead of ad-hoc broadcasting
  * use of the `skpro` native approximation for `mean`, `var`, instead of three copies of similar (and partially duplicative) approximation inside the distributions
* refactoring between the three QPD distributions with the aim of simplification
* refactoring QPD parameter computation into a single, fully vectorized function, `_prep_qpd_vars`
* clean room reimplementation of `cdf`, `ppf` of the three distributions based on the `cyclic_boosting` reference
* new implementation of `pdf`, as derivative of `cdf`

As side effects of the rework:

* all parameters now broadcast in numpy-like fashion, including `alpha`, `lower`, `upper`, which previously was not possible
* the distributions can be 2D with more than 1 column, which previously was not possible
* `version` (the base distribution) can now be an arbitrary continuous `scipy` distribution
* `pdf` is numerically exact
* the distributions do not have soft dependencies anymore

Regarding the relation to `cyclic_boosting`:

* this is a clean room reimplementation and credit is given, so I hope this is fine license-wise - @FelixWick?
* this is the result of trying to remove the `findiff` dependency for computing the `pdf` from the `cdf` that was introduced in #232, as well as cleanup before release. I ended up simplifying a lot, ending up here. In this sense, the work of @setoguchi-naoki was crucial in arriving at this point.
* I would have no issue at all with you moving the improved code into `cyclic_boosting`. We can even restore the dependency and maintain the distribution logic in `cyclic_boosting` if that were your preference, e.g., for ownership reasons.
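The chain-rule argument can be sketched on a toy example. The `cdf` below is a nested composition of elementary functions chosen for illustration only; it is not the actual Johnson QPD formula, and the names `toy_cdf`, `toy_pdf` and the parameters `xi`, `lam`, `delta` are assumptions. Differentiating it requires four applications of the chain rule, one per nesting level.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf, pdf and inv_cdf from the stdlib


def toy_cdf(x, lower=0.0, upper=1.0, xi=0.2, lam=0.5, delta=0.8):
    """Toy bounded cdf: Phi(xi + lam * sinh(delta * Phi^-1(p))), p in (0, 1)."""
    p = (x - lower) / (upper - lower)    # level 1: affine map onto (0, 1)
    z = STD.inv_cdf(p)                   # level 2: probit transform
    s = xi + lam * math.sinh(delta * z)  # level 3: sinh warp
    return STD.cdf(s)                    # level 4: outer normal cdf


def toy_pdf(x, lower=0.0, upper=1.0, xi=0.2, lam=0.5, delta=0.8):
    """Analytic derivative of toy_cdf, one chain-rule factor per level."""
    p = (x - lower) / (upper - lower)
    z = STD.inv_cdf(p)
    s = xi + lam * math.sinh(delta * z)
    dp_dx = 1.0 / (upper - lower)
    dz_dp = 1.0 / STD.pdf(z)             # derivative of the inverse cdf
    ds_dz = lam * delta * math.cosh(delta * z)
    dF_ds = STD.pdf(s)
    return dF_ds * ds_dz * dz_dp * dp_dx


print(toy_pdf(0.3))
```

Each factor is a closed-form expression, so the result carries no step-size error and needs no extra function evaluations around `x`.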

#232 - Add deprication warning - Docstring

fkiraly and others added 12 commits January 30, 2024 15:18

Revert "[MNT] skip CyclicBoosting and QPD tests until sktime#189 fa…

90357b9

…ilures are resolved"

Merge branch 'main' into revert-193-skip-cyclic

303bf2c

Update test_all_regressors.py

17b3e65

Merge branch 'main' into revert-193-skip-cyclic

d5d0c0e

Merge branch 'main' into revert-193-skip-cyclic

608354b

update for vectorized QPD and bug fix for qpd test

c564906

remove unnessesary data type

bf59ee8

update python dependency

cbeb877

minor change

087748b

move findiff into function

e2459c7

minor change

00fef17

remove QPD_U

dc2e81b

fkiraly requested changes Apr 3, 2024


fkiraly added bug module:probability&simulation probability distributions and simulators labels Apr 4, 2024

fkiraly mentioned this pull request Apr 4, 2024

[ENH] Consolidate quantile parameterized distributions in few classes #235

Closed

fkiraly added 10 commits April 18, 2024 10:50

move base to folder

d0dd460

delegate class

b60afc4

start work

24ef82f

corrected name

6ea8e1c

Merge branch 'delegate-distr' into qpd-delegate

7d36bec

docstr

669fbe4

complete

b84c6f7

Update _delegate.py

744f273

Update _delegate.py

c4247bf

Merge branch 'delegate-distr' into qpd-delegate

d6e7749

fkiraly mentioned this pull request Apr 18, 2024

[ENH] simplified interface to Johnson QPD #254

Closed

linting

99772f8

fkiraly added 2 commits April 30, 2024 12:57

Merge branch 'main' into pr/232

e464536

Update qpd.py

65272bc

fkiraly requested changes Apr 30, 2024


setoguchi-naoki added 5 commits May 1, 2024 09:47

new API for distributions

ab2224b

Merge branch 'revert-193-skip-cyclic' of https://github.com/setoguchi…

70616e4

…-naoki/skpro into revert-193-skip-cyclic

remove unnecessary code

38c4ddd

mod tags

0c45301

may be included in future versions of cyclic-boosting

cad61ae

fix bug

9c859a5

resolve shape mismatch

333b69a

setoguchi-naoki added 4 commits May 12, 2024 23:12

mod input and resolve shape

8c599a5

formatting

265ac34

miss commit

823a32d

minor change

5773f27

setoguchi-naoki mentioned this pull request May 12, 2024

[MNT] Deprecation message for CyclicBoosting changes #320

Merged

5 tasks

fkiraly changed the title ~~[MNT][BUG] update Johnson QPDistributions wit bugfixes and vectorization (cyclic-boosting ver.1.4.0)~~ [MNT][BUG] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) May 13, 2024

docstring

e91ea5d

fkiraly approved these changes May 14, 2024


fkiraly changed the title ~~[MNT][BUG] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0)~~ [ENH] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) May 14, 2024

fkiraly merged commit 3dae189 into sktime:main May 14, 2024
36 checks passed

fkiraly mentioned this pull request May 15, 2024

[ENH] native implementation of Johnson QPD family, explicit pdf #327

Merged

fkiraly pushed a commit that referenced this pull request May 28, 2024

[MNT] Deprecation message for CyclicBoosting changes (#320)

5198603

#232 - Add deprication warning - Docstring
