Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) #232

Merged
merged 68 commits into from
May 14, 2024

Conversation

setoguchi-naoki
Copy link
Contributor

Reference Issues/PRs

Fexes #190 and #188

What does this implement/fix? Explain your changes.

  • Modfied QPD's methods following interface of vectorized QPD
  • Bug fix tests for QPD
  • replace scipy.misc.derivative to findiff because it will be removed in scypy 1.12.0

Does your contribution introduce a new dependency? If yes, which one?

yes, I have extras dependency
findiff: https://findiff.readthedocs.io/en/latest/

What should a reviewer concentrate their feedback on?

Please check if there are any problems with test

Did you add any tests for the change?

No

Any other comments?

So sorry for my late contribution

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Before I go into this in more detail, could you explain why we are removing QPD_U?

@setoguchi-naoki
Copy link
Contributor Author

Thanks!

Before I go into this in more detail, could you explain why we are removing QPD_U?

Because un-bounded mode (J_QPD_extended_U) is not vectorized in latest cyclic-boosting package. It is not difficult to set boundaries from user in many cases, so I guess it no problem but it is better to ask Felix the reason for more detail.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 4, 2024

I see, thanks for the clarification.

Unfortunately, we can't just remove objects that have been previously added, or user code will break when we do our next release - imagine someone doing import QPD_U, and it's no longer there without prior notice.

So, instead, we could add a version bound cyclic_boosting<1.4.0 in the python_dependencies tag of QPD_U.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failures are due to a simple reason - after the refactor, what you get in _cdf etc is a np.ndarray (I hoped that would make things simpler).

You can get parameters broadcast to self.shape from self._bc_params["upper"], in each method, so you also do not need to make the params arguments.

@setoguchi-naoki
Copy link
Contributor Author

@fkiraly

The test failures are due to a simple reason - after the refactor, what you get in _cdf etc is a np.ndarray (I hoped that would make things simpler).

Yes, I noticed that later.

You can get parameters broadcast to self.shape from self._bc_params["upper"], in each method, so you also do not need to make the params arguments.

In this case, lower and upper parameters don't need to broadcast because these has same range with each samples. Hence, May I turn off broadcast_init?

The reason I set the lower and upper parameters for each method such as mean was because I needed to set the range when performing differential calculations for pdf using findiff. On the other hand, there was no need to specify it when estimating a probability distribution using QPDs. This specification may have been difficult to understand due to its seeming contradiction. Therefore, lower and upper are unified as parameters of the class.

@fkiraly
Copy link
Collaborator

fkiraly commented May 1, 2024

In this case, lower and upper parameters don't need to broadcast because these has same range with each samples. Hence, May I turn off broadcast_init?

I see! Though, you stlil broadcast qv_low etc, no? So you broadcast some params while not others.

In this case, it might be best if you set the broadcast_params, tag, a list of str with the names of params you broadcast. You can access those then, internally, and do not need to broadcast manually.

@setoguchi-naoki
Copy link
Contributor Author

I see! Though, you stlil broadcast qv_low etc, no? So you broadcast some params while not others.

Thank you for helpful information. In fact, these parameters are broadcasted in QPD class of cyclic boosting. So, I think that's operation does not need in skpro.

@fkiraly
Copy link
Collaborator

fkiraly commented May 7, 2024

I see - in this case, I'd still set the tag, because that also ensures index and columns are handled, but you do not need to use the dict with the parameters.

@fkiraly
Copy link
Collaborator

fkiraly commented May 10, 2024

should I help with the refactor? 2.3.0 will release soon, it would be nice if all distributions had moved over.

Also, minor unrelated query - would you be interested to present the cyclic boosting package or algorithm in one of the sktime meet-ups? Fridays, 1pm. We can chat on the discord: https://discord.com/invite/54ACzaFsn7

@setoguchi-naoki
Copy link
Contributor Author

I'm very sorry for being late. It took a long time to debug. If the test passes successfully, I think the tasks to be carried out in this PR will be completed. A request was made to #320 for a modification related to changing the default parameters. Since I have no experience in implementing the deprivation process, it would be helpful if you could confirm that I have not omitted any necessary corrections.

Also, minor unrelated query - would you be interested to present the cyclic boosting package or algorithm in one of the sktime meet-ups? Fridays, 1pm. We can chat on the discord: https://discord.com/invite/54ACzaFsn7

Thank you for your suggestion. This is a good opportunity to increase the presence of Cyclic Boosting, but unfortunately due to schedule constraints, it may be difficult for me to make any presentations.

@FelixWick
Copy link

I can do a Cyclic Boosting presentation at your meetup.

@fkiraly fkiraly changed the title [MNT][BUG] update Johnson QPDistributions wit bugfixes and vectorization (cyclic-boosting ver.1.4.0) [MNT][BUG] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) May 13, 2024
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extellent, thanks.

Almost ready! I will wrap this up together with the deprecation instructions so it can go in the next release.

@fkiraly fkiraly changed the title [MNT][BUG] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) [ENH] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) May 14, 2024
@fkiraly fkiraly merged commit 3dae189 into sktime:main May 14, 2024
36 checks passed
fkiraly added a commit that referenced this pull request May 16, 2024
This PR reworks the family of QPD family of distributions for efficiency
and to allow removal of the newly introduced dependency `findiff` in
#232.

The dependency `findiff` was introduced for approximation of `pdf`, but
in fact it is unnecessary as the `pdf` can be analytically derived by
applying the chain rule. True, it has to be applied three or four times,
but it's still the chain rule... efficiency and accuracy gains are
significant, and it helps us avoid computing numerical derivatives for
all entries in a large matrix, together with the now unnecessary
`findiff` dependency.

Makes the following changes:

* refactoring of the three QPD distributions tp use `skpro` machinery:
* use of the `skpro` native parameter broadcasting system instead of
ad-hoc broadcasting
* use of the `skpro` native approximation for `mean`, `var`, instead of
three copies of similar (and partially duplicative) approximation inside
the distributions
* refactoring between the three QPD distributions with the end of
simplification
* refactoring QPD parameter computation into a single, fully vectorized
function, `_prep_qpd_vars`
* clean room reimplementation of `cdf`, `ppf` of the three distributions
based on the `cyclic_boosting` reference
* new implementation of `pdf`, as derivative of `cdf`

As side effects of the rework:
* all parameters now broadcast in numpy-like fashion, including `alpha`,
`lower`, `upper`, which previously was not possible
* the distributions can be 2D with more than 1 column, which previously
was not possible
* `version` (the base distribution) can now be an arbitrary continuous
`scipy` distribution
* `pdf` is numerically exact
* the distributions do not have soft dependencies anymore

Regarding the relation to `cyclic_boosting`:

* this is clean room reimplementation and credit is given, so I hope
this is fine license-wise - @FelixWick?
* this is the result of trying to remove the `findiff` dependency for
computing the `pdf` from the `cdf` that was introduced in #232, as well
as cleanup before release. I ended up simplifying a lot, ending up here.
In this sense, the work of @setoguchi-naoki was crucial in arriving at
this point.
* I would have no issue at all with you moving the improved code into
`cyclic_boosting`. We can even restore the dependency and maintain the
distribution logic in `cyclic_boosting` if that were your preference,
e.g., for ownership reasons.
fkiraly pushed a commit that referenced this pull request May 28, 2024
#232

- Add deprication warning
- Docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug module:probability&simulation probability distributions and simulators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants