Add GBDTs feature importance #292
@@ -1,6 +1,6 @@
 from __future__ import annotations
 
-from typing import Any
+from typing import Any, Optional
 
 import numpy as np
 import pandas as pd
@@ -226,3 +226,27 @@ def _load(self, path: str) -> None:
         import lightgbm
 
         self.model = lightgbm.Booster(model_file=path)
+
+    def _feature_importance(self, importance_type: str = 'gain',
+                            iteration: Optional[int] = None) -> list:
+        r"""Get feature importances.
+
+        Args:
+            importance_type (str): How the importance is calculated.
+                If "split", result contains numbers of times the feature
+                is used in a model. If "gain", result contains total gains
+                of splits which use the feature.
+            iteration (int, optional): Limit number of `iterations` in the feature
+                importance calculation. If None, if the best `iteration` exists,
+                it is used; otherwise, all trees are used. If <= 0, all trees
+                are used (no limits).
+
+        Returns:
+            list: Array with feature importances.
+        """
+        assert importance_type in [
+            'split', 'gain'
+        ], f'Expect split or gain, got {importance_type}.'
+        scores = self.model.feature_importance(importance_type=importance_type,
+                                               iteration=iteration)
+        return scores.tolist()
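To make the two `importance_type` options described in the docstring concrete, here is a toy sketch in plain Python. The split records and the helper name `toy_importance` are hypothetical and only illustrate the definitions ("split" counts uses of a feature, "gain" sums the gains of its splits); this is not how LightGBM computes them internally.

```python
def toy_importance(splits, num_features, importance_type="gain"):
    """Aggregate per-feature importance from (feature_index, gain) pairs.

    `splits` is a hypothetical flat list with one entry per tree split.
    """
    if importance_type not in ("split", "gain"):
        raise ValueError(f"Expect split or gain, got {importance_type}.")
    scores = [0.0] * num_features
    for feature_index, gain in splits:
        # "split" counts how often a feature is used; "gain" sums its gains.
        scores[feature_index] += 1.0 if importance_type == "split" else gain
    return scores


# Feature 0 is used twice, feature 1 never, feature 2 once.
splits = [(0, 2.5), (2, 1.0), (0, 0.5)]
split_scores = toy_importance(splits, 3, "split")
gain_scores = toy_importance(splits, 3, "gain")
```

A feature that is never used in a split gets a score of zero under both options, so the returned list always has one entry per feature.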
Review comment: Will this be just a list of scores? IMO it's better to return a dictionary where the keys are column names and the values are the corresponding scores. WDYT?

Reply: The feature importance APIs of the different GBDT implementations return different types. For convenience, I converted them all to lists.
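For reference, the dictionary form suggested in the review can be built on top of the list form with a thin wrapper. The sketch below is only an illustration: the helper name `importance_by_name` and the example column names are hypothetical and not part of this PR. For LightGBM, the names could presumably come from `Booster.feature_name()`, but how column names would be obtained for the other GBDT backends is an open assumption.

```python
from typing import Dict, Sequence


def importance_by_name(names: Sequence[str],
                       scores: Sequence[float]) -> Dict[str, float]:
    """Pair each column name with its importance score."""
    if len(names) != len(scores):
        raise ValueError(
            f'Got {len(names)} names but {len(scores)} scores.')
    # Convert to plain floats so NumPy scalars do not leak into the result.
    return {name: float(score) for name, score in zip(names, scores)}


# Hypothetical column names and scores, for illustration only.
example = importance_by_name(['age', 'income'], [3.0, 1.5])
```

Keeping the list as the common return type and layering the mapping on top would let both forms coexist, at the cost of one extra helper.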
Review comment: Add a docstring for `iteration`.