Add GBDTs feature importance #292
base: master
Conversation
xnuohz
commented
Dec 12, 2023
- Add APIs to get feature importance
- Add test case
- Update example
Codecov Report

@@            Coverage Diff            @@
##           master     #292     +/-   ##
=========================================
  Coverage   93.41%   93.41%
=========================================
  Files         116      116
  Lines        5949     5970     +21
=========================================
+ Hits         5557     5577     +20
- Misses        392      393      +1

View full report in Codecov by Sentry.
torch_frame/gbdt/tuned_xgboost.py
Outdated
@@ -232,3 +232,7 @@ def _load(self, path: str) -> None:
        import xgboost

        self.model = xgboost.Booster(model_file=path)

    def _feature_importance(self) -> list:
        scores = self.model.get_score(importance_type='weight')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe `weight` can be passed as an argument.
torch_frame/gbdt/tuned_lightgbm.py
Outdated
@@ -226,3 +226,7 @@ def _load(self, path: str) -> None:
        import lightgbm

        self.model = lightgbm.Booster(model_file=path)

    def _feature_importance(self) -> list:
        scores = self.model.feature_importance(importance_type='gain')
Same here.
torch_frame/gbdt/gbdt.py
Outdated
@@ -135,6 +139,19 @@ def load(self, path: str) -> None:
        self._load(path)
        self._is_fitted = True

    def feature_importance(self) -> list:
Suggested change:
-    def feature_importance(self) -> list:
+    def feature_importance(self, *args, **kwargs) -> list:
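The suggested signature lets the public method forward library-specific options (e.g. an importance type) to each backend's private implementation. A sketch under assumed class and attribute names, not the library's actual code:

```python
class GBDT:
    """Sketch of a base class whose public API forwards to subclasses."""

    def __init__(self):
        self._is_fitted = False

    def feature_importance(self, *args, **kwargs) -> list:
        # Forward backend-specific options untouched to the subclass.
        if not self._is_fitted:
            raise RuntimeError('The model is not fitted yet.')
        return self._feature_importance(*args, **kwargs)


class XGBoostLike(GBDT):
    def _feature_importance(self, importance_type: str = 'weight') -> list:
        # Dummy scores; a real subclass would query its booster.
        return [1.0, 2.0] if importance_type == 'weight' else [2.0, 1.0]


m = XGBoostLike()
m._is_fitted = True
print(m.feature_importance(importance_type='gain'))  # [2.0, 1.0]
```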
test/gbdt/test_gbdt.py
Outdated
num_features = 0
for x in stypes:
    if x == stype.numerical:
        num_features += 3 * 1
I don't quite get the code here.
Here I want to get the total number of FakeDataset features: 3 is the number of columns and 1 is the dimension per column.
Can you do this in a more generic way, e.g. getting the counts and dimensions from `col_names_dict` or `tensor_frame`, rather than using magic numbers?
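A more generic count could be derived directly from the column dictionary instead of magic numbers. A sketch assuming `col_names_dict` maps each stype to its list of column names (the sample data here is illustrative, not from the actual FakeDataset):

```python
# Assumed shape: {stype: [column names]}.
col_names_dict = {
    'numerical': ['num_1', 'num_2', 'num_3'],
    'categorical': ['cat_1', 'cat_2'],
}

# Total feature count, independent of which stypes the dataset uses.
num_features = sum(len(cols) for cols in col_names_dict.values())
print(num_features)  # 5
```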
Left some comments. @weihua916 or @zechengz or @akihironitta should also take a look.
iteration (int, optional): Limit the number of iterations used in the
    feature importance calculation. If None, the best iteration is used
    if it exists; otherwise, all trees are used. If <= 0, all trees are
    used (no limit).
Add doc-string on iteration
examples/tuned_gbdt.py
Outdated
    'feature': dataset.feat_cols,
    'importance': gbdt.feature_importance()
}).sort_values(by='importance', ascending=False)
print(scores)
Add some more text around the scores
Maybe we can add a parser argument to let users specify whether they want feature importance.
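Such an opt-in flag could look like the sketch below; the flag name is assumed for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# Opt-in flag: only compute and print feature importance when requested.
parser.add_argument('--print-feature-importance', action='store_true')

args = parser.parse_args(['--print-feature-importance'])
print(args.print_feature_importance)  # True
```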
], f'Expect split or gain, got {importance_type}.'
scores = self.model.feature_importance(importance_type=importance_type,
                                       iteration=iteration)
return scores.tolist()
Will this be just a list of scores? IMO it's better to return a dictionary where keys are column names and values are the corresponding scores. WDYT?
The return types of the GBDT libraries' feature importance APIs differ, so for convenience I converted them all to lists:
lightgbm -> ndarray
xgboost -> dict[str, float]
catboost -> ndarray
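Normalizing those differing return types into the dictionary the reviewer suggests could be sketched as follows; it assumes `feat_cols` is in the same order as the array-style scores, and the function name is hypothetical:

```python
def feature_importance_as_dict(feat_cols, scores):
    """Normalize a per-library importance result to {column name: score}.

    scores may be array-like (lightgbm/catboost) or already a dict
    keyed by feature name (xgboost).
    """
    if isinstance(scores, dict):
        return dict(scores)
    # Array-like: pair scores with column names by position.
    return dict(zip(feat_cols, list(scores)))


print(feature_importance_as_dict(['a', 'b'], [0.7, 0.3]))
print(feature_importance_as_dict(['a', 'b'], {'a': 0.7, 'b': 0.3}))
```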