Potential Issue with the ExcelFormer Example #460

Open
mtreca opened this issue Oct 7, 2024 · 1 comment

mtreca commented Oct 7, 2024

Hi all!

Stepping through the example script for ExcelFormer, I notice that this line fails with my custom dataset.

AFAIK this is because CatToNumTransform appends an _{i} suffix to the categorical feature names in its statistics, but does not rename the corresponding columns in the TensorFrame it outputs. As a result, the mutual_info_sort.transformed_stats passed to ExcelFormer on line 107 contains _{i}-suffixed categorical column names, while the actual TensorFrame does not.
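One way to confirm this kind of mismatch is to compare the keys of the stats dictionary with the column names the TensorFrame actually holds. The sketch below uses plain dicts and lists as stand-ins: `find_stat_mismatches` is a hypothetical helper, not part of torch_frame, and the column names are made up for illustration.

```python
def find_stat_mismatches(stat_keys, frame_columns):
    """Return (columns lacking stats, stats lacking columns) as sorted lists."""
    stat_keys = set(stat_keys)
    frame_columns = set(frame_columns)
    missing = sorted(frame_columns - stat_keys)
    unused = sorted(stat_keys - frame_columns)
    return missing, unused


# Stand-in data (assumed): stats carry the "_0" suffix, the frame does not.
stats = {"age": {}, "color_0": {}, "city_0": {}}
columns = ["age", "color", "city"]

missing, unused = find_stat_mismatches(stats, columns)
print("columns without stats:", missing)
print("stats without columns:", unused)
```

With the real objects, the inputs would presumably be the keys of the transform's transformed_stats and the column names stored on the transformed TensorFrame.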

Case in point, running this snippet to manually rename the statistics back to their original names fixes the issue:

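# Note: this aliases (and mutates) the transform's own stats dict.
fixed_stats = cat_to_num.transformed_stats
for cat_feature in categorical_feature_names:
    # Binary task: assumes a single "<name>_0" entry per categorical feature.
    stats = fixed_stats.pop(f"{cat_feature}_0")
    fixed_stats[cat_feature] = stats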

This workaround may not work for classification tasks other than binary, though, so the preferred fix would be for CatToNumTransform to actually rename the columns of the TensorFrames it transforms.
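To make the binary-only limitation explicit, the workaround could be wrapped so that it refuses to rename when a feature has more than one suffixed entry. This is only a sketch on plain dicts: `rename_suffixed_stats` is a hypothetical helper, the feature names are invented, and it assumes the suffix always has the form `_<integer>`.

```python
import re

# Matches "<base>_<integer>", e.g. "color_0" -> base "color".
_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<idx>\d+)$")


def rename_suffixed_stats(stats, categorical_names):
    """Return a copy of `stats` with "<name>_<i>" keys renamed to "<name>".

    Raises ValueError when a categorical feature has more than one
    suffixed entry, since a one-to-one rename is then impossible.
    """
    categorical = set(categorical_names)
    grouped = {}
    for key in stats:
        match = _SUFFIX.match(key)
        if match and match.group("base") in categorical:
            grouped.setdefault(match.group("base"), []).append(key)

    fixed = dict(stats)  # leave the caller's dict untouched
    for base, keys in grouped.items():
        if len(keys) != 1:
            raise ValueError(
                f"{base!r} has {len(keys)} suffixed entries; cannot rename"
            )
        fixed[base] = fixed.pop(keys[0])
    return fixed


# Stand-in stats for a binary task (assumed values).
binary_stats = {"age": {"mean": 40.0}, "color_0": {"mean": 0.3}}
fixed = rename_suffixed_stats(binary_stats, ["color"])
print(sorted(fixed))
```

Failing loudly in the multi-entry case avoids silently passing stats that still disagree with the TensorFrame; it does not replace a fix inside CatToNumTransform itself.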

yiweny (Contributor) commented Nov 17, 2024

Thanks for reporting! I will take a look
