
Mldb 2095 boosting and stump #785

Open · wants to merge 6 commits into master
Conversation

@guyd (Contributor) commented Dec 20, 2016

I will wait until this is carefully reviewed before merging since it impacts the logic of all classifiers.

The main problem was in Dense_Feature_Set, which did not force its features to be sorted. This led to features being mismatched during prediction. It happened only when a classifier returned false from predict_is_optimized() (the default), which is in particular the case for Stump.
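A minimal sketch of the failure mode, assuming a simplified stand-in for Dense_Feature_Set (the real MLDB class is richer): lookup code that binary-searches the feature list silently relies on sorted order, so an unsorted set makes the predictor read the wrong values.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical stand-in for MLDB's Dense_Feature_Set: a flat list of
// (feature id, value) pairs. Prediction code that looks features up by
// binary search silently relies on this list being sorted by feature id.
using FeatureSet = std::vector<std::pair<int, float>>;

// Lookup as a predictor might do it: binary search assumes sorted input.
float value_of(const FeatureSet & fs, int feature) {
    auto it = std::lower_bound(fs.begin(), fs.end(), feature,
                               [] (const auto & p, int f) { return p.first < f; });
    assert(it != fs.end() && it->first == feature);
    return it->second;
}

int main() {
    // Features inserted out of order, as the unsorted set allowed.
    FeatureSet fs = { {7, 0.5f}, {2, 1.0f}, {5, -3.0f} };

    // Without this sort, value_of() would consult the wrong entries.
    std::sort(fs.begin(), fs.end());

    std::cout << "feature 5 = " << value_of(fs, 5) << "\n";  // prints -3
}
```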

I also cleaned up some of the logic around optimization. The original intent seems to have been that all optimized calls work transparently even when a classifier does not support optimization. This condition was tracked in two ways: by calling predict_is_optimized() and by checking the Optimization_Info::initialized member. I kept only the former. It is therefore now an error to call an optimized predict with an uninitialized Optimization_Info.
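A hypothetical sketch of the resulting contract; the names Classifier, predict, and optimized_predict_impl are placeholders and the real MLDB/JML interfaces differ. The point is that the uninitialized-info case now fails loudly instead of being tracked through a second flag.

```cpp
#include <stdexcept>
#include <vector>

// Simplified stand-in for the real Optimization_Info.
struct Optimization_Info {
    bool initialized = false;
    // ... feature-mapping data would live here ...
};

struct Classifier {
    // Default: classifier does not provide an optimized path (as for Stump).
    virtual bool predict_is_optimized() const { return false; }

    // After this change, calling the optimized path with an uninitialized
    // Optimization_Info is a hard error instead of a silent fallback.
    float predict(const std::vector<float> & features,
                  const Optimization_Info & info) const
    {
        if (!info.initialized)
            throw std::logic_error("optimized predict called with "
                                   "uninitialized Optimization_Info");
        return optimized_predict_impl(features, info);
    }

    virtual float optimized_predict_impl(const std::vector<float> &,
                                         const Optimization_Info &) const = 0;
    virtual ~Classifier() = default;
};
```

Dropping the initialized flag as a control path means callers must check predict_is_optimized() up front, which is the single source of truth this PR keeps.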

I also removed the optimization_supported call. It was only used in the creation of the Optimization_Info, and since the optimized predict methods can now be called in all cases, it was no longer necessary to keep it.

Lastly, the all_features() method does not return all the features the classifier was trained on. An example is a Committee of Stumps, where only the features that were actually selected during training are persisted. This might not be an issue, but I thought it was at first. For this reason, the optimized predict uses Optimization_Info::from_features instead of Optimization_Info::to_features, though this might not be required or even a good idea.
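A speculative illustration of the from_features / to_features distinction as described above (both member names come from the PR text; everything else here is invented for the sketch): selecting the model's inputs through the full input-side feature list keeps indexing consistent even when all_features() omits features that were never selected during training.

```cpp
#include <vector>

// Hypothetical, simplified Optimization_Info for illustration only.
struct Optimization_Info {
    bool initialized = false;
    std::vector<int> from_features;  // every feature in the input space
    std::vector<int> to_features;    // only the features the model uses
};

// Selecting inputs via from_features keeps indexing aligned with the full
// input vector even when the trained model persisted only a subset.
std::vector<float>
select_inputs(const std::vector<float> & dense_input,
              const Optimization_Info & info)
{
    std::vector<float> out;
    out.reserve(info.from_features.size());
    for (int idx : info.from_features)
        out.push_back(dense_input.at(idx));
    return out;
}
```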

@jeremybarnes (Contributor)

all_features() should only return used features, not trained features.

@guyd (Contributor, Author) commented Jan 18, 2017

@jeremybarnes Understood regarding all_features(). I will update this PR since it has gone stale, and submit it for review in the coming days.
