oob_prediction_ in RandomForestClassifier #267
Comments
Combine the y values and the oob prediction values into a data frame |
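Combining the true labels and the OOB predictions into a data frame could be sketched like this (a minimal illustration with made-up arrays `y` and `oob_pred`; any two equal-length label arrays work):

```python
import numpy as np
import pandas as pd

# Hypothetical arrays: true labels and OOB predictions of equal length.
y = np.array([0, 1, 1, 0])
oob_pred = np.array([0, 1, 0, 0])

# Combine the y values and the OOB prediction values into a data frame.
df = pd.DataFrame({"y": y, "oob_pred": oob_pred})

# A convenience column marking where the OOB prediction was correct.
df["correct"] = df["y"] == df["oob_pred"]
print(df["correct"].mean())  # 0.75
```

From here the OOB error rate is just `1 - df["correct"].mean()`.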
But the problem set asks us to use RandomForestClassifier? |
I mistakenly used the RF regression. But the calculation method remains the same: I just took one more step to transform the numeric predictions into 1 or 0, whereas the RF classifier would produce them directly. |
My problem is that I don't think RandomForestClassifier has an oob_prediction_ attribute. I hope it's just something I missed. |
Ahhhh, I see... haven’t tried it... |
I have checked the source code of the sklearn package. In RandomForestClassifier, we can use oob_decision_function_ to calculate the oob prediction.
|
Thank you so much! I will give it a shot later.
…On Fri, Mar 2, 2018 at 8:37 PM Kanyao Han ***@***.***> wrote:
I have checked the source code of the sklearn package. In
RandomForestClassifier, we can use oob_decision_function_ to calculate the
oob prediction.
1. Transpose the matrix produced by oob_decision_function_
2. Select the second row of the matrix
3. Set a cutoff and transform all decimal values to 1 or 0 (>= 0.5 is
1 and otherwise 0)
The list of values we finally get is the oob prediction.
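The steps quoted above can be sketched in Python as follows (a minimal sketch on synthetic data from make_classification; the transpose-then-select-a-row step is written as simply taking the second column, which is equivalent):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data; any binary-labelled X, y works here.
X, y = make_classification(n_samples=200, random_state=0)

# oob_score=True makes sklearn compute the OOB decision matrix.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)

# oob_decision_function_ has shape (n_samples, n_classes);
# column 1 is the OOB probability of class 1 for each sample
# (the "second row" after transposing, as in the quoted steps).
proba_class1 = clf.oob_decision_function_[:, 1]

# Step 3: threshold at 0.5 to recover hard OOB predictions.
oob_pred = (proba_class1 >= 0.5).astype(int)

# The accuracy of these predictions should closely match clf.oob_score_
# (samples whose OOB probability is exactly 0.5 may be broken differently).
print((oob_pred == y).mean())
```

Note that with very few trees some samples may never be out of bag, in which case their row of `oob_decision_function_` is NaN; 100 trees on 200 samples makes that vanishingly unlikely.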
|
Cross-checking against the way sklearn calculates RandomForestClassifier().oob_score_, I believe the cutoff for step 3 should be: |
Forget the last message... |
Right, I think 0.5 IS quite arbitrary on the part of sklearn. I just went along with it for the MSE calculations because if that's what sklearn uses to come up with predicted y values, it would be consistent to evaluate MSE according to the same threshold. |
Looking into why it's 0.5, I found: it seems that the way sklearn decides between category 0 and 1 is by looking at the relative probabilities that an observation is 0 or 1 in the oob_decision_function_ matrix. If the second value is larger than the first, it decides in favour of 1. Since the probabilities add up to one, this effectively means a 0.5 threshold. |
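That equivalence is easy to check with a small made-up decision matrix (the numbers below are illustrative, not sklearn output):

```python
import numpy as np

# Hypothetical OOB decision matrix: one row per sample, columns are
# P(class 0) and P(class 1); each row sums to 1.
oob_dec = np.array([[0.80, 0.20],
                    [0.30, 0.70],
                    [0.45, 0.55]])

# sklearn-style prediction: pick the column with the larger probability.
pred_argmax = oob_dec.argmax(axis=1)

# Equivalent 0.5-threshold rule on the class-1 column, since rows sum to 1.
# (The only discrepancy is an exact tie at 0.5: argmax picks class 0,
# while >= 0.5 picks class 1.)
pred_thresh = (oob_dec[:, 1] >= 0.5).astype(int)

print(pred_argmax)   # [0 1 1]
print(pred_thresh)   # [0 1 1]
```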
Most classifiers decide the value according to this kind of probability. Theoretically, the threshold point itself should not be classified as 1 or 0, because either choice leaves the two sides unbalanced. But practically it's not a problem whichever way it is classified: as long as the sample size is large enough and the number of trees in the model is also large, its effect on the result will be small. |
Does anyone know how to get error rates for each category of a binary variable in RandomForestClassifier?
I found out oob_prediction_ seems to be exclusive to RandomForestRegressor.
Thanks.
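One way to get per-category OOB error rates, building on the oob_decision_function_ approach discussed above (a sketch on synthetic data; confusion_matrix rows are true classes, so each row's off-diagonal count gives that class's misclassifications):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Hypothetical synthetic binary-classification data.
X, y = make_classification(n_samples=300, random_state=0)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

# Hard OOB predictions: the class with the larger OOB probability.
oob_pred = clf.oob_decision_function_.argmax(axis=1)

# Row i of the confusion matrix counts samples whose true class is i;
# 1 - (diagonal / row sum) is then the OOB error rate for each class.
cm = confusion_matrix(y, oob_pred)
per_class_error = 1 - cm.diagonal() / cm.sum(axis=1)
print(per_class_error)  # one error rate per category
```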