
Feature request: `predict.Rborist` with nTree argument #39

Open
hadjipantelis opened this issue Mar 10, 2018 · 5 comments

Comments

@hadjipantelis

I have a feature request: would it be possible for predict.Rborist to accept an nTree argument, similar to the standard Rborist call? Given that a forest has N trees, we should be able to produce predictions using only N-M of them.

This functionality can be useful for seeing if/when additional trees lead to over-fitting.

@suiji
Owner

suiji commented Mar 10, 2018

This could easily be slipped into the upcoming version, but why not just retrain with n-m trees?

Prediction could be parametrized with a logical vector, for example, with n-m entries of TRUE and the remaining m entries FALSE. It seems, though, that these entries would need to be chosen at random, then applied iteratively, in order to get a good sense of over-fitting.
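(To make the logical-vector idea concrete: given per-tree predictions, a boolean mask selects which trees contribute to the ensemble average. The following is a hypothetical NumPy sketch of the concept, not Rborist's actual interface; `tree_preds` stands in for a trained forest's per-tree output.)

```python
import numpy as np

rng = np.random.default_rng(0)

n_trees, n_rows = 10, 5
# Hypothetical per-tree predictions: tree_preds[t, r] is tree t's
# prediction for row r (a stand-in for a trained regression forest).
tree_preds = rng.normal(size=(n_trees, n_rows))

def predict_subset(tree_preds, keep):
    """Average predictions over only the trees where keep[t] is True."""
    keep = np.asarray(keep, dtype=bool)
    return tree_preds[keep].mean(axis=0)

# Full forest: all entries TRUE.
full = predict_subset(tree_preds, np.ones(n_trees, dtype=bool))

# n-m random entries TRUE, the remaining m FALSE, as suggested above.
m = 3
keep = np.ones(n_trees, dtype=bool)
keep[rng.choice(n_trees, size=m, replace=False)] = False
subset = predict_subset(tree_preds, keep)

assert full.shape == subset.shape == (n_rows,)
```

Repeating the random masking over many draws of `keep` would give the iterative picture of over-fitting described above.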

@hadjipantelis
Author

Thank you for the speedy response.

Wouldn't retraining a new RF require additional time? (and space)

The use-case I am thinking of is that, just as one can select the number of iterations with a GBM, one could do something similar with an RF.
I appreciate that tree order is irrelevant in the case of an RF, so the logical-vector approach you describe is probably the ideal scenario; for a quick check, though, just having the first N-M entries set to TRUE and the final M to FALSE is probably fine. We are bootstrapping the original data anyway. Using only the first N-M trees will also be faster, because in a large forest we would not have to traverse all the trees and set them to zero.

@suiji
Owner

suiji commented Mar 12, 2018

Wouldn't retraining a new RF require additional time? (and space)

Assuming a "moderate" number of predictors, it takes (very roughly) about ten times as long to train as to predict. So, yes, resampling from the same forest will be faster than retraining each time. The results will not be identical to those obtained through retraining, but they may be suitable for your purposes. Unless forests are retained following prediction, though, there should not be a memory penalty.

just having the first N-M entries set to TRUE and the final M to FALSE is probably fine.

Yes, but a new feature like this should be sufficiently general to support a variety of use cases.

Just using the first N-M trees will also be faster because in a large forest we would not have to traverse all the trees and set them to zero.

I may be missing your point, but initializing an index vector seems like a two- or three-line operation at worst.

@hadjipantelis
Author

  1. Cool, we are in agreement on that.
  2. Sure thing! I am mostly thinking about what would be the most straightforward interface for the user.
  3. Agreed, but I was mostly thinking of the overhead of accessing the trees. I assume they are stored sequentially in memory, so we would have "unit-stride" access if we just used the first N-M trees rather than random access. Granted, it is a minor point!

@suiji
Owner

suiji commented Mar 14, 2018

I assume they are stored sequentially in memory so we will have "unit-stride" if we just used the first N-M trees rather than random access

Trees are stored sequentially but their sizes are not uniform so, in particular, stride is not fixed. In any case, bagging already introduces a precedent for ignoring a given tree at a given row. The feature you propose generalizes this a bit, when prediction is not bagged, by ignoring a given tree at all rows.
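(The bagging precedent can be sketched the same way: bagged prediction already skips a tree at the rows it trained on, which amounts to a per-(tree, row) inclusion matrix; ignoring a tree at all rows just clears that tree's entire slice of the matrix. A hypothetical NumPy sketch of the idea, not Rborist internals:)

```python
import numpy as np

rng = np.random.default_rng(1)
n_trees, n_rows = 6, 4

# Hypothetical per-tree predictions, as before.
tree_preds = rng.normal(size=(n_trees, n_rows))

# inbag[t, r] is True when row r was sampled for tree t; bagged
# (out-of-bag) prediction excludes those (tree, row) pairs.
inbag = rng.random((n_trees, n_rows)) < 0.5
include = ~inbag

def masked_mean(tree_preds, include):
    """Per-row average over only the included (tree, row) pairs."""
    counts = include.sum(axis=0)
    totals = np.where(include, tree_preds, 0.0).sum(axis=0)
    # Rows excluded by every tree get NaN rather than a bogus value.
    return np.where(counts > 0, totals / np.maximum(counts, 1), np.nan)

oob = masked_mean(tree_preds, include)

# Ignoring tree 0 at *all* rows generalizes the bag: clear its whole
# slice of the inclusion matrix.
include_no_t0 = include.copy()
include_no_t0[0, :] = False
oob_no_t0 = masked_mean(tree_preds, include_no_t0)

assert oob.shape == oob_no_t0.shape == (n_rows,)
```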
