- Iteratively train 8 models, one for each of the different transformations.
- While preparing the cleaned data in the form needed for training (i.e. feature vectors), the data is partitioned by row.
- The Random Forest model partitions the features while training, so at that stage the data is partitioned by feature, i.e. by column.
- While testing, the test data is run through the models iteratively, and the final value is predicted through polling across the models.
- Algorithm: Random Forest Classifier
- Parameters:
  - Number of trees = 20
  - Depth = 5
- Training data size = 30 GB * 8 ≈ 240 GB
- Accuracy = 99.74%
- Time to train the models = 100 minutes on 18 m4.large instances
- Time to run the test data = 7 minutes on 15 m4.large instances
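
The polling step at test time can be illustrated with a minimal sketch. It assumes that "polling" means a simple majority vote over the 8 models and that each model sees the row through its own transformation; the notes above confirm neither detail. The names `poll`, `members`, and the stand-in transformations and model are hypothetical and replace the real Random Forest models purely for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = Sequence[float]
Transform = Callable[[Vector], Vector]   # hypothetical: one per trained model
Model = Callable[[Vector], Hashable]     # hypothetical: feature vector -> label


def poll(members: List[Tuple[Transform, Model]], row: Vector) -> Hashable:
    """Run every (transform, model) pair on one row and return the majority label."""
    # Each model votes on the row as seen through its own transformation.
    votes = [model(transform(row)) for transform, model in members]
    # Assumption: a plain majority vote. On a tie, Counter.most_common
    # returns the label that was voted first.
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Stand-ins for the 8 transformations and trained models (illustrative only):
# transformation k scales the row by k, and every "model" is a threshold rule.
members = [
    (lambda v, k=k: [k * x for x in v], lambda v: int(v[0] > 2.0))
    for k in range(1, 9)
]

majority_for_high_row = poll(members, [0.6])  # 5 of the 8 stand-ins vote 1
majority_for_low_row = poll(members, [0.3])   # 6 of the 8 stand-ins vote 0
```

With an even number of models a tie is possible, so a real pipeline would need an explicit tie-breaking rule; the one used here is only a default of `Counter`.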