The Machine Learning Classifier Comparison Tool helps benchmark and compare the performance of various machine learning classifiers on a dataset. It supports optional evaluation data, cross-validation (or none if splits = 1), and an embedded parallel-coordinates visualization of the final results.
Load the train and evaluation datasets, select the classifiers, set the cross-validation parameters, and run the experiment for a selected number of runs, tracking the best, worst, average, and standard deviation of the accuracy, F1, and recall scores.
The results are displayed in a table and can be exported to a CSV file; the value in parentheses is the difference between the train and the evaluation dataset scores, intended for synthetic data evaluation. ACC is the accuracy, F1 is the F1 score, and REC is the recall score.
The results can be visualized in a parallel coordinates plot with a unique color for each classifier and toggles for normalization and axis selection.
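As an illustration of what is tracked, the per-run statistics in the results table could be aggregated along these lines (a minimal sketch, not the tool's internal code; the file name `train.csv`, the `class` column name, and the Random Forest choice are assumptions for the example):

```python
# Minimal sketch: repeat a stratified CV benchmark over several seeds and
# aggregate best / worst / average / std for accuracy, F1, and recall.
# File name, column name, and classifier choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

df = pd.read_csv("train.csv")
X, y = df.drop(columns=["class"]), df["class"]

scoring = {"ACC": "accuracy", "F1": "f1_macro", "REC": "recall_macro"}
per_run = []
for seed in range(5):                                   # number of runs
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    clf = RandomForestClassifier(random_state=seed)
    scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)
    per_run.append({name: scores[f"test_{name}"].mean() for name in scoring})

runs = pd.DataFrame(per_run)
print(runs.agg(["max", "min", "mean", "std"]))          # best, worst, average, std
```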
- Load Main Dataset
  - Load a CSV file for training/benchmarking.
  - The tool automatically identifies the class column (requires "class" in the column name).
- Optional Evaluation Dataset
  - Load a second CSV for evaluation.
  - If provided, cross-validation is performed on the evaluation data (training always on the main dataset).
- Flexible Cross-Validation
  - Set the number of folds for CV (Cross-Validation Split).
  - If set to 1, no cross-validation is performed: the entire main dataset is used for training, and either the same dataset or the evaluation dataset is used for testing (see the sketch after this feature list).
- Multiple Classifiers
  - Choose from a variety of popular algorithms (Decision Tree, Random Forest, SVM, KNN, Logistic Regression, AdaBoost, XGBoost, etc.).
- Hyperparameter Editing
  - Each classifier has its own parameter panel (e.g., number of neighbors for KNN, max depth for trees, etc.).
- Multiple Runs
  - Specify the number of runs to repeat the experiment (with different seeds) for more robust statistics.
- Results & Visualization
  - Best, worst, average, and standard deviation (std) of Accuracy, F1, and Recall are displayed in a results table.
  - Parallel Coordinates: click “Visualize” to see an embedded parallel coordinates plot in a separate tab (a plotting sketch is included after this list).
  - Export results to CSV.
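The class-column detection and the split = 1 behavior described above can be illustrated roughly as follows (a minimal sketch under assumed names; `find_class_column`, the file name, and the Decision Tree choice are not the tool's actual code):

```python
# Sketch of two behaviors from the feature list: the class column is the one
# whose name contains "class", and a Cross-Validation Split of 1 skips CV.
# All names here are illustrative assumptions.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def find_class_column(df: pd.DataFrame) -> str:
    """Return the first column whose name contains 'class' (case-insensitive)."""
    for col in df.columns:
        if "class" in col.lower():
            return col
    raise ValueError("No column containing 'class' found")

train = pd.read_csv("train.csv")
target = find_class_column(train)
X, y = train.drop(columns=[target]), train[target]

splits = 1                                   # the Cross-Validation Split setting
clf = DecisionTreeClassifier(random_state=0)

if splits == 1:
    # No CV: train on the full main dataset and test on it
    # (or on the evaluation dataset, if one was loaded).
    clf.fit(X, y)
    print("ACC:", accuracy_score(y, clf.predict(X)))
else:
    cv = StratifiedKFold(n_splits=splits, shuffle=True, random_state=0)
    accs = []
    for train_idx, test_idx in cv.split(X, y):
        clf.fit(X.iloc[train_idx], y.iloc[train_idx])
        accs.append(accuracy_score(y.iloc[test_idx], clf.predict(X.iloc[test_idx])))
    print("ACC (mean over folds):", sum(accs) / len(accs))
```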
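The embedded parallel-coordinates view can likewise be approximated with pandas' built-in plotting helper; the column names and score values below are placeholders purely for illustration:

```python
# Sketch of a parallel-coordinates plot of a results table, one color per classifier.
# The scores below are made-up placeholder values, not real benchmark results.
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

results = pd.DataFrame({
    "Classifier": ["Decision Tree", "Random Forest", "SVM"],
    "ACC avg":    [0.91, 0.95, 0.93],
    "F1 avg":     [0.90, 0.94, 0.92],
    "REC avg":    [0.89, 0.95, 0.91],
})

parallel_coordinates(results, class_column="Classifier", colormap="tab10")
plt.ylabel("Score")
plt.title("Classifier comparison (placeholder values)")
plt.show()
```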
- Load Main File (required).
- Optionally load an Evaluate File if you want to test on separate data.
- Go to the Classifiers tab, pick one or more algorithms, and set the cross-validation parameters (split, runs, seed).
- Go to the Parameters tab to tweak each classifier’s hyperparameters.
- Click Run Selected Classifiers to benchmark.
- Check results in the Results tab.
- Export to CSV if desired.
- Click Visualize to see a parallel coordinates chart in the Plot tab.
- Clone the repository
- Run `pip install -r requirements.txt`
- Run the `main.py` file with `python main.py` or `python3 main.py`, depending on your Python installation.
- Explore further graphical summaries (e.g., box plots, bar charts).
- Automatic hyperparameter tuning with grid or random search.
- Color palette from Roman Roads Project
This project is licensed under the MIT License.