This project developed a machine learning solution for predicting gold recovery at the rougher and final stages of ore processing, using datasets with more than 80 parameters. A Multi-Output Random Forest Regression model gave the most accurate predictions during training, with Linear Regression as a viable, less computationally intensive alternative. On the test set the models underperformed constant benchmarks, but they still demonstrate the potential for data-driven optimization of industrial processes.
🐍 Python 👩🏽‍💻 Data Science 🤖 Machine Learning 🧪 Scikit Learn ❌ Cross Validation 🐼 pandas 📊 Data Analytics 👀 Supervised Learning ⚙️ Feature Engineering 💯 Model Evaluation 🕵🏽‍♀️ Anomaly Detection 🧼 Data Cleaning and Preprocessing
- This project uses pandas, numpy, RandomForestRegressor, MultiOutputRegressor, LinearRegression, mean_squared_error, mean_absolute_error, make_scorer, matplotlib.pyplot, shuffle, StandardScaler, seaborn, SimpleImputer, cross_val_score, KFold, and RandomizedSearchCV. It requires Python 3.9.6. One additional file, containing the full, unsplit test set, is not included because of upload limitations.
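
The comparison described in the summary (a multi-output random forest, a linear regression, and a constant benchmark) can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the data is synthetic, the two target columns merely stand in for rougher and final recovery, mean absolute error is used as an assumed metric, and `DummyRegressor` is an assumed way to build the constant benchmark (it is not in the list of tools above). The helper name `cv_mae` is hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in data (assumption): 300 rows, 8 features, 2 targets.
# The real project uses more than 80 process parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
W = rng.normal(size=(8, 2))
y = X @ W + rng.normal(scale=0.1, size=(300, 2))

# make_scorer negates the metric when greater_is_better=False,
# so cross_val_score returns negative MAE values.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    # Constant benchmark: always predicts the training-set mean of each target.
    "constant_mean": DummyRegressor(strategy="mean"),
    # LinearRegression handles two-column targets natively.
    "linear_regression": LinearRegression(),
    # One random forest is fitted per target column.
    "multi_output_rf": MultiOutputRegressor(
        RandomForestRegressor(n_estimators=50, random_state=0)
    ),
}


def cv_mae(model, X, y, cv):
    """Return the mean cross-validated MAE (positive, lower is better)."""
    scores = cross_val_score(model, X, y, scoring=mae_scorer, cv=cv)
    return -scores.mean()


results = {name: cv_mae(model, X, y, cv) for name, model in models.items()}
for name, score in results.items():
    print(f"{name}: MAE = {score:.3f}")
```

Scoring the constant benchmark with the same folds and metric as the models makes it easier to see whether a model adds anything over predicting the mean, which is the kind of check the test-set result in the summary relies on.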