LKEthridge/Integrated_Project_2

Integrated_Project_2

This was an integrated skills project for TripleTen. 👩🏽‍💻

This project developed a machine learning solution for predicting gold recovery at the rougher and final stages of ore processing, using datasets with more than 80 parameters. A multi-output Random Forest regression model gave the most accurate predictions during training, with Linear Regression as a viable, less computationally intensive alternative. Although the models underperformed constant benchmarks on the test set, they demonstrate the potential for data-driven optimization of industrial processes.
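The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the data is synthetic, the two target columns merely stand in for rougher and final recovery, and `train_test_split`, the hyperparameters, and the choice of mean absolute error as the metric are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in data (assumption): 8 features, 2 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = np.column_stack([
    2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=300),  # "rougher" stand-in
    X[:, 2] - 0.5 * X[:, 3] + rng.normal(scale=0.1, size=300),  # "final" stand-in
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One regressor per target column via MultiOutputRegressor,
# plus plain LinearRegression, which handles 2-D targets natively.
models = {
    "random_forest": MultiOutputRegressor(
        RandomForestRegressor(n_estimators=50, random_state=0)
    ),
    "linear_regression": LinearRegression(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

# Constant benchmark: always predict the training mean of each target.
constant_pred = np.tile(y_train.mean(axis=0), (len(y_test), 1))
scores["constant_mean"] = mean_absolute_error(y_test, constant_pred)

for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.3f}")
```

Scoring a constant predictor alongside the fitted models is what makes it visible when a model fails to beat the benchmark on held-out data, as reported for this project's test set.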

Skills Highlighted

🐍 Python 👩🏽‍💻 Data Science 🤖 Machine Learning 🧪 Scikit Learn ❌ Cross Validation 🐼 pandas 📊 Data Analytics 👀 Supervised Learning ⚙️ Feature Engineering 💯 Model Evaluation 🕵🏽‍♀️ Anomaly Detection 🧼 Data Cleaning and Preprocessing

Installation & Usage

  • This project uses pandas, NumPy, Matplotlib (matplotlib.pyplot), seaborn, and scikit-learn (RandomForestRegressor, MultiOutputRegressor, LinearRegression, mean_squared_error, mean_absolute_error, make_scorer, shuffle, StandardScaler, SimpleImputer, cross_val_score, KFold, and RandomizedSearchCV). It requires Python 3.9.6. One additional file, containing the full, unsplit test set, is not included in the repository because of upload limitations.
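As a rough guide to how several of the listed tools fit together, the sketch below imputes missing values, scales features, and cross-validates a model with a custom scorer. It is an illustration under stated assumptions rather than the project's code: the data is synthetic, `make_pipeline` is not among the tools listed above, and the median imputation strategy, five folds, and MAE scorer are choices made only for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 5 features, 2 targets.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 5))
targets = np.column_stack([
    features[:, 0] + features[:, 1],
    features[:, 2] - features[:, 3],
])
targets += rng.normal(scale=0.1, size=targets.shape)

# Blank out about 5% of feature values to mimic gaps in sensor data.
missing_mask = rng.random(features.shape) < 0.05
features[missing_mask] = np.nan

# Imputation and scaling live inside the pipeline so they are refit
# on each training fold and never see the validation fold.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LinearRegression(),
)

# greater_is_better=False makes the scorer return negated MAE,
# so that higher scores are always better for scikit-learn.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = cross_val_score(pipeline, features, targets, scoring=mae_scorer, cv=cv)
mean_mae = -cv_scores.mean()
print(f"Mean cross-validated MAE: {mean_mae:.3f}")
```

The same scorer and `KFold` object could be passed to `RandomizedSearchCV` through its `scoring` and `cv` parameters when tuning a model such as the random forest.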