Merge pull request #654 from adi271001/cement-strength
Cement Strength Prediction
abhisheks008 authored Jun 19, 2024
2 parents 81a1bda + 829ab8d commit b2cb117
Showing 16 changed files with 1,373 additions and 0 deletions.
1,031 changes: 1,031 additions & 0 deletions Cement Strength Prediction/Dataset/Cement Strength Data.csv


192 changes: 192 additions & 0 deletions Cement Strength Prediction/README.md
# Cement Strength Prediction

## Table of Contents

* Goal
* Dataset
* Description
* What I Had Done
* Installation
* Libraries
* EDA Results
* Models and Results
* Conclusion
* Contributing
* Signature

## Goal

To predict the compressive strength of concrete using various machine learning models.


## Dataset

Link: https://www.kaggle.com/datasets/himalayaashish/cement-strength-dataset/data


## Description

* This folder contains the code and resources for predicting the compressive strength of concrete using various machine learning models.
* The prediction is based on the ingredients of the concrete mix, such as cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.
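The feature layout above can be sketched with pandas. The column names below are assumptions inferred from the ingredient list, not the exact headers of `Cement Strength Data.csv`; a tiny synthetic frame stands in for the real file so the sketch is self-contained:

```python
import pandas as pd

# Hypothetical column names inferred from the ingredient list above;
# check the actual CSV headers before relying on them.
columns = ["cement", "blast_furnace_slag", "fly_ash", "water",
           "superplasticizer", "coarse_aggregate", "fine_aggregate",
           "age", "strength"]

# df = pd.read_csv("Dataset/Cement Strength Data.csv")  # real data
df = pd.DataFrame([[540.0, 0.0, 0.0, 162.0, 2.5, 1040.0, 676.0, 28, 79.99],
                   [332.5, 142.5, 0.0, 228.0, 0.0, 932.0, 594.0, 270, 40.27]],
                  columns=columns)  # tiny synthetic stand-in

# Split into features and target (compressive strength).
X = df.drop(columns="strength")
y = df["strength"]
print(X.shape, y.shape)
```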


## What I Had Done

## Installation

* Clone the repository:

```
git clone https://github.com/yourusername/cement-strength-prediction.git
cd cement-strength-prediction
```

* To run the notebook and reproduce the results, you need Python installed along with the necessary libraries. Install them with:

```
pip install -r requirements.txt
```

* Run the Jupyter notebook:

```
jupyter notebook cement-strength-prediction.ipynb
```

## Libraries Needed

* pandas==1.3.3
* numpy==1.21.2
* matplotlib==3.4.3
* seaborn==0.11.2
* scipy==1.7.1
* statsmodels==0.12.2
* scikit-learn==0.24.2
* xgboost==1.4.2
* lightgbm==3.2.1
* catboost==0.26.1
* tqdm==4.62.2
* optuna==2.9.1

## Exploratory Data Analysis Results

* ![relationship graphs](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___8_0.png?raw=true)
* ![cluster graphs](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___9_0.png?raw=true)
* ![Pearson correlation Matrix](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___10_0.png?raw=true)
* ![spearman correlation matrix](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___11_0.png?raw=true)
* ![predictive power score](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___12_0.png?raw=true)
* ![line of best fit graphs](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___13_0.png?raw=true)
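The Pearson and Spearman matrices shown above can be computed directly with pandas; this sketch uses synthetic stand-in data rather than the project's CSV, and the column names are illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the concrete mix data; the real analysis would
# load the project's CSV instead.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)),
                  columns=["cement", "water", "strength"])

pearson = df.corr(method="pearson")    # linear correlation
spearman = df.corr(method="spearman")  # rank (monotonic) correlation
print(pearson.round(2))
```

Seaborn renders either matrix as a heatmap, e.g. `sns.heatmap(pearson, annot=True)`.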


## Models and Results

The project explores the following machine learning models to predict the compressive strength of concrete:

### 1. Decision Tree Regressor

#### Why this model: Decision trees are easy to interpret and can handle both numerical and categorical data. They can capture non-linear relationships in the data.

```
Results:
MAE Train: 0.0895
MAE Test: 4.2191
R² Train: 0.9959
R² Test: 0.8525
```
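The train/test gap in the table above (near-zero train MAE, much larger test MAE) is characteristic of an unconstrained tree. As a hedged sketch on synthetic data, not the project's notebook code:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data stands in for the concrete features.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# An unconstrained tree grows until every training sample is memorized.
model = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)
mae_train = mean_absolute_error(y_tr, model.predict(X_tr))
mae_test = mean_absolute_error(y_te, model.predict(X_te))
print(mae_train, mae_test)  # train MAE collapses to ~0; test MAE stays large
```

Limiting `max_depth` or `min_samples_leaf` is the usual way to narrow this gap.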
### 2. Random Forest Regressor

#### Why this model: Random forests improve on decision trees by reducing overfitting through ensemble learning. They are robust and handle large datasets well.

```
Results:
MAE Train: 1.2639
MAE Test: 3.6031
R² Train: 0.9854
R² Test: 0.9061
```
### 3. Extra Trees Regressor

#### Why this model: Extra Trees is similar to Random Forest but introduces more randomness when splitting nodes, which tends to reduce variance further.

```
Results:
MAE Train: 1.2762
MAE Test: 3.5927
R² Train: 0.9852
R² Test: 0.9105
```
### 4. Gradient Boosting Regressor

#### Why this model: Gradient Boosting builds trees sequentially, with each tree trying to correct the errors of the previous one. It is powerful for many regression and classification tasks.

```
Results:
MAE Train: 2.7935
MAE Test: 3.7280
R² Train: 0.9491
R² Test: 0.9071
```
### 5. HistGradient Boosting Regressor

#### Why this model: HistGradient Boosting is a fast, scalable implementation of Gradient Boosting that works well with large datasets.

```
Results:
MAE Train: 1.3844
MAE Test: 3.0658
R² Train: 0.9822
R² Test: 0.9227
```
### 6. XGBoost Regressor

#### Why this model: XGBoost is an optimized implementation of gradient boosting that is efficient and performs well on structured data.

```
Results:
MAE Train: 0.3877
MAE Test: 3.0132
R² Train: 0.9951
R² Test: 0.9266
```
### 7. LGBM Regressor

#### Why this model: LightGBM is a gradient boosting framework that uses tree-based learning algorithms, designed to be distributed and efficient.

```
Results:
MAE Train: 1.3884
MAE Test: 3.0798
R² Train: 0.9822
R² Test: 0.9247
```
### 8. CatBoost Regressor

#### Why this model: CatBoost is a gradient boosting algorithm that handles categorical features automatically and efficiently, often providing high accuracy with minimal parameter tuning.

```
Results:
MAE Train: 1.2125
MAE Test: 2.6963
R² Train: 0.9870
R² Test: 0.9414
```
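The per-model tables above all follow the same fit-then-score pattern. A hedged sketch of that comparison loop, restricted to the scikit-learn ensembles so it runs without the XGBoost/LightGBM/CatBoost packages, and using synthetic data rather than the concrete dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit each ensemble and record its held-out MAE and R², mirroring the
# "Results" blocks above.
results = {}
for model in (RandomForestRegressor(random_state=0),
              ExtraTreesRegressor(random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[type(model).__name__] = (mean_absolute_error(y_te, pred),
                                     r2_score(y_te, pred))

for name, (mae, r2) in results.items():
    print(f"{name}: MAE={mae:.2f}, R2={r2:.3f}")
```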

## Conclusion
After evaluating various machine learning models, it is evident that ensemble methods such as Random Forest, Extra Trees, and gradient boosting techniques like XGBoost, LGBM, and CatBoost perform significantly better than a single decision tree. These models effectively reduce overfitting and provide more accurate predictions due to their ability to capture complex relationships within the data.

- **Best Performing Models:** CatBoost and XGBoost achieved the lowest test MAE and highest R² scores, indicating robust predictive performance.
- **Important Features:** Features such as cement, water, and blast furnace slag were consistently found to be the most influential in predicting the compressive strength of concrete.

## Contributing

Contributions are welcome! Please read the contribution guidelines first.

## Signature
Aditya D
* GitHub: https://www.github.com/adi271001
* LinkedIn: https://www.linkedin.com/in/aditya-d-23453a179/
* Topmate: https://topmate.io/aditya_d/
* Twitter: https://x.com/ADITYAD29257528


137 changes: 137 additions & 0 deletions Cement Strength Prediction/model/README.md
# Models Overview and Analysis

This folder contains the code and resources for the various machine learning models used to
predict the compressive strength of concrete. Each model is evaluated based on its performance
metrics and visualizations.

## Table of Contents

* Models
* Performance Metrics
* Visualizations
* Conclusion

## Models

### 1. Decision Tree Regressor

```
Results:
MAE Train: 0.0895
MAE Test: 4.2191
R² Train: 0.9959
R² Test: 0.8525
```
### 2. Random Forest Regressor

```
Results:
MAE Train: 1.2639
MAE Test: 3.6031
R² Train: 0.9854
R² Test: 0.9061
```
### 3. Extra Trees Regressor

```
Results:
MAE Train: 1.2762
MAE Test: 3.5927
R² Train: 0.9852
R² Test: 0.9105
```
### 4. Gradient Boosting Regressor

```
Results:
MAE Train: 2.7935
MAE Test: 3.7280
R² Train: 0.9491
R² Test: 0.9071
```
### 5. HistGradient Boosting Regressor

```
Results:
MAE Train: 1.3844
MAE Test: 3.0658
R² Train: 0.9822
R² Test: 0.9227
```
### 6. XGBoost Regressor

```
Results:
MAE Train: 0.3877
MAE Test: 3.0132
R² Train: 0.9951
R² Test: 0.9266
```
### 7. LGBM Regressor

```
Results:
MAE Train: 1.3884
MAE Test: 3.0798
R² Train: 0.9822
R² Test: 0.9247
```
### 8. CatBoost Regressor

```
Results:
MAE Train: 1.2125
MAE Test: 2.6963
R² Train: 0.9870
R² Test: 0.9414
```
## Performance Metrics

The models were evaluated using the following metrics:

* **Mean Absolute Error (MAE):** Measures the average magnitude of errors in predictions.
* **R² Score:** Indicates the proportion of variance in the dependent variable predictable from the independent variables.
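Both metrics can be computed by hand and cross-checked against scikit-learn; the prediction values here are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([40.0, 35.0, 50.0, 20.0])  # illustrative strengths (MPa)
y_pred = np.array([38.0, 36.0, 47.0, 25.0])

# MAE: mean of absolute prediction errors.
mae = np.mean(np.abs(y_true - y_pred))

# R²: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, r2)  # matches sklearn's mean_absolute_error and r2_score
```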
## Visualizations

The notebook includes several plots to visualize the performance of the models, such as:

* **Feature Importance:** Highlights the most influential features for each model.
* **Prediction vs Actual:** Compares predicted compressive strengths with actual values.
* **Residuals Plot:** Shows the residual errors for each model.

![Prediction vs Actual](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___25_0.png?raw=true)
![Residuals](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___26_0.png?raw=true)
![Tuned-Predicted vs Actual](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___32_0.png?raw=true)
![Tuned-Residuals](https://github.com/adi271001/ML-Crate/blob/cement-strength/Cement%20Strength%20Prediction/images/__results___33_0.png?raw=true)

### Feature Importance

Feature importance visualizations highlight the significant features affecting the compressive
strength predictions for each model.

### Prediction vs Actual

These plots compare the predicted values against the actual values, providing insight into the
models' accuracy.

### Residuals Plot

Residuals plots show the errors of predictions, helping to identify any patterns that the models
may have missed.
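The two diagnostic plots described above can be sketched with matplotlib; synthetic actual/predicted values stand in for real model output, and the figure is rendered off-screen:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.uniform(10, 80, size=100)        # stand-in "actual" strengths
y_pred = y_true + rng.normal(0, 3, size=100)  # stand-in predictions

residuals = y_true - y_pred

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Prediction vs Actual: points near the dashed line mean accurate predictions.
ax1.scatter(y_true, y_pred, s=12)
ax1.plot([10, 80], [10, 80], "r--")
ax1.set(xlabel="Actual strength", ylabel="Predicted", title="Prediction vs Actual")
# Residuals: a pattern-free band around zero suggests no systematic error.
ax2.scatter(y_pred, residuals, s=12)
ax2.axhline(0, color="r", linestyle="--")
ax2.set(xlabel="Predicted", ylabel="Residual", title="Residuals")
fig.savefig("diagnostics.png")
print(residuals.shape)
```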

## Conclusion

The models demonstrate varying levels of effectiveness in predicting concrete strength.
Ensemble methods like Random Forest, Extra Trees, and Gradient Boosting generally provide
better performance compared to individual models like Decision Tree Regressor. CatBoost and
XGBoost show competitive performance with superior accuracy. The results highlight the
importance of feature selection and model tuning in achieving optimal predictive accuracy.




12 changes: 12 additions & 0 deletions Cement Strength Prediction/requirements.txt
pandas==1.3.3
numpy==1.21.2
matplotlib==3.4.3
seaborn==0.11.2
scipy==1.7.1
statsmodels==0.12.2
scikit-learn==0.24.2
xgboost==1.4.2
lightgbm==3.2.1
catboost==0.26.1
tqdm==4.62.2
optuna==2.9.1
