Merge pull request #537 from pawaspy/crop
Crop Production Data Analysis
Showing 8 changed files with 15,132 additions and 0 deletions.
Crop Production Data Analysis/Dataset/Crop production.csv: 12,419 additions, 0 deletions (large diff not rendered)
Crop Production Data Analysis/Images/Rice Area vs Rice production 3.png: binary file added (+86.4 KB)
Crop Production Data Analysis/Images/Rice Area vs Rice production.png: binary file added (+25.8 KB)
Crop Production Data Analysis/Images/statename vs Rice production.png: binary file added (+27.8 KB)
Crop Production Data Analysis/Model/Crop Production Analysis.ipynb: 2,609 additions, 0 deletions (large diff not rendered)
New file (99 lines):

**CROP PRODUCTION DATA ANALYSIS**

**GOAL**

To predict the rice yield (kg per ha) from the dataset, or more generally the yield of any crop.

**DATASET**

https://www.kaggle.com/datasets/zsinghrahulk/crop-production-data

**DESCRIPTION**

The main aim of the project is to build a model that predicts the rice yield for a given state and district from the rice field area and production (per 1000 ha).

**VISUALIZATION AND EDA OF DIFFERENT ATTRIBUTES**

<img alt="graph" src="./Images/statename vs Rice production.png">

<img alt="graph" src="./Images/year vs Rice production.png">

<img alt="graph" src="./Images/Rice Area vs Rice production.png">

<img alt="graph" src="./Images/Rice Area vs Rice production 3.png">

**MODEL PERFORMANCE (TEST SET)**

| Model | MAE_test | MSE_test | R2_test | RMSE_test |
|---------------------------|----------|-----------|-----------|------------|
| Random Forest Regression | 53.82 | 15314.65 | 0.98 | 123.75 |
| XG Boost Regression | 68.61 | 11153.18 | 0.98 | 105.60 |
| Ridge Regression | 539.03 | 515922.37 | 0.46 | 718.27 |
| Decision Tree Regression | 81.75 | 29353.39 | 0.96 | 171.32 |
| Lasso Regression | 539.03 | ≈515912 (RMSE²) | 0.46 | 718.27 |
| SVR | 729.82 | 817852.71 | 0.15 | 904.35 |
| LightGBM | 70.22 | 12811.94 | 0.98 | 113.18 |
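
As a reference for the columns above, the four test metrics can be computed as in this minimal, dependency-free sketch. The function name `regression_metrics` and the toy numbers are hypothetical and not taken from the project notebook; with scikit-learn the same values come from `mean_absolute_error`, `mean_squared_error` and `r2_score` in `sklearn.metrics`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences of numbers."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = ss_res / n                           # mean squared error
    rmse = math.sqrt(mse)                      # RMSE is the square root of MSE

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot                 # share of variance explained

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Toy yields in kg/ha (made-up numbers, not from the dataset)
metrics = regression_metrics([1000, 1500, 2000, 2500], [1100, 1400, 2100, 2400])
print(metrics)
```

A quick consistency check on the table is that RMSE_test should equal √MSE_test in every row, for example √15314.65 ≈ 123.75 for Random Forest Regression.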

**WORK DONE**

* Analyzed the data for insights such as feature correlations and missing values.
* Selected the columns most strongly correlated with the target as features.
* Trained models with the following algorithms, using default parameters:
  * Linear SVR
  * Lasso
  * Ridge
  * Decision Tree
  * Random Forest
  * XGBoost
  * LightGBM
  * SGDRegressor
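* On the test set, Random Forest, XGBoost and LightGBM performed best (R² of 0.98), with XGBoost achieving the lowest RMSE (105.60) and Random Forest the lowest MAE (53.82); SVR performed worst (R² of 0.15).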
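
The correlation-based feature selection step can be sketched as follows. This is a minimal illustration that assumes Pearson correlation with the target and a cut-off of |r| ≥ 0.5; the README does not state the exact criterion, and the column names and values below are hypothetical placeholders, not the dataset's real columns.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (std_x * std_y)


def select_by_correlation(columns, target, threshold=0.5):
    """Return (names with |r| >= threshold, strongest first; all correlations)."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: abs(scores[name]), reverse=True)
    return kept, scores


# Hypothetical columns and target (yield in kg/ha), for illustration only
columns = {
    "col_a": [1, 2, 3, 4, 5],
    "col_b": [2, 1, 2, 1, 2],
    "col_c": [5, 4, 3, 2, 1],
}
target = [1500, 1800, 2100, 2400, 2700]
kept, scores = select_by_correlation(columns, target)
print(kept, scores)
```

Ranking by the absolute value keeps strongly negatively correlated columns too, since for a linear model they are as informative as positively correlated ones.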

**MODELS USED**

1. Ridge and Lasso Regression: linear regression with an L2 (Ridge) or L1 (Lasso) penalty on the coefficients. They are easy to implement and interpret, and **very fast to train**.
2. Linear SVR: support vector regression with a linear kernel. Support vector methods tend to perform well when the dataset is not too large.
3. Random Forest: an ensemble of decision trees, each trained on a bootstrap sample of the data, whose predictions are averaged. Averaging **reduces variance relative to a single decision tree**, adding more trees does not by itself cause overfitting, and the out-of-bag samples give a built-in estimate of generalization error.
4. XGBoost: **a library for building fast, high-performance gradient-boosted tree models**. It performs strongly on a wide range of tabular machine learning tasks.
5. LightGBM: a gradient boosting framework designed for speed and low memory use (histogram-based split finding and leaf-wise tree growth), which makes it well suited to large datasets.
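
To make the difference between the two regularized linear models in the results table (Ridge and Lasso) concrete, here is a minimal one-feature sketch with closed-form solutions. It assumes the objectives written in the docstrings, with an unpenalized intercept; the helper names are hypothetical. scikit-learn's `Lasso` scales its squared-error term by 1/(2 × n_samples), so its `alpha` does not map one-to-one onto `lam` here.

```python
import math


def _centered_sums(xs, ys):
    """Means of x and y, plus Sxx = sum((x - mx)^2) and Sxy = sum((x - mx)(y - my))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, sxy


def ridge_1d(xs, ys, alpha):
    """Minimise 0.5*sum((y - b - w*x)^2) + 0.5*alpha*w^2 (L2 penalty; b unpenalised)."""
    mean_x, mean_y, sxx, sxy = _centered_sums(xs, ys)
    w = sxy / (sxx + alpha)  # L2 shrinks w towards 0, but only reaches 0 if Sxy == 0
    return w, mean_y - w * mean_x


def lasso_1d(xs, ys, lam):
    """Minimise 0.5*sum((y - b - w*x)^2) + lam*|w| (L1 penalty; b unpenalised)."""
    mean_x, mean_y, sxx, sxy = _centered_sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)      # soft-thresholding of Sxy
    w = math.copysign(shrunk, sxy) / sxx   # L1 can set w exactly to 0
    return w, mean_y - w * mean_x


xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]  # toy data on the line y = 2x
print(ridge_1d(xs, ys, alpha=0))   # alpha = 0 is ordinary least squares
print(ridge_1d(xs, ys, alpha=10))
print(lasso_1d(xs, ys, lam=10))
print(lasso_1d(xs, ys, lam=25))
```

With `lam=25` the L1 penalty exceeds |Sxy| = 20, so Lasso drops the feature entirely, while Ridge only ever shrinks the slope.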

**LIBRARIES NEEDED**

* NumPy
* pandas
* Matplotlib
* scikit-learn
* XGBoost
* LightGBM
* seaborn

**CONCLUSION**

We investigated the data by checking for imbalance, visualizing the features, and examining the relationships between features. We then trained several predictive models. The data was split into two parts, a training set and a test set; the first five base models used only these two sets.

We started with SVR, Decision Tree, Lasso, Ridge, Random Forest Regressor and XGBoost Regressor. The best of these, Random Forest and XGBoost, reached a test R² of 0.98 when predicting the target for the test set.
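
The train/test split described above can be sketched as follows. The README does not state the split ratio or random seed, so the 80/20 ratio and `seed=42` are assumptions, and `split_rows` is a hypothetical helper; with scikit-learn the same step is usually done with `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42)`.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train_rows, test_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                    # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 dataset rows
train, test = split_rows(rows)
print(len(train), len(test))
```

Shuffling first avoids a test set made up only of the last rows, which would be unrepresentative if the file happens to be sorted by state or year.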

**CONTRIBUTION BY**

*Pawas Pandey*

New file (5 lines):
matplotlib==3.4.2
seaborn==0.9.0
numpy==1.21.1
pandas==1.3.0
scikit_learn==1.0.2