- Model Type: Gradient Boosting Regressor
- Features Used: Airline, Source City, Departure Time, Stops, Arrival Time, Destination City, Class, Duration, Days Left
- Framework: PySpark’s MLlib
Price Prediction Pattern: Predicted prices track actual prices along a near-straight line up to approximately INR 70,000. Above that price, predictions may deviate from the actual values.
- RMSE (Root Mean Square Error): 1520.00 - A small typical error relative to prices that extend past INR 70,000. RMSE penalizes large errors more heavily than MAE does.
- R² (Coefficient of Determination): 0.995 - Demonstrates that the model accounts for approximately 99.5% of the variance in the observed data, indicating high predictive power.
- MAE (Mean Absolute Error): 601.44 - The average absolute size of a prediction error. That it sits well below the RMSE suggests that a small share of predictions accounts for most of the large errors.
- Best Parameters Identified:
  - Number of Trees: 60
  - Number of Features: 3
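The setup listed above could be assembled in PySpark roughly as follows. This is a minimal sketch, not the original training code. The column names, the split into categorical and numeric features, and the `price` label are assumptions, and so is mapping "Number of Trees" to `maxIter`. The "Number of Features" setting is left out because the source does not say which parameter it refers to.

```python
# Hypothetical column names: the source lists the features but not the schema.
CATEGORICAL_COLS = ["airline", "source_city", "departure_time", "stops",
                    "arrival_time", "destination_city", "class"]
NUMERIC_COLS = ["duration", "days_left"]
LABEL_COL = "price"


def build_pipeline(num_trees=60):
    """Assemble an index -> vectorize -> gradient-boosted-trees pipeline.

    Requires an active SparkSession when called.
    """
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.ml.regression import GBTRegressor

    # Map each string category to a numeric index; unseen values get their own bucket.
    indexed_cols = [f"{c}_idx" for c in CATEGORICAL_COLS]
    indexer = StringIndexer(inputCols=CATEGORICAL_COLS, outputCols=indexed_cols,
                            handleInvalid="keep")

    # Combine indexed and numeric columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=indexed_cols + NUMERIC_COLS,
                                outputCol="features")

    # Assumption: "Number of Trees" corresponds to maxIter (one tree per boosting iteration).
    gbt = GBTRegressor(featuresCol="features", labelCol=LABEL_COL,
                       maxIter=num_trees, seed=42)
    return Pipeline(stages=[indexer, assembler, gbt])


# Usage, assuming DataFrames train_df and test_df with the columns above:
#   model = build_pipeline().fit(train_df)
#   predictions = model.transform(test_df)
```

Keeping the indexer, assembler, and regressor in a single `Pipeline` means the same preprocessing is applied at training and prediction time.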
The Gradient Boosting Regressor predicts flight prices accurately from the selected features. The high R² (0.995), together with the low RMSE and MAE, reflects the model's effectiveness. Predictions follow actual prices closely up to approximately INR 70,000. The slight deviation for higher-priced flights is an area for further investigation and potential refinement of the model.
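One way to investigate the deviation at higher prices is to compute the same three metrics separately on each side of the price threshold. The sketch below is plain Python and assumes the actual and predicted prices have already been collected into lists. The function names and the toy values are illustrative only and are not results from this model.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE and R² for two equal-length sequences of prices."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)                    # sum of squared errors
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² is undefined when every actual value is identical.
        "r2": 1.0 - sse / sst if sst > 0 else float("nan"),
    }


def metrics_by_price_band(actual, predicted, threshold=70_000):
    """Score rows at or below `threshold` and rows above it separately."""
    bands = {"at_or_below": ([], []), "above": ([], [])}
    for a, p in zip(actual, predicted):
        key = "above" if a > threshold else "at_or_below"
        bands[key][0].append(a)
        bands[key][1].append(p)
    # Skip a band that received no rows.
    return {k: regression_metrics(a, p) for k, (a, p) in bands.items() if a}


# Toy values for illustration only (not taken from the model's output).
toy_actual = [5000, 6000, 80000, 90000]
toy_predicted = [5100, 5900, 76000, 95000]
report = metrics_by_price_band(toy_actual, toy_predicted)
```

If the `above` band shows a markedly higher RMSE or MAE than the `at_or_below` band, that would quantify the deviation noted for higher-priced flights.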