1. Project Objective
2. Data Description
3. Data Pre-processing steps and inspiration
4. Inferences made from the data
5. Choosing the algorithm for the project
6. Motivation and reasons for choosing the algorithm
7. Conclusion
The objective is to analyse a flight booking dataset obtained from a platform used to book flight tickets. A thorough study of the data helps uncover insights that are valuable to passengers. We apply EDA, statistical methods, and machine learning algorithms to extract meaningful information from the data.
Dataset Information: The flight booking price prediction dataset contains around 3 lakh (300,000) records with 11 attributes.
a. Loading the Dataset: We loaded the dataset using pandas.
b. Checking Data Types: We inspected the data type of each column to ensure it is consistent and appropriate for the subsequent analyses and operations.
c. Preprocessing Data: We inspected the dataset and removed the unwanted columns. We found no missing values in the dataset. We applied label encoding to the categorical columns so they could be used in the statistical analysis and in the machine learning models. We then standardized the data before feeding it to the ML models to improve their performance.
d. Handling Outliers: The ‘Duration’, ‘days left’, and ‘price’ columns contain outliers, which we removed using the interquartile range (IQR) method.
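The preprocessing steps above can be sketched as follows. This is a minimal illustration on a tiny hand-made DataFrame, not the project's actual code: the column names (`airline`, `duration`, `price`), the values, and the helper `remove_outliers_iqr` are assumptions, the 1.5 × IQR fence is the conventional choice rather than one stated in this report, and the order of the steps is also assumed.

```python
import pandas as pd

# Tiny made-up sample (illustrative values only, not taken from the dataset).
df = pd.DataFrame({
    "airline": ["Vistara", "Air_India", "SpiceJet", "Vistara", "Indigo", "Air_India"],
    "duration": [2.0, 2.5, 2.2, 2.1, 2.3, 30.0],
    "price": [5900, 6100, 5950, 6000, 6050, 6020],
})

# Inspect data types and count missing values per column.
print(df.dtypes)
print(df.isnull().sum())


def remove_outliers_iqr(frame, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]


# Drop rows whose duration falls outside the IQR fences (here: the 30.0 row).
clean = remove_outliers_iqr(df, "duration")

# Label encoding: map each category to an integer code.
clean = clean.assign(airline=clean["airline"].astype("category").cat.codes)

# Standardization: rescale a numeric column to zero mean and unit variance.
duration = clean["duration"]
clean = clean.assign(duration=(duration - duration.mean()) / duration.std(ddof=0))

print(clean)
```

On a real dataset the same helper would be applied to each numeric column in turn, and the statistics used for standardization would normally be computed on the training split only.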
Our target is to predict the price, so the EDA also focuses on the price column, which is the dependent variable, and on how it relates to the other, independent variables.
• Air India and Vistara have the highest ticket prices compared to the other airlines.
• The remaining airlines have more or less the same prices.
• Most flights fall in the price range up to 10000.
• Most cities fall in the same price range.
• Flights with ‘one’ stop are priced higher than the others.
• Most departure times fall in the same price range.
• Evening and morning prices are high, while late-night prices are low.
• Delhi has lower prices than the other cities.
• All the remaining cities are in more or less the same range.
• Business class is priced higher than Economy.
• As the duration increases, prices also tend to increase.
• However, there is no straightforward correlation: some durations also have low prices.
• For the majority of flights, prices increase as the duration increases.
• As the number of days left increases, prices fall.
• This indicates that booking early is a good way to save money.
• AIRLINE: Vistara and Air India have high counts; SpiceJet has a low count.
• SOURCE CITY: Delhi and Mumbai have high counts; Chennai has a low count.
• DEPARTURE TIME: Early Morning and Morning have high counts; Late Night has a low count.
• STOPS: One Stop has a high count.
• ARRIVAL TIME: Night and Evening have high counts; Late Night has a low count.
• DESTINATION CITY: Delhi and Mumbai have high counts; Chennai has a low count.
• CLASS: Economy class has a high count.
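Observations like these typically come from grouped price summaries and frequency counts. A minimal pandas sketch follows; the DataFrame contents are made up for illustration, and the column names (`airline`, `class`, `price`) are assumed to mirror the dataset.

```python
import pandas as pd

# Made-up sample rows; the values are illustrative only.
df = pd.DataFrame({
    "airline": ["Vistara", "Vistara", "Air_India", "SpiceJet", "Air_India", "Vistara"],
    "class": ["Business", "Economy", "Business", "Economy", "Economy", "Economy"],
    "price": [52000, 6000, 48000, 4500, 5500, 6500],
})

# Mean price per category: shows how the target varies with one feature.
mean_price_by_class = df.groupby("class")["price"].mean()

# Frequency of each category: shows which values dominate the dataset.
airline_counts = df["airline"].value_counts()

print(mean_price_by_class)
print(airline_counts)
```

The same two calls can be repeated for each categorical column (source city, departure time, stops, and so on) to produce the kind of comparisons listed above.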
We chose several models that can predict flight ticket prices. The models are listed below:
1. Linear Regression
2. Decision Tree Regressor
3. Random Forest Regressor
• The purpose of all these models is to predict the ticket price.
• The independent features are all columns except flight and price.
• We evaluate model performance using the R2 score, mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE).
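For reference, these metrics can be written out directly from their definitions. The sketch below uses only the Python standard library; the function name `regression_metrics` and the sample values are hypothetical, and in practice a library such as scikit-learn provides equivalent functions. MAPE is returned here as a fraction (0.05 means 5%), which is the convention scikit-learn also uses.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, MAE, MAPE, MSE and RMSE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    # MAPE divides each absolute error by the true value (must be non-zero).
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "mape": mape, "mse": mse, "rmse": rmse}


# Hypothetical true and predicted prices.
y_true = [4000.0, 6000.0, 8000.0, 10000.0]
y_pred = [4400.0, 5700.0, 8000.0, 9500.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because RMSE is the square root of MSE, it is expressed in the same unit as the price, which makes it easier to interpret than MSE alone.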
1. Linear Regression
• We chose this model because the target is continuous in nature, which makes this a regression problem.
2. Decision Tree Regressor
• Decision tree models are adept at handling both classification and regression problems.
• They work by recursively partitioning the input space into regions and making predictions based on the majority class (for classification) or the average value (for regression) within each region.
• This allows them to handle both categorical and numerical data, making them versatile for a wide range of predictive tasks in various domains.
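To make the partitioning idea concrete for regression, the sketch below finds the single best split of one numeric feature by minimizing the squared error around each side's mean, then predicts the average of the region a point falls into. It is a one-level illustration (a "decision stump") with made-up data and hypothetical function names, not how a full tree library is implemented; a real tree repeats this step recursively inside each region.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values to split")
    return best[1:]


def predict(x, threshold, left_mean, right_mean):
    """Predict the average target value of the region that x falls into."""
    return left_mean if x <= threshold else right_mean


# Made-up example: days left before departure vs. ticket price.
days_left = [1, 2, 3, 20, 25, 30]
price = [12000, 11000, 11500, 5000, 4800, 5200]

threshold, left_mean, right_mean = best_split(days_left, price)
print(threshold, left_mean, right_mean)
print(predict(2, threshold, left_mean, right_mean))
print(predict(28, threshold, left_mean, right_mean))
```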
3. Random Forest Regressor
• Random Forest offers high predictive accuracy by averaging the predictions of multiple decision trees, which makes it more robust to overfitting than a single tree.
• It handles non-linear relationships well, provides feature importance insights, and is relatively resilient to outliers and, in some implementations, to missing data. It scales to large datasets and makes no assumptions about the data distribution, which makes it a versatile choice for a variety of machine learning tasks.
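A side-by-side comparison of the three candidate models could look like the following sketch. It assumes scikit-learn and NumPy are installed and uses synthetic data with a non-linear target, not the flight dataset; the hyperparameters (`n_estimators=50`, `random_state=0`) are arbitrary illustrative choices, so the scores it prints say nothing about the results reported here. MAPE is left out because the synthetic target can be close to zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: one non-linear effect, one linear effect, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
y = 1000 * np.sin(X[:, 0]) + 300 * X[:, 1] + rng.normal(0, 50, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mse)),
    }
    print(name, results[name])
```

Evaluating all models on the same held-out split keeps the comparison fair, since each model is scored on data it did not see during training.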
• Based on the analysis conducted and the models’ performance on the dataset, it can be concluded that Linear Regression, Decision Tree Regressor, and Random Forest Regressor are all suitable for predicting the flight ticket booking price.
Evaluation metrics for the three models, assuming the results follow the order in which the models are listed above:
1. Linear Regression
• R2 score: 0.9049847760699258
• Mean absolute error: 4625.601159976593
• Mean absolute percentage error: 0.43627317283189276
• Mean squared error: 48931028.45225085
• RMSE: 6995.071726026178
2. Decision Tree Regressor
• R2 score: 0.9761309874775854
• Mean absolute error: 1168.7175598163174
• Mean absolute percentage error: 0.07422407497383977
• Mean squared error: 12292086.284203626
• RMSE: 3506.0071711569026
3. Random Forest Regressor
• R2 score: 0.9852298788429149
• Mean absolute error: 1085.061065975238
• Mean absolute percentage error: 0.07049862212289662
• Mean squared error: 7606330.740349579
• RMSE: 2757.9577118494003