Hello, I'm back with another exciting project. I hope this README speaks for my work and my enthusiasm for this type of project.
- Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.
EDA is primarily used to see what data can reveal beyond the formal modeling or hypothesis testing task, and it provides a better understanding of data set variables and the relationships between them. It can also help determine whether the statistical techniques you are considering for data analysis are appropriate. Originally developed by American mathematician John Tukey in the 1970s, EDA techniques continue to be widely used in the data discovery process today. (Source: IBM)
- Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyse the data to discover important factors that govern the bookings.
The dataset used for this project is the Hotel Booking Demand dataset, which can be downloaded from Kaggle.
Dataset link: https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand
Alternatively, download the zip and use the CSV at path: hotel_bookings/hotel_bookings.csv
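As a minimal sketch of the loading step, assuming the CSV sits at the path given above, the file can be read with Pandas and checked for size and missing values. The helper names `load_bookings` and `summarize` are hypothetical and only used for illustration.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path="hotel_bookings/hotel_bookings.csv"):
    """Read the hotel bookings CSV into a DataFrame."""
    path = Path(csv_path)
    if not path.exists():
        # Fail early with a clear message instead of a long pandas traceback.
        raise FileNotFoundError(f"Dataset not found at {path}; download it first.")
    return pd.read_csv(path)


def summarize(df):
    """Return the row count, column count, and missing values per column."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": df.isna().sum().to_dict(),
    }
```

Calling `summarize(load_bookings())` before any plotting gives a quick view of which columns need cleaning.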
- Python: The primary programming language used for this project.
- NumPy: A library used for general-purpose array processing.
- Pandas: A library used for data manipulation and analysis.
- Matplotlib: A library used to create various types of visualizations.
- Seaborn: A library that is particularly useful for exploring relationships between variables, creating heatmaps, and visualizing distributions.
Q1. What is the total number of bookings and cancellations?
- In terms of numbers
- Hotel wise bookings and cancellations
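One possible way to answer Q1 with Pandas is sketched below. The column names `hotel` and `is_canceled` are assumptions about the CSV, and the sample rows are made up purely for illustration.

```python
import pandas as pd

# Made-up sample rows; `hotel` and `is_canceled` are assumed column names.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "is_canceled": [1, 0, 0, 1, 0],
})


def booking_summary(df):
    """Return (overall, by_hotel) booking and cancellation counts."""
    overall = {
        "bookings": len(df),
        "cancellations": int(df["is_canceled"].sum()),
    }
    # One row per hotel: total bookings and how many of them were cancelled.
    by_hotel = df.groupby("hotel")["is_canceled"].agg(
        bookings="size", cancellations="sum"
    )
    by_hotel["cancel_rate"] = by_hotel["cancellations"] / by_hotel["bookings"]
    return overall, by_hotel


overall, by_hotel = booking_summary(sample)
```

Reporting the cancellation rate next to the raw counts makes the two hotels comparable even when their booking volumes differ.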
Q2. Which agent gets the most bookings?
- Hotel Wise
- Overall
Q3. Which distribution channel is used most often by customers?
- Hotel Wise
- Overall
Q4. Which meal is most often chosen by customers?
- Hotel Wise
- Overall
Q5. Which country makes the most reservations?
- Hotel Wise
- Overall
Q6. In which year were the most reservations made?
- Hotel Wise
- Overall
Q7. In which month were the most reservations made?
- Hotel Wise
- Overall
Q8. How many reservations were made by repeated guests?
- Hotel Wise
- Overall
Q9. What is the preferred length of stay for guests in each type of hotel?
- Overall
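For Q9, one reasonable reading of "preferred stay length" is the most common total number of nights per hotel. The sketch below assumes the stay is split across two columns named `stays_in_weekend_nights` and `stays_in_week_nights`; both names, the helper name, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up sample rows; all three column names are assumptions about the CSV.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "Resort Hotel", "Resort Hotel"],
    "stays_in_weekend_nights": [0, 1, 0, 2, 2, 0],
    "stays_in_week_nights": [2, 1, 3, 5, 5, 1],
})


def preferred_stay_length(df):
    """Return the most common total stay length (in nights) for each hotel."""
    total_nights = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]
    return (
        total_nights.groupby(df["hotel"])
        .agg(lambda s: s.value_counts().idxmax())
        .to_dict()
    )


stay_by_hotel = preferred_stay_length(sample)
```

Using the most common value rather than the mean keeps a few very long stays from distorting the answer.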
Q10. How does the ADR (average daily rate, i.e. the price per night) vary over the year?
- Overall
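A sketch of the Q10 calculation is shown below, assuming the arrival month is stored as an English month name in a column called `arrival_date_month` and the rate in a column called `adr`. Both names and the sample rows are assumptions. Converting the month to an ordered categorical keeps the result in calendar order instead of alphabetical order.

```python
import pandas as pd

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Made-up sample rows; `arrival_date_month` and `adr` are assumed column names.
sample = pd.DataFrame({
    "arrival_date_month": ["August", "January", "August", "January", "March"],
    "adr": [120.0, 60.0, 140.0, 80.0, 90.0],
})


def mean_adr_by_month(df):
    """Return the mean ADR per arrival month, sorted in calendar order."""
    ordered = df.assign(
        month=pd.Categorical(df["arrival_date_month"], categories=MONTHS, ordered=True)
    )
    # observed=True drops months that never appear in the data.
    return ordered.groupby("month", observed=True)["adr"].mean()


adr_by_month = mean_adr_by_month(sample)
```

The resulting Series can be passed directly to a Matplotlib or Seaborn line plot to show the seasonal pattern.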
Q11. How are prices distributed across room types?
- Overall
Q12. Does a longer waiting period result in booking cancellations?
- Overall
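Q12 can be explored by comparing cancellation rates across waiting-time buckets. The sketch below assumes the waiting period is recorded in a column named `days_in_waiting_list` and cancellations in `is_canceled`. The column names, the helper name, the bucket edges, and the sample rows are all assumptions chosen for illustration.

```python
import pandas as pd

# Made-up sample rows; both column names are assumptions about the CSV.
sample = pd.DataFrame({
    "days_in_waiting_list": [0, 0, 0, 0, 30, 45, 60, 10],
    "is_canceled": [0, 0, 1, 0, 1, 1, 0, 1],
})


def cancel_rate_by_wait(df):
    """Return the cancellation rate for each waiting-time bucket."""
    # Buckets are right-inclusive: (-1, 0], (0, 30], (30, inf).
    buckets = pd.cut(
        df["days_in_waiting_list"],
        bins=[-1, 0, 30, float("inf")],
        labels=["no wait", "1-30 days", "over 30 days"],
    )
    return (
        df.assign(wait_bucket=buckets)
        .groupby("wait_bucket", observed=True)["is_canceled"]
        .mean()
    )


rates = cancel_rate_by_wait(sample)
```

A higher rate in the longer buckets would suggest an association between waiting time and cancellations, though a comparison like this cannot show that waiting causes them.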
There are several ways to improve and extend this project, including:
- Collecting more data: Collecting more details to improve the analysis.
- Experimenting with different visualizations: Trying out different visualization techniques to find the ones that best support the analysis.
- Machine learning model building: In the future, I will try to use this analysis to build a machine learning model.