- Overview
- Project Status
- Visualizations
- Project Details
- Data Archive
- Usage Guide
- Getting Started
- Scraper Arguments
- Missing Date Checker
Visualizations of the average hotel room price in Osaka.
Average nightly room price for one adult in one room.
Price in USD.
Visualizations of the average hotel room price for all prefectures in Japan.
Average nightly room price for one adult in one room.
Price in USD.
Built on top of the Find the Hotel's Average Room Price in Osaka project.
Click here for visualizations of this project.
Click here for visualizations of this project.
- Collect Osaka hotel property data from Booking.com.
- Data collection period for Year 2025: 4 Sep 2024 to present.
- Consists of room prices from 1 Jan 2025 to 31 Dec 2025.
- Data was collected daily using GitHub Actions.
- Consists of the Basic GraphQL and Whole-Month GraphQL scrapers.
- These scrapers can also be used to scrape data from other cities in Japan.
- Collect Japan hotel property data for all prefectures from Booking.com.
- Data collection date for Year 2025: 17 Jan 2025.
- Consists of room prices from 17 Jan 2025 to 31 Dec 2025.
- Uses the Japan GraphQL scraper to scrape data.
Click here to access the collected hotel data archive.
- To scrape only hotel properties, use the --scrape_only_hotel argument.
- Ensure that Docker Desktop and the Postgres container are running.
- Data is appended to the database for both projects.
- Clone this repo: https://github.com/sakan811/Find-Osaka-Average-Hotel-Price.git
- Install Git LFS
- Create a virtual environment and activate it.
- Install all dependencies listed in requirements.txt
- Run
playwright install
- Download Docker Desktop
- Ensure that Docker Desktop is running.
- Run:
export POSTGRES_DATA_PATH='<your_container_volume_path>'
to set the container volume to a directory path of your choice.
- Run:
docker compose up -d
- Run:
python get_auth_headers.py
- It will write the headers to an .env file.
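To check what the header step wrote, the .env file can be read back as plain KEY=VALUE lines. The sketch below is a minimal illustration, not the project's own loader: the function name read_env_file is hypothetical, and the key name used in the usage comment is an assumption, because the README does not list the keys that get_auth_headers.py writes.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from an .env-style file into a dict.

    Hypothetical helper for illustration only; blank lines, comment lines,
    and lines without '=' are skipped.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Usage (the key name is an assumption, not taken from the project):
# headers = read_env_file(".env")
# print(headers.get("USER_AGENT"))
```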
- Example usage, with only the required arguments for the Whole-Month GraphQL Scraper:
python main.py --whole_mth --year=2024 --month=12 --city=Osaka
- Scrapes data starting from the given day of the month to the end of the same month.
- The default start day is 1.
- The start day can be set with the --start_day argument.
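The start-day behavior of the Whole-Month scraper can be pictured as a list of check-in and check-out dates running from the start day to the last day of the month. The sketch below is only an illustration under assumptions: the function name month_stay_windows is hypothetical, and one-night stays are assumed because they match the Basic Scraper example; the README does not confirm how the real scraper builds its dates.

```python
import calendar
from datetime import date, timedelta


def month_stay_windows(year, month, start_day=1):
    """Return (check_in, check_out) pairs from start_day to the end of the month.

    Hypothetical helper; assumes one-night stays.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if not 1 <= start_day <= last_day:
        raise ValueError(f"start_day must be between 1 and {last_day}")
    windows = []
    for day in range(start_day, last_day + 1):
        check_in = date(year, month, day)
        windows.append((check_in, check_in + timedelta(days=1)))
    return windows


# Usage: December 2024 starting from day 25 gives seven one-night windows.
for check_in, check_out in month_stay_windows(2024, 12, start_day=25):
    print(check_in.isoformat(), check_out.isoformat())
```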
- Example usage, with only the required arguments for the Basic GraphQL Scraper:
python main.py --city=Osaka --check_in=2024-12-25 --check_out=2024-12-26 --scraper
- Example usage, with only the required arguments for the Japan GraphQL Scraper:
python main.py --japan_hotel
- The prefecture to scrape can be specified with the --prefecture argument, for example:
python main.py --japan_hotel --prefecture Tokyo
- If the --prefecture argument is not specified, all prefectures will be scraped.
- Multiple prefectures can be specified:
python main.py --japan_hotel --prefecture Tokyo Osaka
- You can use the prefecture names on Booking.com as a reference.
- If a "not match" error occurs (raised as a SystemExit exception), try running the scraper again.
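Rerunning after a SystemExit can also be automated with a small wrapper. The sketch below is one possible approach, not part of the project: run_with_retry and its parameters are hypothetical names, and the task passed in stands for whatever callable starts the scraper in your own script.

```python
import time


def run_with_retry(task, max_attempts=3, delay_seconds=0.0):
    """Call task() and retry when it raises SystemExit.

    Hypothetical helper; re-raises the last SystemExit once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except SystemExit:
            if attempt == max_attempts:
                raise  # give up and surface the original error
            time.sleep(delay_seconds)  # brief pause before the next attempt


# Usage with a stand-in task that fails once, then succeeds.
attempts = []


def sample_task():
    attempts.append(1)
    if len(attempts) < 2:
        raise SystemExit("not match")
    return "done"


print(run_with_retry(sample_task))
```

SystemExit derives from BaseException rather than Exception, so a plain "except Exception" would not catch it; the wrapper has to name it explicitly.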
Click here for Scraper's arguments details.
To ensure that all dates of a month were scraped, a function in check_missing_dates.py queries the database to find the missing dates.
It is made only for the Find the Hotel's Average Room Price in Osaka project, which saves scraped data in the HotelPrice table.
- To check the database, use the following command as an example, which includes only the required argument:
python check_missing_dates.py --city=Osaka
- If there are missing dates, a Basic Scraper will automatically start to scrape those dates.
- Missing Date Checker shares arguments with the Basic Scraper.
- Arguments passed to the Missing Date Checker should be the same as those used with the Basic Scraper.
- Only checks for missing dates in data that was scraped today in UTC time.
- Only checks the months that were scraped and loaded into the database.
- The year of the dates can be specified with --year.
- Default is the current year.
If a "not match" error occurs (raised as a SystemExit exception), try running the Missing Date Checker again.
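The idea behind the missing-date check can be sketched as a set difference between every date in a month and the dates already stored. This is a simplified illustration, not the logic in check_missing_dates.py: the function name find_missing_dates is hypothetical, and the scraped dates are passed in as a plain list instead of being queried from the HotelPrice table.

```python
import calendar
from datetime import date


def find_missing_dates(scraped_dates, year, month):
    """Return the sorted dates of the given month that are absent from scraped_dates.

    Hypothetical helper; scraped_dates is any iterable of datetime.date objects.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    expected = {date(year, month, day) for day in range(1, last_day + 1)}
    return sorted(expected - set(scraped_dates))


# Usage: February 2025 with two days left out of the scraped data.
scraped = [date(2025, 2, day) for day in range(1, 29) if day not in (10, 20)]
for missing in find_missing_dates(scraped, 2025, 2):
    print(missing.isoformat())
```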