This repository is dedicated to collecting and scraping KBO (Korea Baseball Organization) data. It includes scripts and processes for gathering player statistics, team data, game results, and other related information.
- Python 3.12+
- Clone the repository:
  git clone https://github.com/leewr9/kbo-data-collector.git
  cd kbo-data-collector
- Install dependencies:
  pip install -r requirements.txt
The tool runs from the command line and offers four main commands: `game`, `player`, `schedule`, and `team`.
python run.py <command> [options]
This command scrapes data for specific KBO games. It also fetches the schedule data internally.
python run.py game --date <target_date>
- `-p`, `--path`: Path to the schedule file to be parsed.
- `-d`, `--date`: Specify a date (in `YYYYMMDD` format) to fetch data for that day.
- `-f`, `--full`: Scrape all available data from April 5, 2001, to today.
Note: Since the `game` command internally fetches the schedule data, the `-d` and `-f` options are the same as those for the `schedule` command and also apply when scraping game data.
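Because both `game` and `schedule` take dates in `YYYYMMDD` format, it can help to check a value before passing it to `--date`. The following is a minimal sketch of such a check; the helper name `parse_target_date` is hypothetical and is not part of this repository.

```python
from datetime import date, datetime


def parse_target_date(value: str) -> date:
    """Parse a YYYYMMDD string into a date, raising ValueError if it is malformed.

    Hypothetical helper for illustration; not taken from the project code.
    """
    # Reject anything that is not exactly eight digits (e.g. "2024-02-05").
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected a date in YYYYMMDD format, got {value!r}")
    # strptime also rejects impossible dates such as 20240230.
    return datetime.strptime(value, "%Y%m%d").date()


print(parse_target_date("20240205"))  # 2024-02-05
```

A value that passes this check can be handed to `python run.py game --date <target_date>` unchanged.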
This command allows you to scrape data for different types of players, including batters, pitchers, fielders, and base runners.
python run.py player --player <player_type> --season <target_season>
- `-p`, `--player`: Specify the type of player data to scrape. Valid options are:
  - `hitter` for batting statistics
  - `pitcher` for pitching statistics
  - `fielder` for fielding statistics
  - `runner` for base running statistics
- `-a`, `--all`: Scrape data for all players.
- `-s`, `--season`: Specify the season year (e.g., `2024`) to scrape data for that year.
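To collect every player type for one season, the documented `--player` and `--season` options can be combined in a small script. The sketch below only builds the command lines; the helper name `build_player_commands` is hypothetical, and the script assumes it is run from the repository root where `run.py` lives.

```python
from typing import List

# The four player types listed in the options above.
PLAYER_TYPES = ("hitter", "pitcher", "fielder", "runner")


def build_player_commands(season: int) -> List[List[str]]:
    """Return one `run.py player` invocation per player type (hypothetical helper)."""
    return [
        ["python", "run.py", "player", "--player", player_type, "--season", str(season)]
        for player_type in PLAYER_TYPES
    ]


for command in build_player_commands(2024):
    # Print each command; pass it to subprocess.run(command, check=True) to execute it.
    print(" ".join(command))
```

Building the commands as lists rather than a single string keeps them safe to pass to `subprocess.run` without shell quoting.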
This command scrapes the schedule data for KBO games. You can fetch data for a specific date or scrape all data from the start of the KBO season in 2001 to today.
python run.py schedule --date <target_date>
- `-d`, `--date`: Specify a date (in `YYYYMMDD` format) to fetch data for that day.
- `-f`, `--full`: Scrape all available data from April 5, 2001, to today.
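To give a sense of the range that `--full` covers, the sketch below enumerates every day from April 5, 2001, onward as `YYYYMMDD` strings. It is an illustration only: the generator `iter_dates` is hypothetical, and the project may walk the range differently (for example by month or by season).

```python
from datetime import date, timedelta
from typing import Iterator, Optional

# Start date of the full scrape, as stated in this README.
KBO_FULL_START = date(2001, 4, 5)


def iter_dates(start: date = KBO_FULL_START, end: Optional[date] = None) -> Iterator[str]:
    """Yield each day from start to end (inclusive) as a YYYYMMDD string."""
    if end is None:
        end = date.today()
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


print(list(iter_dates(end=date(2001, 4, 7))))
# ['20010405', '20010406', '20010407']
```

Each yielded string has the same format that `--date` expects, so the generator can also drive day-by-day runs over a shorter range.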
This command scrapes data related to KBO teams.
python run.py team
For more detailed information on any command, use the `--help` flag:
python run.py <command> --help
Each command is mapped to a corresponding function in the code:
- `scrape_game_data_command`: Handles scraping of game data.
- `scrape_player_data_command`: Handles scraping of player data.
- `scrape_schedule_data_command`: Handles scraping of schedule data.
- `scrape_team_data_command`: Handles scraping of team data.
These functions perform the web scraping and data processing according to the command-line arguments passed.
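One way such a command-to-function mapping could be wired is with `argparse` subcommands, sketched below. This is an assumption for illustration, not the project's actual implementation: the README does not say which CLI library `run.py` uses, the function bodies here are stubs, and treating `--full` and `--all` as boolean flags is a guess.

```python
import argparse
from typing import List, Optional


# Stub handlers: the names come from this README, the bodies are placeholders.
def scrape_game_data_command(args):
    return f"game: date={args.date} full={args.full} path={args.path}"


def scrape_player_data_command(args):
    return f"player: type={args.player} all={args.all} season={args.season}"


def scrape_schedule_data_command(args):
    return f"schedule: date={args.date} full={args.full}"


def scrape_team_data_command(args):
    return "team"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run.py")
    sub = parser.add_subparsers(dest="command", required=True)

    game = sub.add_parser("game")
    game.add_argument("-p", "--path")
    game.add_argument("-d", "--date")
    game.add_argument("-f", "--full", action="store_true")  # assumed boolean flag
    game.set_defaults(func=scrape_game_data_command)

    player = sub.add_parser("player")
    player.add_argument("-p", "--player", choices=["hitter", "pitcher", "fielder", "runner"])
    player.add_argument("-a", "--all", action="store_true")  # assumed boolean flag
    player.add_argument("-s", "--season", type=int)
    player.set_defaults(func=scrape_player_data_command)

    schedule = sub.add_parser("schedule")
    schedule.add_argument("-d", "--date")
    schedule.add_argument("-f", "--full", action="store_true")  # assumed boolean flag
    schedule.set_defaults(func=scrape_schedule_data_command)

    team = sub.add_parser("team")
    team.set_defaults(func=scrape_team_data_command)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Dispatch to whichever handler the chosen subcommand registered.
    return args.func(args)


print(main(["schedule", "--date", "20240205"]))
```

Registering each handler with `set_defaults(func=...)` keeps the dispatch to a single call and lets `-p` mean `--path` under `game` and `--player` under `player` without conflict.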
- Scrape Game Data:
  python run.py game --date 20240205
- Scrape Player Data for Batters in 2024 Season:
  python run.py player --player hitter --season 2024
- Scrape KBO Schedule Data for a Specific Date:
  python run.py schedule --date 20240205
This project is licensed under the MIT License. See the LICENSE file for details.