NBA Play-by-Play Data mining analysis

In the scope of the Data Mining lab course (Praktikum) at the Technische Universität München (TUM), we analysed NBA play-by-play data as well as player information data, which we acquired ourselves by scraping it from the web.

We performed an extensive analysis of both datasets and extracted meaningful insights and features, which we later used to predict game outcomes. With these features we achieved an accuracy of around 65% on the test set, with an F1 score of about the same value. When we focused only on recent data, that is, when predicting game outcomes in the middle of a season, the accuracy rose to around 75%.
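Accuracy and F1 measure different things: accuracy is the share of correctly predicted games, while F1 is the harmonic mean of precision and recall for one class. The minimal sketch below computes both in plain Python. The labels (1 = home team won, 0 = home team lost) and the predictions are made up for illustration; they are not the project's data or code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of games whose outcome was predicted correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outcomes and predictions: 1 = home team won, 0 = home team lost.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
print(f"accuracy={accuracy(y_true, y_pred):.3f}, F1={f1_score(y_true, y_pred):.3f}")
```

In this toy example 5 of 8 games are predicted correctly (accuracy 0.625), while precision 0.6 and recall 0.75 give an F1 of about 0.667, so the two numbers are close but not identical.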

The team consisted of four students:

Ege Arikan (egearikan)
Gregor Caf (gregorcaf)
Jonas Linder (Jonas-Lindner)
Žan Stanonik (Zan-23)

Getting started

Environment setup

To use the project's source code, first install all the necessary dependencies. We used Anaconda with the commands below, but you can also use pip:

Create a new dedicated conda environment:
conda create --name nba_dm python=3.9
Activate the environment:
conda activate nba_dm
Install all the required packages:
conda install --file requirements.txt
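As an optional sanity check after installation, the sketch below reads requirements.txt and reports entries for which no installed distribution is found, using importlib.metadata from the standard library. It is a hypothetical helper, not part of the project. It assumes one requirement per line, and packages whose conda name differs from the installed distribution name (for example pytorch vs. torch) may be reported as missing even though they are installed.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def missing_requirements(requirements_path: str = "requirements.txt") -> list[str]:
    """Return requirement names from the file that have no installed distribution."""
    missing = []
    for raw_line in Path(requirements_path).read_text().splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and options such as -r
            continue
        # Keep only the name, e.g. "pandas>=1.3" or "pandas=1.3.4=py39h" -> "pandas".
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 9):
        print(f"Note: the environment above uses Python 3.9; this is {sys.version.split()[0]}")
    if Path("requirements.txt").exists():
        missing = missing_requirements()
        print(f"Missing: {missing}" if missing else "All requirements found.")
```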

With everything installed, the only remaining step is to acquire the data files needed for the analysis.

Data download

In the project we worked with two main datasets. The first is the play-by-play data, which we acquired from this website and which contains all play-by-play data from 2008 to 2019. We also wrote a script that downloads this data from the website automatically. The second is the player information data, which we scraped ourselves.

Play-by-play data download

The script downloads the play-by-play data by default, so you only need to run it from the project root directory:
python src/data_dowloading/data_downloader.py

The script prints a lot of information while scraping so that you can check its progress. When it finishes, the data/raw directory should contain 19 CSV files.
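To confirm the download, a small check like the following can list the CSV files directly inside data/raw (the player_data subdirectory is skipped because the search is non-recursive) and count the records in each. This is a hypothetical helper, not part of the project; the expected count of 19 comes from this README, and UTF-8 encoding of the files is an assumption.

```python
import csv
from pathlib import Path

EXPECTED_PBP_FILES = 19  # count stated in this README for the default download


def find_play_by_play_csvs(raw_dir: str = "data/raw") -> list[Path]:
    """List CSV files directly inside raw_dir (non-recursive, so player_data/ is skipped)."""
    return sorted(Path(raw_dir).glob("*.csv"))


def count_records(path: Path) -> int:
    """Count CSV records excluding the header; csv.reader handles quoted newlines."""
    with path.open(newline="", encoding="utf-8") as handle:
        return max(sum(1 for _ in csv.reader(handle)) - 1, 0)


if __name__ == "__main__":
    files = find_play_by_play_csvs()
    status = "OK" if len(files) == EXPECTED_PBP_FILES else "unexpected count"
    print(f"Found {len(files)} CSV files in data/raw ({status})")
    for path in files:
        print(f"  {path.name}: {count_records(path)} rows")
```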

Player information data download

For this dataset we implemented the data acquisition from scratch: the script extracts the data directly from the live NBA website.
If you want the default dataset (seasons 2000 to 2019), run the script with the --player_data flag added:
python src/data_dowloading/data_downloader.py --player_data

If you want to filter the data to a specific range of years you can use these two parameters:
python src/data_dowloading/data_downloader.py --player_data --p_data_season_start=2002 --p_data_season_end=2016

With these parameters, only the seasons in the given range are scraped. To scrape data before 2000 or after 2019, you must change the allowed parameter range in the source code, since we did not have time to properly test the scraper on those seasons.
The downloaded data is stored in the data/raw/player_data directory.
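The season filter can be pictured as an argparse parser that accepts the flags shown above and rejects seasons outside the tested 2000 to 2019 range. The sketch below is an illustration under that assumption, not the project's actual downloader code; the helper names season, build_parser and parse_args are hypothetical.

```python
import argparse

# Bounds mirroring the README: seasons outside 2000-2019 are untested.
MIN_SEASON, MAX_SEASON = 2000, 2019


def season(value: str) -> int:
    """argparse type that accepts only seasons inside the tested range."""
    year = int(value)  # a ValueError here is reported by argparse as an invalid value
    if not MIN_SEASON <= year <= MAX_SEASON:
        raise argparse.ArgumentTypeError(
            f"season must be between {MIN_SEASON} and {MAX_SEASON}, got {year}"
        )
    return year


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the player-data options")
    parser.add_argument("--player_data", action="store_true", help="download player information data")
    parser.add_argument("--p_data_season_start", type=season, default=MIN_SEASON)
    parser.add_argument("--p_data_season_end", type=season, default=MAX_SEASON)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject an inverted range such as start=2016, end=2002.
    if args.p_data_season_start > args.p_data_season_end:
        parser.error("--p_data_season_start must not be after --p_data_season_end")
    return args
```

For example, parse_args(["--player_data", "--p_data_season_start=2002", "--p_data_season_end=2016"]) returns a namespace with the two seasons as integers, while a start season of 1999 makes argparse print an error and exit.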

Data sources

Here we explicitly list the pages from which the data is scraped/downloaded:

Presentation of project findings

A more detailed analysis of our findings is available in this PDF presentation (final_presentation.pdf).

Repository: Zan-23/nba-data-mining

Folders and files

Latest commit

History

Repository files navigation

NBA Play-by-Play Data mining analysis

Table of Contents

Getting started

Environment setup

Data download

Play-by-play data download

Player information data download

Data sources

Presentation of project findings

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages