This repository contains a project focused on predicting global video game sales using various machine learning models. By analyzing a comprehensive dataset, we explore the factors that influence video game sales and develop predictive models to estimate sales figures with high accuracy.
Video game sales data from Kaggle was analyzed using machine learning techniques to understand the factors influencing sales. The project aims to assist stakeholders in the gaming industry by providing actionable insights and accurate predictions.
- Develop powerful machine learning models to predict global sales of video games.
- Explore and compare the performance of various regression models.
- Gain insights into key factors influencing video game sales for better decision-making.
- Handled missing values and ensured consistent data types.
- Transformed categorical features like
Platform
andGenre
into numerical values using Label Encoding.
The following machine learning models were employed:
- K-Nearest Neighbors (KNN)
- Multiple Linear Regression (MLR)
- Decision Tree Regression (DTR)
- Random Forest Regression (RFR)
- Gradient Boosting Regression (GBR)
- Mean Squared Error (MSE)
- R-squared Score
The performance of all models was evaluated, and the Gradient Boosting Regression model was identified as the most effective for sales prediction.
- Gradient Boosting Regression achieved the highest R-squared score, demonstrating superior predictive ability.
- Insights from feature analysis revealed the significant impact of platform and genre on sales performance.
- Scatter plots comparing actual vs. predicted values for all models.
- Bar chart showcasing R-squared scores for all models.
- Programming Language: Python
- Libraries:
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn
The dataset used in this project was sourced from Kaggle. It provides detailed information about video games, including their platform, genre, release year, and sales data across various regions.
The dataset contains the following columns:
Column Name | Description |
---|---|
Name |
Name of the video game. |
Platform |
Platform on which the game was released (e.g., PS4, Xbox, PC). |
Year_of_Release |
Year the game was released. |
Genre |
Genre of the video game (e.g., Action, Sports, Strategy). |
Publisher |
Publisher of the game. |
NA_Sales |
Sales in North America (in millions of units). |
EU_Sales |
Sales in Europe (in millions of units). |
JP_Sales |
Sales in Japan (in millions of units). |
Other_Sales |
Sales in other regions (in millions of units). |
Global_Sales |
Total global sales (in millions of units). |
-
Sales Data:
TheGlobal_Sales
column serves as the target variable for this project. Regional sales (NA, EU, JP, Other) are analyzed to identify their contributions to global sales. -
Categorical Features:
Features likePlatform
andGenre
were encoded to make them usable for machine learning models. -
Missing Data:
Missing values inYear_of_Release
andPublisher
were carefully handled during preprocessing to maintain data integrity. -
Temporal Trends:
The dataset includes data spanning several years, allowing for the analysis of trends over time.
The dataset can be accessed on Kaggle: Video Game Sales Dataset.
For questions or suggestions, feel free to reach out:
GitHub Profile
🌟 If you found this project helpful, please give it a ⭐ on GitHub!
- Clone the repository:
git clone https://github.com/usk2003/VideoGamesSalesPrediction.git cd VideoGamesSalesPrediction
- Run the script in Jupyter Notebook or in Google Colab.