This repository contains the code and resources for the SA20 Data Analytics project, which aims to analyze data from the SA20 cricket league and predict the best XI (eleven) players for the South African (SA) cricket team. The project utilizes various data analytics techniques, including web scraping, data cleaning, transformation, modeling, and machine learning.
- Web scraping of player data, match details, scorecards, and batting/bowling statistics from ESPN Cricinfo using Scrapy and Selenium.
- Data cleaning and transformation using Pandas to ensure data consistency, handle missing values, remove duplicates, and create meaningful features.
- MVP (Most Valuable Player) score formula development using linear regression from scikit-learn, considering key performance metrics for player ranking.
- Power Query in Power BI for advanced data transformation and shaping.
- DAX (Data Analysis Expressions) for data modeling, creating calculated columns, measures, and implementing advanced calculations.
- Building an interactive dashboard in Power BI to visualize the SA20 league data and predict the best XI for the SA team.
- The SA20 league data scraped from ESPN Cricinfo is provided as CSV files.
- Use the Jupyter notebooks to clean and transform the scraped data using Pandas.
- Use the Jupyter notebooks to develop the MVP score formula using linear regression.
- Use the provided sa20.pbix file to create an interactive dashboard in Power BI.
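
The cleaning and MVP-scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the notebooks: the column names (`runs`, `strike_rate`, `wickets`), the `impact` target, the toy values, and the helper names `clean` and `fit_mvp_weights` are all assumptions made for the example. It removes duplicate rows, fills a missing value, fits scikit-learn's `LinearRegression`, and uses the fitted model to rank players.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in for the scraped CSV. Every column name and value here is
# hypothetical and is not taken from the repository's data.
raw = pd.DataFrame({
    "player": ["A", "B", "B", "C", "D", "E"],
    "runs": [320, 150, 150, 40, None, 210],
    "strike_rate": [145.0, 120.0, 120.0, 95.0, 110.0, 130.0],
    "wickets": [0, 4, 4, 12, 7, 2],
    "impact": [8.1, 5.0, 5.0, 6.4, 4.9, 6.0],  # hypothetical target rating
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and fill missing run totals with 0."""
    out = df.drop_duplicates().copy()
    # Treating a missing run total as 0 is an assumption for this sketch.
    out["runs"] = out["runs"].fillna(0)
    return out.reset_index(drop=True)


def fit_mvp_weights(df: pd.DataFrame, features: list, target: str):
    """Fit a linear model and return it with one weight per feature."""
    model = LinearRegression()
    model.fit(df[features], df[target])
    weights = pd.Series(model.coef_, index=features)
    return model, weights


features = ["runs", "strike_rate", "wickets"]
players = clean(raw)
model, weights = fit_mvp_weights(players, features, "impact")

# The MVP score is the model's prediction; a higher score ranks higher.
players["mvp_score"] = model.predict(players[features])
ranking = players.sort_values("mvp_score", ascending=False)
print(ranking[["player", "mvp_score"]])
```

The fitted coefficients in `weights` play the role of the MVP formula's per-metric weights. With real data, the feature list and the target would be whatever metrics the notebooks actually use.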
We would like to thank ESPN Cricinfo for the valuable data used in this project, and the open-source community for the tools and libraries it relies on.