This repository contains the code and documentation for Project 1 of EAS 509 Statistical Data Mining II, conducted by Jay Yogesh Thanki and Rohan Ishwarlal Patel. The project analyzes the Exacens dataset using data science techniques, including data cleaning, exploratory data analysis (EDA), and modeling.
- Jay Yogesh Thanki
  - Email: jayyoges@buffalo.edu
  - UBID: jayyoges (50496564)
- Rohan Ishwarlal Patel
  - Email: rpatel38@buffalo.edu
  - UBID: rpatel38 (50496374)
The repository is organized as follows:
- Code:
  - `data_cleaning.rmd`: Jupyter Notebook containing the code for data cleaning and preprocessing.
  - `eda.rmd`: Jupyter Notebook for exploratory data analysis (EDA) on the Exacens dataset.
  - `modeling.rmd`: Jupyter Notebook with code for data modeling using various machine learning algorithms.
- Data:
  - `exacens_dataset.csv`: The Exacens dataset used for analysis.
- Images:
  - Contains images generated during data visualization and analysis.
- README.md:
  - This file, providing an overview of the project, team members, and project structure.
This project analyzes the Exacens dataset in three stages: data cleaning, exploratory data analysis, and data modeling. The team applied data science techniques at each stage to keep the analysis reliable and to draw insights from the dataset. The modeling phase experimented with different machine learning models and evaluated the performance of each in detail. The best-performing model was selected and is justified by how well it fits the characteristics of the dataset.
- Clone the repository to your local machine.
- Open the Jupyter Notebooks (`data_cleaning.ipynb`, `eda.ipynb`, `modeling.ipynb`) in a Jupyter Notebook environment.
- Execute the cells in each notebook sequentially to replicate the analysis.
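The steps above can also be scripted instead of run by hand. The following is a minimal sketch, not part of the repository: it assumes the notebooks carry the `.ipynb` names listed above, that they should run in that order, and that `jupyter` with `nbconvert` is installed. The helper names `build_commands` and `run_all` and the `repo_dir` parameter are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Notebook names as listed in this README; the run order is an assumption
# based on the order in which they are listed.
NOTEBOOKS = ["data_cleaning.ipynb", "eda.ipynb", "modeling.ipynb"]


def build_commands(notebooks=NOTEBOOKS):
    """Return one `jupyter nbconvert` command per notebook, in order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", name]
        for name in notebooks
    ]


def run_all(repo_dir="."):
    """Execute each notebook in turn, stopping at the first failure.

    `repo_dir` (hypothetical) is the directory that holds the notebooks.
    """
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter was not found on PATH")
    for cmd in build_commands():
        notebook = Path(repo_dir, cmd[-1])
        if not notebook.exists():
            raise FileNotFoundError(f"missing notebook: {notebook}")
        # check=True raises CalledProcessError if a notebook cell fails.
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Calling `run_all()` from the directory that contains the notebooks executes them in place, so each notebook keeps its outputs after the run.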
Feel free to explore the code, data, and findings presented in the notebooks.
The project contributes to a deeper understanding of the Exacens dataset, showcasing the potential of data science in extracting actionable insights from complex data. The selected model and findings can serve as a reference for future studies in similar domains.