RobbeNooyens/data-mining

Data Mining

Project Structure

The repository is divided into three main sections:

  1. Classification - Predicting income levels using various machine learning models while ensuring fairness and high predictive accuracy.
  2. Clustering - Exploring clustering techniques on a dataset of 2500 news articles to categorize the articles based on their content.
  3. Pattern Mining - Analyzing patterns in a dataset concerning income levels to uncover socio-economic characteristics influencing male and female working conditions.

Each section includes a detailed report as a PDF, the Python source code, and the datasets used for the analyses.

Reports

1. Classification

  • Objective: Evaluate and compare different machine learning models in terms of accuracy and fairness.
  • Key Models Used: Decision trees, KNN, random forest, and ensemble methods.
  • Main Findings: Identification of the best model that balances fairness with predictive accuracy.

View the Classification Report
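
The accuracy-versus-fairness comparison described above can be illustrated with a minimal sketch. The README does not say which fairness metric the report uses, so demographic parity difference is an assumption here, and the labels, group values, and model names are hypothetical toy data rather than results from the project.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups.

    Assumed fairness metric; 0.0 means every group receives positive
    predictions at the same rate.
    """
    rates = []
    for g in sorted(set(group)):
        preds = [p for p, member in zip(y_pred, group) if member == g]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)


# Hypothetical toy data: 1 = high income, 0 = low income.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
group = ["M", "M", "M", "M", "F", "F", "F", "F"]
predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 0],
    "model_b": [1, 0, 1, 0, 1, 0, 0, 0],
}

for name, y_pred in predictions.items():
    acc = accuracy(y_true, y_pred)
    gap = demographic_parity_difference(y_pred, group)
    print(f"{name}: accuracy={acc:.3f}, parity gap={gap:.3f}")
```

In this toy comparison, `model_b` is both more accurate and has the smaller parity gap. When the two criteria disagree, choosing a model requires an explicit trade-off between them.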

2. Clustering

  • Objective: Apply clustering methods to categorize news articles into distinct groups.
  • Techniques Used: KMeans, DBSCAN, and various dimensionality reduction methods.
  • Main Findings: Effective categorization of articles into coherent groups that reflect their content.

View the Clustering Report
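
To make the KMeans step concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard procedure behind KMeans. It is not the project's code. The points stand in for hypothetical 2D article embeddings after dimensionality reduction, and the initial centroids are chosen by hand so the run is deterministic.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid updates until stable."""
    centroids = list(initial_centroids)  # copy so the caller's list is not mutated
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical 2D embeddings of six articles forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Unlike this sketch, DBSCAN needs no preset number of clusters and can mark points as noise, which is one reason to compare both techniques on the same data.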

3. Pattern Mining

  • Objective: Identify patterns that distinguish between different socio-economic groups.
  • Approach: Data preprocessing and analysis using pattern mining techniques.
  • Main Findings: Insights into the socio-economic characteristics that differentiate male and female working conditions.

View the Pattern Mining Report
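
The README does not name the pattern mining algorithm used, so the following is only a sketch of the general idea: counting how often combinations of attribute values occur together (their support). It enumerates candidates by brute force rather than using a pruned search such as Apriori, and the attribute names and records are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]],
    min_support: float,
    max_size: int = 2,
) -> Dict[FrozenSet[str], float]:
    """Return every itemset up to max_size whose support is at least min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], float] = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = fraction of records containing every item in the set.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result


# Hypothetical records, each encoded as a set of attribute=value items.
transactions = [
    {"sex=Female", "hours=part-time", "income<=50K"},
    {"sex=Female", "hours=full-time", "income<=50K"},
    {"sex=Male", "hours=full-time", "income>50K"},
    {"sex=Male", "hours=full-time", "income<=50K"},
]

patterns = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

Comparing the itemsets that are frequent among records for one group with those frequent for another is one simple way to surface characteristics that distinguish the groups.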
