The repository is divided into three main sections:
- Classification - Predicting income levels using various machine learning models while ensuring fairness and high predictive accuracy.
- Clustering - Exploring clustering techniques on a dataset of 2,500 news articles to categorize the articles based on their content.
- Pattern Mining - Analyzing patterns in a dataset concerning income levels to uncover socio-economic characteristics influencing male and female working conditions.
Each section includes a detailed report as a PDF, source code in Python, and datasets used for the analyses.
- Objective: Evaluate and compare different machine learning models in terms of accuracy and fairness.
- Key Models Used: Decision trees, k-nearest neighbors (KNN), random forest, and other ensemble methods.
- Main Findings: Identification of the best model that balances fairness with predictive accuracy.
View the Classification Report
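To illustrate what comparing models on both accuracy and fairness can look like, here is a minimal sketch in plain Python. The fairness measure (demographic parity difference, the gap in positive-prediction rates between groups), the toy labels, the group codes, and the names `model_a` and `model_b` are all assumptions made for illustration; the report may use a different metric and does not supply these values.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def demographic_parity_difference(y_pred, groups, positive=1):
    """Largest gap in positive-prediction rate between any two groups."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == positive)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hypothetical toy data: 1 = high income, 0 = low income.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

# Hypothetical predictions from two candidate models.
predictions = {
    "model_a": [1, 0, 1, 0, 1, 1, 1, 0],
    "model_b": [1, 0, 1, 0, 1, 1, 0, 0],
}

results = {
    name: (accuracy(y_true, y_pred), demographic_parity_difference(y_pred, groups))
    for name, y_pred in predictions.items()
}

for name, (acc, gap) in results.items():
    print(f"{name}: accuracy={acc:.3f}, parity gap={gap:.3f}")
```

On this toy data the more accurate model also has the larger parity gap, which is the kind of trade-off a comparison on both criteria has to weigh.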
- Objective: Apply clustering methods to categorize news articles into distinct groups.
- Techniques Used: KMeans, DBSCAN, and various dimensionality reduction methods.
- Main Findings: Effective categorization of articles into coherent groups that reflect their content.
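As a rough sketch of the kind of pipeline these techniques suggest, the example below vectorizes a few short texts with TF-IDF, reduces them with truncated SVD, and clusters them with KMeans using scikit-learn. The four example texts, the choice of TF-IDF and truncated SVD, and all parameter values are assumptions for illustration; they are not taken from the report or its dataset.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for news articles (two finance, two sport).
articles = [
    "stock market shares fall as investors sell",
    "investors buy shares as stock market rallies",
    "football team wins match with late goal",
    "late goal decides football match for home team",
]

# Turn each text into a sparse TF-IDF vector, dropping English stop words.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles)

# Reduce the TF-IDF vectors to two dimensions.
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Group the reduced vectors into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

for label, text in zip(labels, articles):
    print(label, text)
```

`sklearn.cluster.DBSCAN` could be swapped in for `KMeans` at the last step; it does not take a cluster count, so its `eps` and `min_samples` parameters would need tuning for the data.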
- Objective: Identify patterns that distinguish between different socio-economic groups.
- Approach: Data preprocessing and analysis using pattern mining techniques.
- Main Findings: Insights into the socio-economic characteristics that differentiate male and female working conditions.
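One simple way to picture pattern mining of this kind is to compute frequent itemsets separately for each group and compare their support. The sketch below does this by brute-force enumeration in plain Python. It is not necessarily the technique used in the report, and the records, attribute names, and the 0.5 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    that appear in at least min_support of the transactions."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            # Sort so the same set of items always yields the same tuple.
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    n = len(transactions)
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


# Hypothetical records, one set of attribute=value items per person.
female = [
    {"hours=part-time", "sector=private", "income<=50K"},
    {"hours=part-time", "sector=public", "income<=50K"},
    {"hours=full-time", "sector=private", "income>50K"},
    {"hours=part-time", "sector=private", "income<=50K"},
]
male = [
    {"hours=full-time", "sector=private", "income>50K"},
    {"hours=full-time", "sector=private", "income<=50K"},
    {"hours=full-time", "sector=public", "income>50K"},
    {"hours=part-time", "sector=private", "income<=50K"},
]

female_patterns = frequent_itemsets(female, min_support=0.5)
male_patterns = frequent_itemsets(male, min_support=0.5)

# Patterns that are frequent in one group but not in the other.
only_female = sorted(set(female_patterns) - set(male_patterns))
only_male = sorted(set(male_patterns) - set(female_patterns))
print("frequent only among female records:", only_female)
print("frequent only among male records:", only_male)
```

Brute-force enumeration is only practical for small itemset sizes; on a real dataset an algorithm such as Apriori or FP-growth would be the usual choice.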