This repo contains the code and presentation for the take-home exercise for the Data Scientist position at Dataiku. Directories are organized by purpose for modeling; for example, data/processed contains the pickle files produced during preprocessing.
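A minimal sketch of the pickle round-trip used for the artifacts in data/processed. The file name and columns below are illustrative, not actual artifacts from the repo:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative processed frame (hypothetical columns, not the real dataset).
df = pd.DataFrame({"age": [25, 40], "weeks_worked_per_year": [52, 30]})

# Write and reload a pickle the same way the processing notebook would.
path = Path(tempfile.mkdtemp()) / "example.pkl"
df.to_pickle(path)
reloaded = pd.read_pickle(path)

assert reloaded.equals(df)  # round-trip preserves dtypes and values
```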
The repo is designed to run the following notebooks in sequence: code-1 (EDA), code-2 (data processing), code-3 (modeling with XGBoost and LightGBM), and code-4 (inference on the test data). Please note that I used Kaggle's free cloud infrastructure for the XGBoost modeling and some of the inference.
Both XGBoost and LightGBM were optimized for their F1-scores. The XGBoost models produced higher precision and accuracy, while the LightGBM models produced higher recall; in summary, different models can be leveraged for different business use cases. Majority voting across the models produced results comparable to XGBoost alone, and a simple weighted aggregation of the models was explored to assess the impact of introducing a slight bias toward recall.
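The two ensembling strategies above can be sketched as follows. The prediction arrays and weights are illustrative placeholders, not the repo's actual model outputs or tuned weights:

```python
import numpy as np

# Hypothetical binary predictions from three trained models (illustrative only).
xgb_preds = np.array([1, 0, 1, 1, 0])
lgbm_a_preds = np.array([1, 1, 1, 0, 0])
lgbm_b_preds = np.array([0, 1, 1, 1, 0])
stacked = np.vstack([xgb_preds, lgbm_a_preds, lgbm_b_preds])

# Majority vote: predict 1 when at least two of the three models agree.
majority = (stacked.sum(axis=0) >= 2).astype(int)

# Weighted aggregation: slightly upweight the higher-recall LightGBM models,
# nudging the combined prediction toward recall (example weights only).
weights = np.array([0.30, 0.35, 0.35])
weighted = (weights @ stacked >= 0.5).astype(int)
```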
Overlapping characteristics driving the models' performance were observed: industry occupation, type of worker, age, sex (male, female), additional net worth (capital gains/losses, stocks), number of weeks worked per year, and company size.
Python 3.8 was used. See the requirements.txt file for additional setup information.