For additional information, refer to the Model Card paper: https://arxiv.org/pdf/1810.03993.pdf.
The model used is an XGBoost classifier with a max_depth hyperparameter of 6 and an n_estimators hyperparameter of 30. This combination of hyperparameters was selected as the best-performing one from a grid of candidate values, evaluated with stratified 5-fold cross-validation on the training dataset.
More information on the model building process can be found in the model building notebook.
For more information on the algorithm used, refer to the XGBoost documentation.
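As an illustration only, a stratified 5-fold grid search over a small hyperparameter grid could look like the sketch below. The candidate values, scoring metric, and the synthetic stand-in data are assumptions, not taken from the model building notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Synthetic stand-in for the training split of the Census data.
X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=42)

# Candidate hyperparameter values (illustrative; the actual grid may differ).
param_grid = {
    "max_depth": [3, 6, 9],
    "n_estimators": [10, 30, 100],
}

# Stratified 5-fold cross-validation on the training dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    param_grid=param_grid,
    scoring="f1",
    cv=cv,
)
search.fit(X_train, y_train)

print(search.best_params_)
```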
The objective of the model is to predict whether a person earns more or less than $50,000 per year.
The training data used is the publicly available Census Bureau dataset, obtained from the UCI ML Repository. More information on the dataset can be found on the dataset page.
An EDA of the dataset can be found in the EDA notebook.
80% of the data was used for training purposes, whereas 20% was set aside to evaluate the trained models.
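A minimal sketch of such a stratified 80/20 split is shown below; the placeholder DataFrame and the `salary` column name are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame standing in for the Census data; the real columns differ.
df = pd.DataFrame({
    "age": [25, 38, 52, 41, 30, 47, 29, 60],
    "hours_per_week": [40, 50, 40, 60, 35, 45, 40, 20],
    "salary": [0, 1, 1, 1, 0, 1, 0, 0],  # 1 = ">50K", 0 = "<=50K"
})

# 80/20 split, stratified on the target so both sets keep the class balance.
train_df, test_df = train_test_split(
    df, test_size=0.20, stratify=df["salary"], random_state=42
)
```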
The overall (test) metrics of the XGBoost model are as follows:
- Accuracy: 0.872
- Precision: 0.657
- Recall: 0.751
- F1-Score: 0.701
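These values come from the evaluation in the model building notebook; as a sketch, metrics of this kind would typically be computed from the held-out test set along the following lines (the labels and predictions below are placeholders):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder labels and predictions; in practice these come from the held-out
# 20% test split and the fitted XGBoost model.
y_test = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-Score: ", f1_score(y_test, y_pred))
```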
Performance on specific data slices can be found here.
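As an illustration, per-slice metrics can be obtained by grouping the test set on a categorical attribute and scoring each group separately; the `occupation` column and the small results frame below are assumptions.

```python
import pandas as pd
from sklearn.metrics import f1_score

# Placeholder results frame; in practice this holds the test set's categorical
# attributes alongside the true labels and the model's predictions.
results = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Tech-support", "Tech-support", "Exec", "Exec"],
    "label": [0, 1, 1, 0, 1, 1],
    "pred": [0, 1, 0, 0, 1, 0],
})

# F1 score per occupation slice.
slice_f1 = results.groupby("occupation").apply(
    lambda g: f1_score(g["label"], g["pred"], zero_division=0)
)
print(slice_f1)
```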
Given that the data contains attributes such as occupation, workclass, sex, and race, consideration must be given to how the model performs across different subgroups.
Our assessment of the model's fairness across different population subgroups, using Aequitas, can be found in the model bias & fairness notebook.
For more information on model fairness and Aequitas, refer to the Aequitas documentation.
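The full analysis lives in the notebook; as a rough sketch, a group-level crosstab and disparity report with Aequitas' classic API might look like the following. The column values, the reference groups, and the placeholder predictions are assumptions, and the API may differ across Aequitas versions.

```python
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias

# Aequitas expects a frame with binary "score" and "label_value" columns plus
# the categorical attributes to audit; this placeholder stands in for the
# real test-set predictions.
df = pd.DataFrame({
    "score": [1, 0, 1, 0, 1, 0],
    "label_value": [1, 0, 0, 0, 1, 1],
    "sex": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "race": ["White", "Black", "White", "White", "Black", "Black"],
})

# Group-level metrics (FPR, FNR, etc.) per attribute value.
g = Group()
xtab, _ = g.get_crosstabs(df)

# Disparities relative to assumed reference groups.
b = Bias()
bias_df = b.get_disparity_predefined_groups(
    xtab,
    original_df=df,
    ref_groups_dict={"sex": "Male", "race": "White"},
    alpha=0.05,
)
print(bias_df[["attribute_name", "attribute_value", "fpr_disparity", "fnr_disparity"]])
```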
Caveats:
- Limited feature engineering and hyperparameter tuning were performed during the model building process.
- The model shows some disparities, particularly in false positive and false negative rates, across different occupation, workclass, education, and marital-status subgroups, raising concerns about its fairness.
Recommendations:
- More thorough hyperparameter tuning, feature engineering, and EDA might lead to a better-performing model.
- Slice-based learning could be considered to achieve better model fairness.