Better dataframe management #6

mariolpantunes · 2025-02-04T23:31:40Z

Hi, my research team and I have used this library for some evaluations.

One drawback of the current implementation is that it returns only the anonymized columns, which breaks the structure of the original DataFrame. This is a problem for our use case, which studies how well different AI/ML models retain their learning capability at different levels of anonymization. Our current workaround is to define all features as quasi-identifiers, which causes two main issues:

- The dataset is unnaturally damaged; there are groupings for all the columns instead of just the important ones.
- The anonymization performance is really bad.

As such, we are submitting a pull request to change this behaviour. Our implementation recreates the DataFrame and only updates the necessary columns.
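
For illustration, the idea can be sketched roughly as follows. This is a minimal sketch and not the code in this pull request: `merge_anonymized` and the sample column names are hypothetical, and it assumes the anonymized output keeps the same row index as the input.

```python
import pandas as pd


def merge_anonymized(original: pd.DataFrame, anonymized: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `original` with only the anonymized columns replaced.

    Hypothetical helper: assumes `anonymized` shares the row index of `original`.
    Rows missing from `anonymized` (e.g. suppressed records) would become NaN.
    """
    result = original.copy()  # leave the caller's DataFrame untouched
    for column in anonymized.columns:
        # Align on the index so every row keeps its original position.
        result[column] = anonymized[column].reindex(result.index)
    return result


# Made-up sample data: only "age" and "zipcode" are quasi-identifiers.
original = pd.DataFrame(
    {
        "age": [23, 37, 41],
        "zipcode": ["3810", "3800", "4000"],
        "diagnosis": ["flu", "cold", "flu"],
    }
)
anonymized = pd.DataFrame(
    {
        "age": ["20-40", "20-40", "40-60"],
        "zipcode": ["38**", "38**", "40**"],
    }
)

restored = merge_anonymized(original, anonymized)
print(restored)
```

With this approach the non-quasi-identifier columns (here `diagnosis`) pass through unchanged and the column order of the original DataFrame is preserved.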

Please consider adding this to the main code.

mariolpantunes and others added 2 commits February 4, 2025 16:00

Initial work to keep the original dataframe

f4340a1

Improve dataframe management

39d1f28
