Multiple warning about incompatible dtype casting, probably related to nan check #162

Closed
AndhikaWB opened this issue Jun 3, 2024 · 6 comments · Fixed by #163
Labels
dependency issue An issue caused directly by the library's dependences

Comments

@AndhikaWB

Version check:

Python 3.11.5, dython 0.7.5

Describe the bug:

  • Computing a mixed association matrix emits many FutureWarning messages about incompatible dtype casting.
  • The categorical/nominal columns in this dataset are already label-encoded as integers, but the warning still appears even after casting them with astype('category'). I think the script assumes something wrong about the data type, which triggers the warning.
  • There are no NaN/null values in this dataset, and the warnings are annoying because I have many columns. Disabling warnings might work, but I don't want to do that.
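
If silencing turns out to be unavoidable for a while, a narrower option than disabling warnings globally is to ignore only this one message for the duration of a single call. The sketch below uses only the standard `warnings` module; the helper name `run_quietly` is hypothetical and not part of dython or pandas.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func while ignoring only the incompatible-dtype FutureWarning.

    Hypothetical helper for illustration; every other warning still surfaces.
    """
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore",
            message="Setting an item of incompatible dtype",
            category=FutureWarning,
        )
        return func(*args, **kwargs)
```

It could then wrap the call from the reproduction, for example `run_quietly(associations, temp, nominal_columns=cat_col, nom_nom_assoc='theil', compute_only=True)`. The filter is restored as soon as the call returns.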

Code to reproduce:

import pandas as pd
from dython.nominal import associations  # associations() is called below

df = pd.read_csv('data/data.csv', sep = ';')

temp = df.replace({'Status': {
    'Dropout': 0,
    'Enrolled': 1,
    'Graduate': 2
}})

cat_col = ['Application_mode', 'Course', 'Fathers_occupation', 'Fathers_qualification', 'Marital_status', 'Mothers_occupation', 'Mothers_qualification', 'Nacionality', 'Previous_qualification']

temp = associations(
    temp,
    nominal_columns = cat_col, # + ['Status']
    nom_nom_assoc = 'theil',
    compute_only = True
)['corr']

temp

Error message:

c:\Users\Dhika\Documents\Projects\StudentPerf\.venv\Lib\site-packages\dython\nominal.py:737: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  inf_nan.loc[columns[j], columns[i]] = _inf_nan_str(ji)
c:\Users\Dhika\Documents\Projects\StudentPerf\.venv\Lib\site-packages\dython\nominal.py:736: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  inf_nan.loc[columns[i], columns[j]] = _inf_nan_str(ij)

... (many more lines with exact same warning)

Input data:

From here

@AndhikaWB added the "bug" (Something isn't working) label on Jun 3, 2024
@shakedzy added the "dependency issue" (An issue caused directly by the library's dependences) label and removed the "bug" (Something isn't working) label on Jun 3, 2024
@shakedzy
Owner

shakedzy commented Jun 3, 2024

Hey @AndhikaWB, as the logs show, the warnings originate from pandas. Which version are you using?
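
For context, the warning text suggests that an empty string is being written into a float64 DataFrame. Below is a minimal sketch of that pattern, together with one possible way around it (an object-dtype frame). The column names and the helper `problems_on_assign` are hypothetical, and this is not necessarily how dython resolved the issue. What the float64 case reports depends on the installed pandas version, so no output is shown for it.

```python
import warnings

import numpy as np
import pandas as pd

columns = ["a", "b"]  # hypothetical column names

# Assumed shape of the problem: a float64 frame that later receives a string.
float_frame = pd.DataFrame(np.zeros((2, 2)), index=columns, columns=columns)

# One possible workaround: hold the markers in an object-dtype frame instead.
object_frame = float_frame.astype(object)


def problems_on_assign(frame: pd.DataFrame) -> list:
    """Write '' into a copy of `frame`; return any FutureWarning or error text."""
    frame = frame.copy()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            frame.loc["a", "b"] = ""
        except (TypeError, ValueError) as exc:
            # Some pandas versions may raise instead of warning.
            return [f"{type(exc).__name__}: {exc}"]
    return [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]


print(problems_on_assign(float_frame))   # version-dependent
print(problems_on_assign(object_frame))  # an object frame accepts strings
```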

@AndhikaWB
Author

@shakedzy v2.2.2, which I believe is the latest

@shakedzy
Owner

shakedzy commented Jun 3, 2024

I see, I'll take a look

@AndhikaWB
Author

Here's the real source of the data, in case the context is needed: https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention

The copy I linked above only renames the columns (I think).

@shakedzy
Owner

@AndhikaWB there's a fix ready on a branch: https://github.com/shakedzy/dython/tree/162-multiple-warning-about-incompatible-dtype-casting-probably-related-to-nan-check
Care to try it and see whether it solves the problem?

@shakedzy
Owner

Fix merged, version 0.7.6
