Constrain dev pandas version #2518

Merged

@ADBond commented Nov 21, 2024

In #2514 I updated the lockfile to get the latest package versions and introduced some minor compatibility fixes. However, something in the newer package versions causes problems for Python >= 3.10. Specifically, poetry's resolver ends up giving us:

  • numpy == 2.1.3
  • pandas == 2.0.3

Together this leads to ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject.

From what I gather, this is due to a change in numpy == 2.0.0. The first version of pandas compatible with numpy 2.x is pandas 2.2.2. However, poetry's resolver is not aware of this, as some versions of pandas only place a lower bound on the numpy version. For instance, pandas == 2.0.3 does not cap the numpy version, so as far as the resolver is concerned the above pair of packages are perfectly happy together.
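To illustrate the mismatch, here is a minimal sketch of the two views: what a resolver sees (only a declared lower bound) versus the compatibility rule described above. The helper names and the exact lower bound are assumptions made for illustration, not taken from poetry or from the pandas release metadata.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a comparable tuple.

    Pre-release suffixes such as 'rc1' are not handled in this sketch.
    """
    return tuple(int(part) for part in version.split("."))


# Illustrative lower bound only: stands in for "pandas 2.0.3 has no upper cap on numpy".
ASSUMED_NUMPY_LOWER_BOUND = parse("1.21.0")


def resolver_accepts(numpy_version: str) -> bool:
    """Mimic a resolver that only knows the declared lower bound."""
    return parse(numpy_version) >= ASSUMED_NUMPY_LOWER_BOUND


def binary_compatible(numpy_version: str, pandas_version: str) -> bool:
    """Encode the single rule from this PR: numpy 2.x needs pandas >= 2.2.2."""
    if parse(numpy_version) >= (2, 0, 0):
        return parse(pandas_version) >= (2, 2, 2)
    return True


# The resolved pair passes the resolver's check but breaks the rule above.
print(resolver_accepts("2.1.3"))            # resolver's view of numpy 2.1.3
print(binary_compatible("2.1.3", "2.0.3"))  # the pair poetry gave us
print(binary_compatible("2.1.3", "2.2.2"))  # the pair we want
```

The point is only that the first check cannot see the information the second one relies on, since that information is absent from the older pandas metadata.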

There is no way (afaict) to specify dependent constraints between two packages to restrict this sort of thing. It is also worth noting that with a standard pip install this doesn't seem to be an issue; possibly it only occurs because of the combination of packages we have in dev. At any rate, I am reluctant to add any hard constraints to our requirements: there are no actual direct breaks for us, so this would be unnecessarily limiting for users. Anyone downstream encountering this issue in their own environment can just include appropriate constraints themselves.

Therefore I think the best approach is to introduce a hard constraint in the dev dependencies only. That means that for devs and in CI, pandas has a minimum version of 2.2.2 whenever Python is >= 3.10. We should once again get a valid set of dependencies, without impacting users.
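As a rough sketch of what the dev-only constraint amounts to, the check below expresses "pandas >= 2.2.2 whenever Python >= 3.10" as a plain function. The function name is hypothetical and this is not how poetry evaluates markers; it only restates the rule so it can be sanity-checked.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version string into a comparable tuple (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))


def meets_dev_constraint(python_version: str, pandas_version: str) -> bool:
    """Hypothetical restatement of the dev constraint described in this PR.

    For Python >= 3.10, pandas must be at least 2.2.2.
    For older Pythons, this sketch imposes no extra requirement.
    """
    if parse(python_version) >= (3, 10):
        return parse(pandas_version) >= (2, 2, 2)
    return True


# The pair that caused the ValueError is rejected on Python 3.10 and later.
print(meets_dev_constraint("3.10", "2.0.3"))
print(meets_dev_constraint("3.12", "2.2.2"))
print(meets_dev_constraint("3.9", "2.0.3"))
```

Because the rule lives only in the dev dependencies, it affects contributors and CI, not anyone installing the package.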

to deal with getting inconsistent pandas + numpy versions from dependency resolver for python >= 3.10
@ADBond added the dependencies and maintenance labels Nov 21, 2024
@ADBond merged commit af18a92 into moj-analytical-services:maint/deps Nov 21, 2024
@ADBond deleted the maint/bump-dev-versions branch November 21, 2024 10:51
Labels
dependencies (Pull requests that update a dependency file), maintenance