-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make PyTensor compatible with numpy 2.0 #1194
Conversation
Codecov ReportAttention: Patch coverage is
❌ Your patch status has failed because the patch coverage (96.77%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #1194 +/- ##
==========================================
+ Coverage 82.01% 82.04% +0.03%
==========================================
Files 187 188 +1
Lines 48467 48533 +66
Branches 8669 8675 +6
==========================================
+ Hits 39749 39819 +70
+ Misses 6554 6553 -1
+ Partials 2164 2161 -3
|
Numba is an optional dependency for us so no need to pin, we can just make sure the CI runs with that version for the numba + benchmark tests (they are already separate) |
PS: this is amazing 😍 I'll try to review soon |
We'll need to advertise slower advanced indexing operation on the C backend. I think that's the only regression? |
Yes, it I haven't looked at any benchmarks for this change, so I'm not sure what the impact is. I'm making some changes to unpin numpy and make sure the CI runs with 2.1 for numba, I'll push those today. |
7ad30ee
to
631f397
Compare
@ricardoV94 Okay, unpinned numpy and changed the CI to match. I ran the tests on my fork and they passed. There was some random mypy issue, so I've fixed that. Not sure why it was raised now. |
@brendan-m-murphy Thanks for this pull request! Looking forward to it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, some small comments / questions
8c17e93
to
1f80278
Compare
340f24c
to
e7b728f
Compare
Okay, I think I've addressed everything. (Sorry I didn't realise when I was pushing to my fork that it was pushing here too... but the latest push should pass CI.) I've moved most of the numpy-version conditional code to |
@brendan-m-murphy, just in case you didn't notice there's a merge conflict. |
e7b728f
to
ecfd045
Compare
Thanks, I rebased onto main, should be good now. |
@brendan-m-murphy, looks like there's still an unresolved merge conflict in your copy of Lines 131 to 136 in ecfd045
|
@brendan-m-murphy It seems you resolved all the comments, so we're pretty much ready to merge once those last conflicts get solved!. I will give another pass next week to see if we didn't miss anything else. Repeating myself, but this is an incredible contribution 🙏 |
ecfd045
to
b081c3a
Compare
Good news is all tests are passing! |
sorry team :( |
- replaced np.AxisError with np.exceptions.AxisError - the `numpy.core` submodule has been renamed to `numpy._core` - some parts of `numpy.core` have been moved to `numpy.lib.array_utils` Except for `AxisError`, the updated imports are conditional on the version of numpy, so the imports should work for numpy >= 1.26. The conditional imports have been added to `npy_2_compat.py`, so the imports elsewhere are unconditonal.
- Replace np.cast with np.asarray: in numpy 2.0, `np.cast[new_dtype](arr)` is deprecated. The literal replacement is `np.asarray(arr, dtype=new_dtype)`. - Replace np.sctype2char and np.obj2sctype. Added try/except to handle change in behavior of `np.dtype` - Replace np.find_common_type with np.result_type Further changes to `TensorType`: TensorType.dtype must be a string, so the code has been changed from `self.dtype = np.dtype(dtype).type`, where the right-hand side is of type `np.generic`, to `self.dtype = str(np.dtype(dtype))`, where the right-hand side is a string that satisfies: `self.dtype == str(np.dtype(self.dtype))` This doesn't change the behavior of `np.array(..., dtype=self.dtype)` etc.
Some macros were removed from npy_3k_compat.h. Following numpy, I updated the affected functions to the Python 3 names, and removed support for Python 2. Also updated lazylinker_c version to indicate substantial changes to the C code.
- replace `->elsize` by `PyArray_ITEMSIZE` - don't use deprecated PyArray_MoveInto
Anything `Hashable` should work, but I've made the return type `tuple[Hashable]` to keep with the current style. This means, e.g., we can use strings in the cache version.
This is done using C++ generic functions to get/set the real/imag parts of complex numbers. This gives us an easy way to support Numpy v < 2.0, and allows the type underlying the bit width types, like pytensor_complex128, to be correctly inferred from the numpy complex types they inherit from. Updated pytensor_complex struct to use get/set real/imag aliases defined above. Also updated operators such as `Abs` to use get_real, get_imag. Macros have been added to ensure compatibility with numpy < 2.0 Note: redefining the complex arithmetic here means that we aren't treating NaNs and infinities as carefully as the C99 standard suggets (see Appendix G of the standard). The code has been like this since it was added to Theano, so we're keeping the existing behavior.
MapIter was removed from the public numpy C-API in version 2.0, so we raise a not implemented error to default to the python code for the AdvancedInSubtensor1. The python version, defined in `AdvancedInSubtensor1.perform` calls `np.add.at`, which uses `MapIter` behind the scenes. There is active development on Numpy to improve the efficiency of `np.add.at`. To skip the C implementation and use the Python implementation, we raise a NotImplementedError for this op's c code if numpy>=2.0.
This was done for the python linker and numba linker. deepcopy seems to be the recommended method for copying a numpy Generator. After this numpy PR: numpy/numpy@44ba7ca `copy` didn't seem to actually make an independent copy of the `np.random.Generator` objects spawned by `RandomStream`. This was causing the "test values" computed by e.g. `RandomStream.uniform` to increment the RNG state, which was causing tests that rely on `RandomStream` to fail. Here is some related discussion: numpy/numpy#24086 I didn't see any official documentation about a change in numpy that would make copy stop working.
numpy.random.Generator.__getstate__() now returns none; to see the state of the bit generator, you need to use Generator.bit_generator.state. This change affects `RandomGeneratorType`, and several of the random tests (including some for Jax.)
`np.MAXDIMS` was removed from the public API and no replacement is given in the migration docs. In numpy <= 1.26, the value of `np.MAXDIMS` was 32. This was often used as a flag to mean `axis=None`. In numpy >= 2.0, the maximum number of dims of an array has been increased to 64; simultaneously, a constant `NPY_RAVEL_AXIS` was added to the C-API to indicate that `axis=None`. In most cases, the use of `np.MAXDIMS` to check for `axis=None` can be replaced by the new constant `NPY_RAVEL_AXIS`. To make this constant accessible when using numpy <= 1.26, I added a function to insert `npy_2_compat.h` into the support code for the affected ops.
In numpy 2.0, -1 as uint8 is out of bounds, whereas previously it would be converted to 255. This affected the test helper function `reduced_bitwise_and`. The helper function was changed to use 255 instead of -1 if the dtype was uint8, since this is what is needed to match the behavior of the "bitwise and" op. `reduced_bitwise_and` was only used by `TestCAReduce` in `tests/tensor/test_elemwise.py`, so it was moved there from `tests/tensor/test_math.py`
1. Changed autocaster due to new promotion rules With "weak promotion" of python types in Numpy 2.0, the statement `1.1 == np.asarray(1.1).astype('float32')` is True, whereas in Numpy 1.26, it was false. However, in numpy 1.26, `1.1 == np.asarray([1.1]).astype('float32')` was true, so the scalar behavior and array behavior are the same in Numpy 2.0, while they were different in numpy 1.26. Essentially, in Numpy 2.0, if python floats are used in operations with numpy floats or arrays, then the type of the numpy object will be used (i.e. the python value will be treated as the type of the numpy objects). To preserve the behavior of `NumpyAutocaster` from numpy <= 1.26, I've added an explicit conversion of the value to be converted to a numpy type using `np.asarray` during the check that decides what dtype to cast to. 2. Updates due to new numpy conversion rules for out-of-bounds python ints In numpy 2.0, out of bounds python ints will not be automatically converted, and will raise an `OverflowError` instead. For instance, converting 255 to int8 will raise an error, instead of returning -1. To explicitly force conversion, we must use `np.asarray(value).astype(dtype)`, rather than `np.asarray(value, dtype=dtype)`. The code in `TensorType.filter` has been changed to the new recommended way to downcast, and the error type caught by some tests has been changed to OverflowError from TypeError
I was getting a NameError from the list comprehensions saying that e.g. `pytensor_scalar` was not defined. I'm not sure why, but this is another (more verbose) way to do the same thing.
From numpy PR numpy/numpy#22449, the repr of scalar values has changed, e.g. from "1" to "np.int64(1)", which caused two doctests to fail.
In numpy 2.0, if axis=None, then np.unique does not flatten the inverse indices returned if return_inverse=True A helper function has been added to npy_2_compat.py to mimic the output of `np.unique` from version of numpy before 2.0
Due to changes in numpy conversion rules (NEP 50), overflows are not ignored; in particular, negating a unsigned int causes an overflow error. The test for `neg` has been changed to check that this error is raised.
I split this test up to test uint64 separately, since this is the case discussed in Issue pymc-devs#770. I also added a test for the exact example used in that issue. The uint dtypes with lower precision should pass. The uint64 case started passing for me locally on Mac OSX, but still fails on CI. I'm not sure why this is, but at least the test will be more specific now if it fails in the future.
Also added ruff numpy2 transition rule.
Remaining tests now run on latest numpy, except for Numba jobs, which need numpy 2.1.0
4dcaab3
to
b633bca
Compare
I rebased onto main, it doesn't look like there were any serious conflicts :) |
All green except for codecov. Quick, let's get this in! 😀 |
codecov is never green |
This is a meaty one - congrats! |
Description
This PR makes PyTensor compatible with numpy versions >= 1.26 and < 2.2. (Numba requires numpy < 2.2.)
These changes include:
3
will print, but in numpy >= 2.0,np.int64(3)
will print)numpy.core
is nownumpy._core
, and many functions have been moved fromcore
to new public locations likenumpy.lib
; alsoAxisError
needs to be imported fromnumpy.exceptions
now). These changes are conditional on numpy version number, except theAxisError
import, which is compatible with numpy 1.26.np.cast
is deprecated; its replacement isnp.asarray(..., dtype=...)
np.unique
has changed whenaxis
isNone
; this required changes in theUnique
Op. (This change is conditional on numpy version number.).astype
. Conversions are no longer handled automatically; for instancenp.asarray(-1, dtype="unit8")
will raise anOverflowError
.TensorType.filter
uses this new conversion method ifallow_downcast
is true, which preserves the existing behaviorOverflowError
s (for numpy >= 2.0, orTypeError
for numpy < 2.0), or use equivalent but valid values (e.g. using255
for auint8
, instead of-1
).NumpyAutocaster
has been changes to explicitly convert values to numpy types usingnp.asarray
, which preserves the existing behavior. (The reason this preserves the behavior is that this is how the comparison is done inTensorType.filter
, wherenp.asarray(data)
is compared toconverted_data = np.asarray(data, self.dtype)
.)changed methods of
numpy.random.Generator
that are used bycopy
andpickle
.Generator
with independent state, you must usedeepcopy
now, instead ofcopy
Generator.__getstate__()
toNone
. To get the state now, you must useGenerator.bit_generator.state
.->elsize
byPyArray_ITEMSIZE
ScalarType
. The numpy implementation of complex numbers has been changed from a struct with real and imaginary values to the native C-99 complex types. On disk, these are equivalent, but the real and imaginary parts C-99 complex types cannot be accessed using pointers. Numpy provides some macros to make accessing real and imaginary parts uniform across pre and post 2.0 version. Since these are implemented in terms of the typesnpy_cfloat
,npy_cdouble
,npy_clongdouble
, some generic functions were added to the C code so that we do not need to explicitly translation bit size aliases ilkenpy_complex64
to these types.np.MAXDIMS
was removed from the public API. This value was a common flag used to indicate thataxis=None
has been passed. Now there is an explicitly flagNPY_RAVEL_AXIS
. Implementing this change was a bit variable across the affected code. A compatibility header was added topytensor/npy_2_compat.py
to makeNPY_RAVEL_AXIS
available for numpy < 2.0.MapIter
was removed from the public numpy C-API, so it was not possible to adapt the C-code forAdvancedIncSubtensor1
; instead aNotImplementedError
is raised, so this Op defaults to the python implementation, which usesnp.add.at
.npy_3k_compat.h
): BUILD: clean out py2 stuff from npy_3kcompat.h numpy/numpy#26842Related Issue
Checklist
Type of change
📚 Documentation preview 📚: https://pytensor--1194.org.readthedocs.build/en/1194/