-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PyAPI] Tensor creation from Pillow Image #28984
base: master
Are you sure you want to change the base?
Conversation
7031500
to
206d277
Compare
206d277
to
39c8c2f
Compare
throw py::type_error("Input must be a PIL.Image.Image object"); | ||
} | ||
auto np_array = py::module::import("numpy").attr("array")(image); | ||
py::array array = np_array.cast<py::array>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Genai also reshapes and casts dtype: https://github.com/openvinotoolkit/openvino.genai/blob/b395127b973ac1276c8c028c9889447ebb68ed5b/samples/python/visual_language_chat/visual_language_chat.py#L39
That can be a problem because ov::Tensor
doesn't allow casting. Does numpy preserve Image
's layout and dtype? I see the test verifies that, but the used shape and dtype are kind of mainstream, so that doesn't explain what happens to uncommon shapes and types.
@popovaan, do you remember why reshape()
and astype()
are there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reshape()
is needed here, because pic.getdata()
returns flattened data. For example for 100x200x3 image pic.getdata()
will have shape 20000x3.
astype(np.uint8)
is needed because pic.getdata()
contains int64 values and if we remove the cast, we get error from genai:
RuntimeError: Exception from src/core/src/runtime/tensor.cpp:96:
Exception from src/inference/src/dev/make_tensor.cpp:66:
Tensor data with element type i64, is not representable as pointer to u8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked np.array(pic)
works fine and preserves shapes and type. I don't remember why I used pic.getdata()
, looks like we can replace it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilya-lavrenov, if image reading can be reduced to ov.Tensor(np.array(Image.open(path).convert("RGB")))
, is Tensor creation from Pillow Image is still needed? I'd not add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilya-lavrenov, if image reading can be reduced to
ov.Tensor(np.array(Image.open(path).convert("RGB")))
, is Tensor creation from Pillow Image is still needed? I'd not add it.
up to PR authors.
ov.Tensor(np.array(Image.open(path).convert("RGB")))
People need to know about this np.array
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion for pure user convenience we could have this binding. Constructing a Tensor from PIL image out of the box is more user friendly than a oneliner, in which the image has to be opened, converted to RGB and then cast to np.array
- the user has to know/remember about all of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This binding doesn't remove the need to know about opening and conversion to RGB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, the way I see it is this PR allows users to omit the np.array
casting and makes Python API more convenient and working out of the box, which I think aligns with Python values as a language. I agree it's rather a minor change and we could do without it, but out of the two options, I think it's a change for the better. IMO this PR should be merged.
bc76a79
to
ace90f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for core part (looks like only code style fixes)
Details:
Tickets: