First of all, great repository sir!!🤯
I was going through the paper, there was this image manipulation method through text difference.
It went like this:
z_i := original image CLIP embedding
z_t := new text CLIP embedding/ embedding of the text for current image manipulation
z_t0 := original image's corresponding text CLIP embedding / text embedding of the text 'a photo' / empty embedding
z_d := l2_norm(z_t - z_t0) <-> text difference vector
z_new / z_theta := spherical_interpolation(z_i, z_d, theta) {where theta is in (0, 0.5)} <-> new image's CLIP embedding vector
What I don't understand is this: a matching CLIP image/text pair is supposed to have similar embedding vectors (since CLIP is trained with cosine similarity), while the difference between the text embeddings of two similar texts will be roughly perpendicular to both of them. The text-difference vector z_d should therefore point in a very different direction from the image embedding z_i, so spherically interpolating between them shouldn't give a meaningful result.
What am I missing? I am unable to understand why this text-difference method works.
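For concreteness, here is a minimal numpy sketch of the manipulation steps described above. The slerp implementation is the standard formula, and the random vectors are toy stand-ins of my own (real z_i, z_t, z_t0 would come from CLIP's image and text encoders, typically 512-d):

```python
import numpy as np

def slerp(v0, v1, t):
    """Spherical linear interpolation between v0 and v1 (normalized internally)."""
    v0 = v0 / np.linalg.norm(v0)
    v1 = v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(v0, v1), -1.0, 1.0))  # angle between the vectors
    if np.isclose(omega, 0.0):
        return v0  # (near-)parallel vectors: nothing to interpolate
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

# Toy stand-ins for CLIP embeddings (assumption: 512-d, as in the public CLIP models).
rng = np.random.default_rng(0)
z_i  = rng.normal(size=512)   # original image CLIP embedding
z_t  = rng.normal(size=512)   # target text CLIP embedding
z_t0 = rng.normal(size=512)   # source/neutral text embedding (e.g. "a photo")

# Normalized text-difference direction, then slerp toward it with a small theta.
z_d = (z_t - z_t0) / np.linalg.norm(z_t - z_t0)
theta = 0.3                   # interpolation weight in (0, 0.5)
z_new = slerp(z_i, z_d, theta)  # manipulated image embedding
```

With a small theta the result stays on the unit sphere at angle theta * omega from the (normalized) original image embedding, i.e. it only nudges z_i toward the text-difference direction rather than replacing it.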