[Core] introduce videoprocessor. #7776
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
src/diffusers/video_processor.py
Outdated
"""Simple video processor.""" | ||
|
||
@staticmethod | ||
def tensor2vid(video: torch.Tensor, processor: "VaeImageProcessor", output_type: str = "np"): |
can you have methods like frames2gif or frames2mpeg in here as well?
@DN6 what do you think?
oops, i meant #7548
I think we can add this functionality to the export functions we have. Generally the pipelines always return an np array, a torch tensor, or a PIL image / list of PIL images.
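For reference, diffusers already ships export helpers that cover these cases; a usage sketch (the dummy `frames` stands in for pipeline output, and the exact accepted types may vary slightly by version):

```python
import numpy as np
from PIL import Image
from diffusers.utils import export_to_gif, export_to_video

# Dummy frames stand in for pipeline output (video pipelines typically
# return a list of PIL images with output_type="pil").
frames = [Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8)) for _ in range(8)]

export_to_gif(frames, "animation.gif")
export_to_video(frames, "animation.mp4", fps=8)
```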
I don't think so. Here's an example where tensor2vid is being used when returning the output:
diffusers/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Line 565 in 49b959b
frames = tensor2vid(frames, self.image_processor, output_type=output_type)
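For context, a rough sketch of what a `tensor2vid`-style helper does, reconstructed from the usage above; the (batch, channels, frames, height, width) layout and the `postprocess` call on a VaeImageProcessor-like object are assumptions:

```python
import torch

def tensor2vid(video: torch.Tensor, processor, output_type: str = "np"):
    # `video` is assumed to be (batch, channels, frames, height, width); each
    # batch item is permuted to (frames, channels, height, width) so the image
    # processor can denormalize/convert it frame-by-frame.
    outputs = []
    for batch_idx in range(video.shape[0]):
        batch_vid = video[batch_idx].permute(1, 0, 2, 3)
        outputs.append(processor.postprocess(batch_vid, output_type))
    return outputs
```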
src/diffusers/video_processor.py
Outdated
def preprocess_video(self, video: List[Union[PIL.Image.Image, np.ndarray, torch.Tensor]]) -> torch.FloatTensor:
    """Preprocesses input video(s)."""
    supported_formats = (np.ndarray, torch.Tensor, PIL.Image.Image)
let's first decide what video input formats we accept here. I think:
- list of list of images (or a list of images; in that case, we expand it to a list of list of images)
- list of list of 4d tensors (or a list of 4d tensors; in this case, we expand)
- list of 5d tensors (or a 5d tensor; we expand)
- list of list of 4d numpy arrays (or a list of 4d arrays; we expand)
- list of 5d numpy arrays (or a 5d numpy array; we expand)
cc @DN6 here, anything we would add or remove?
cc @a-r-r-o-w too since you worked on a lot of video pipelines
Okay, so I think all of those are covered. supported_formats here refers to the core base formats, which can be used to create lists, 4D or 5D tensors, etc.
Let me provide concrete lines of code that I think ensure all the formats listed above are supported:

if isinstance(video, list) and isinstance(video[0], list) and isinstance(video[0][0], PIL.Image.Image):
    # In case the video is a list of PIL images, convert to a list of ndarrays.
For formats 2 and 3:
video = torch.cat(video, axis=0) if video[0].ndim == 5 else torch.stack(video, axis=0)
And for formats 4 and 5:
video = np.concatenate(video, axis=0) if video[0].ndim == 5 else np.stack(video, axis=0)
Thanks for looping me in. I think that list[list[image]], list[fchw_tensor], and bfchw_tensor are the most common for a user to end up with, and all the cases you mentioned seem to be handled well; the code makes sense to me.
A small explanation of the expected inputs to the video processor in the docstrings, the order of tensor dims, and some more documentation, like in the image processor, would be helpful for newer users IMO.
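A docstring along these lines could cover that; a hedged sketch, with the dim orders assumed from this thread rather than taken from the final code:

```python
from typing import List, Union

import numpy as np
import PIL.Image
import torch

def preprocess_video(
    self, video: Union[List[List[PIL.Image.Image]], np.ndarray, torch.Tensor]
) -> torch.FloatTensor:
    """Preprocesses input video(s).

    Accepted formats:
        - a list of lists of PIL images (outer list = batch, inner list = frames)
        - a 4D np.ndarray or torch.Tensor of shape (frames, channels, height, width),
          or a list of them
        - a 5D np.ndarray or torch.Tensor of shape (batch, frames, channels, height, width),
          or a list of them

    Returns:
        A 5D torch.FloatTensor of shape (batch, channels, frames, height, width).
    """
```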
Thank you, @a-r-r-o-w! Added a bit of documentation and type annotation to hopefully make it clearer. LMK your thoughts.
a lot of the work here is overlapping with preprocess, though.
Once we make sure the inputs are accepted as one of the below (and expanded):
- list of list of images (or a list of images; in that case, we expand it to a list of list of images)
- list of list of 4d tensors (or a list of 4d tensors; in this case, we expand)
- list of 5d tensors (or a 5d tensor; we expand)
- list of list of 4d numpy arrays (or a list of 4d arrays; we expand)
- list of 5d numpy arrays (or a 5d numpy array; we expand)
can we try:
video = [self.image_processor.preprocess(vid) for vid in videos]
After that, we check if all the tensors in the list have the same shape and throw an error if not.
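A hedged sketch of that flow, written as a standalone function (the name and exact error wording are illustrative):

```python
import torch

def preprocess_videos(image_processor, videos):
    # Run each video through the shared image-processor preprocess, then
    # require a single common shape before stacking into one batch tensor.
    processed = [image_processor.preprocess(vid) for vid in videos]
    shapes = {tuple(v.shape) for v in processed}
    if len(shapes) > 1:
        raise ValueError(f"All videos must share one shape after preprocessing, got {shapes}.")
    return torch.stack(processed, dim=0)
```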
> a lot of the work here is overlapping with preprocess though

Help me understand this a bit better? From what I understand, the current preprocess_video() first checks if we have the inputs in the accepted format and performs the expansion if needed. Then it passes the video off to preprocess() like you suggested:
diffusers/src/diffusers/video_processor.py
Line 125 in c5d22e6
video = torch.stack([self.preprocess(f) for f in video], dim=0)
The only overlap I see is this:
diffusers/src/diffusers/video_processor.py
Line 106 in c5d22e6
video = np.array(video).astype(np.float32) / 255.0
But I see this more as a safeguard rather than an overlap.
i think something like this should work (made-up code, but roughly the logic):

if isinstance(video, supported_formats):
    video = [video]
if isinstance(video[0], PIL.Image.Image) or (isinstance(video[0], (np.ndarray, torch.Tensor)) and video[0].ndim == 4):
    video = [video]
video = torch.stack([self.preprocess(f) for f in video], dim=0)
video = video.permute(0, 2, 1, 3, 4)
How about we first add tests like this: https://github.com/huggingface/diffusers/blob/main/tests/others/test_image_processor.py
Once we have the tests, I can help look into refactoring this function?
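A hedged sketch of what one such test could look like, modeled on test_image_processor.py; the class name, constructor kwargs, and tensor layout are assumptions:

```python
import torch
import unittest

from diffusers.video_processor import VideoProcessor  # module path per this PR


class VideoProcessorTest(unittest.TestCase):
    def test_preprocess_video_pt_5d(self):
        video_processor = VideoProcessor(do_resize=False, do_normalize=True)
        # (batch, frames, channels, height, width), values in [0, 1]
        input_pt = torch.rand(1, 8, 3, 64, 64)
        out = video_processor.preprocess_video(input_pt)
        # Expect a 5D, model-ready float tensor back.
        self.assertEqual(out.ndim, 5)


if __name__ == "__main__":
    unittest.main()
```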
Done. LMK what you think. I had to add a bit of code to deal with the 5D stuff, but the rest has been simplified a lot IMO.
Looks beautiful to me, seeing the multiple copies go away! Just one minor fix with naming for consistency.
The failing test is completely unexpected. Need to look deeper. @yiyixuxu WDYT about the following?
elif isinstance(video, list) and isinstance(video[0], PIL.Image.Image):
if isinstance(video, list) and isinstance(video[0], np.ndarray) and video[0].ndim == 5:
    warnings.warn(
        "Passing `video` as a list of 5d np.ndarray is deprecated."
Technically it's not deprecated since we didn't support it before, right? I think we should just raise an error here.
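What "just raise an error" could look like here; a sketch, with a hypothetical helper name and error wording:

```python
import numpy as np

def _reject_list_of_5d(video):
    # Hypothetical helper: reject the list-of-5D case outright instead of
    # warning about a deprecation that never formally existed.
    if isinstance(video, list) and isinstance(video[0], np.ndarray) and video[0].ndim == 5:
        raise ValueError(
            "Passing `video` as a list of 5D np.ndarrays is not supported. "
            "Pass a single 5D array or a list of 4D arrays instead."
        )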
it was kinda supported before:
diffusers/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py
Line 119 in 0d23645
def preprocess_video(video):
With this code, if you pass a list of 5d tensors, it would work. I think it is because of the way the code was written here:
diffusers/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py
Line 123 in 0d23645
video = [video]
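To spell out why the old helper tolerated that input, here is a reconstructed sketch of the relevant branch (not the exact source):

```python
import numpy as np
import PIL.Image
import torch

def preprocess_video(video):
    # Reconstructed sketch of the old helper's control flow.
    supported_formats = (np.ndarray, torch.Tensor, PIL.Image.Image)
    if isinstance(video, supported_formats):
        video = [video]  # a bare 5D tensor gets wrapped here...
    if isinstance(video[0], torch.Tensor):
        # ...while a list of 5D tensors reaches this branch unchanged, where
        # torch.cat along dim 0 happily merges them into one 5D batch.
        video = torch.cat(video, axis=0) if video[0].ndim == 5 else torch.stack(video, axis=0)
    return video
```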
LGTM 👍🏽
if video and not isinstance(video[0], list):
    video = [video]
Suggested change (remove):
if video and not isinstance(video[0], list):
    video = [video]
But why do we want to do this? I don't think a single-frame video would get represented properly otherwise. We don't have any special treatment for that from preprocess_video either.
why would they need a video-to-video pipeline if it is single-frame, i.e., an image?
I don't know, honestly. I tried to follow the original implementation (i.e., the implementation before the refactor) as faithfully as possible.
From the documentation, though, it seems like it supports multi-frame videos too:
https://huggingface.co/docs/diffusers/en/api/pipelines/animatediff
So, my best guess is that it supports both single-frame and multi-frame videos.
# as a list of images
if video and not isinstance(video[0], list):
    video = [video]
if latents is None:
maybe raise an error in check_inputs when both latents and video are not None
Already there:
diffusers/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py
Line 598 in 35358a2
if video is not None and latents is not None:
@yiyixuxu could you look into the failing test? It's likely coming from the image processor. But if not, let me know.
@sayakpaul can confirm that the test failure is due to the image processor; I will fix it!
@yiyixuxu I think I have addressed your comments. LMK if this is good to merge.
I left one comment: https://github.com/huggingface/diffusers/pull/7776/files#r1597075529
Okay. I will delete the block that listifies a single image.
feel free to merge once the tests pass :)
What does this PR do?
Introduces a VideoProcessor akin to VaeImageProcessor to encapsulate the logic of dealing with videos.
TODOs