
Adding VQGAN Training script #5483

Merged
merged 50 commits into huggingface:main
May 15, 2024

Conversation

isamu-isozaki
Contributor

@isamu-isozaki isamu-isozaki commented Oct 23, 2023

What does this PR do?

This is a VQGAN training script ported from taming-transformers, lucidrains' muse-maskgit repo, and open-muse. I'm planning to test it on the CIFAR-10 dataset to confirm it works.

Some steps missing/need confirmation are

  • Confirm einops and timm can be external dependencies; if not, convert these ops to native PyTorch
  • Test on CIFAR-10
  • Add tests to test_models_vq and test_models_vae for the slight modification
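For the first item above, converting an einops op to native PyTorch is usually just a reshape plus a permute. A minimal sketch (the function name and shapes are illustrative, not taken from the script):

```python
import torch

# Illustrative: einops.rearrange(x, "b c h w -> b (h w) c")
# expressed with native PyTorch reshape + permute.
def flatten_spatial(x: torch.Tensor) -> torch.Tensor:
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).permute(0, 2, 1)

x = torch.randn(2, 16, 8, 8)
print(flatten_spatial(x).shape)  # torch.Size([2, 64, 16])
```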

Fixes #4702

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@isamu-isozaki isamu-isozaki marked this pull request as draft October 23, 2023 01:02
@isamu-isozaki
Contributor Author

Once I've confirmed it works with CIFAR-10, I'll remove the draft status.

@isamu-isozaki
Contributor Author

isamu-isozaki commented Oct 30, 2023

I was able to start training with this script, and I removed the einops dependencies. The only additional dependency so far is timm. I plan to run this overnight on CIFAR-10 at 128×128 resolution and then take this PR out of draft. Also, let me know if anyone knows a good VQModel config that's easy and fast to train.

@isamu-isozaki
Contributor Author

isamu-isozaki commented Oct 31, 2023

Ok! Training seems to work. Here's a wandb run on CIFAR-10, using 6 GB of VRAM. The command to run it is:

```shell
accelerate launch train_vqgan.py --dataset_name=cifar10 --image_column=img \
  --validation_images images/bird.jpg images/car.jpg images/dog.jpg images/frog.jpg images/horse.jpg images/ship.jpg \
  --resolution=128 --train_batch_size=2 --gradient_accumulation_steps=8 --report_to=wandb
```

For each validation image provided, the results are displayed like so: the input image on the left and the generated image on the right.
[original vs generated]
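The side-by-side layout could be produced with something like the following PIL sketch (this is not the script's actual logging code; the function name is illustrative):

```python
from PIL import Image

def side_by_side(original: Image.Image, generated: Image.Image) -> Image.Image:
    # Paste the input image (left) and the reconstruction (right) on one canvas.
    w, h = original.size
    canvas = Image.new("RGB", (w * 2, h))
    canvas.paste(original, (0, 0))
    canvas.paste(generated, (w, 0))
    return canvas
```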

The remaining parts that I can think of are

  • Make log_validation support trackers other than wandb
  • Make tqdm updates similar to other examples

I did find a bug where the global step doesn't seem to go above 3000, but once that is fixed I'll open this for review.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@isamu-isozaki
Contributor Author

The main logic is done, so I think it's ready for review. For the 3000-step bug, I'm currently running training to see if it happens again after the fixes.

@isamu-isozaki isamu-isozaki marked this pull request as ready for review October 31, 2023 15:03
@isamu-isozaki
Contributor Author

isamu-isozaki commented Nov 1, 2023

Ok! Seems like it was a hardware issue (I think). Got past step 3100. The script should be ready for review.

@isamu-isozaki isamu-isozaki changed the title WIP: Adding VQGAN Training script Adding VQGAN Training script Nov 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Nov 26, 2023
@github-actions github-actions bot closed this Dec 26, 2023
@yqy2001

yqy2001 commented Feb 24, 2024

Hi there, what is the current status of this PR? It seems that everything works well. Will this be merged?

@isamu-isozaki
Contributor Author

@sayakpaul I tried fixing it by following the @require_torch format and making a @require_timm. Let me know what you think.

```python
    return (dec,)

return DecoderOutput(sample=dec)
if return_loss:
```
Collaborator

Maybe we don't need this special return_loss flag.
I don't think it would break, no?
If it is a tuple, we usually use out[0]; if it is a DecoderOutput, we usually do out.sample. I think just adding the loss to the output should be fine. cc @DN6 to confirm it's non-breaking.

Contributor Author

Hmm, wouldn't it be possible for some people to do out[-1] on the tuple? I think that's the only case where it would break.

Member

Even if they do that, I think the error message would be fairly easy to digest, and I don't think it will be breaking. WDYT? I like the idea of not introducing return_loss.

Contributor Author

Good point. I'll remove return_loss
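A toy illustration of why appending the loss to the output is mostly non-breaking (a simplified stand-in, not diffusers' actual DecoderOutput; the commit_loss field name is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

# Simplified stand-in for a BaseOutput-style class.
@dataclass
class DecoderOutput:
    sample: str
    commit_loss: Optional[float] = None

    def __getitem__(self, idx):
        # Tuple-like access over the fields that are set, mimicking BaseOutput.
        fields = tuple(v for v in (self.sample, self.commit_loss) if v is not None)
        return fields[idx]

out = DecoderOutput(sample="dec", commit_loss=0.25)
print(out[0])      # dec  -- existing out[0] callers keep working
print(out.sample)  # dec  -- attribute access is unchanged
print(out[-1])     # 0.25 -- out[-1] now returns the loss: the one breaking case
```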

@sayakpaul
Member

@isamu-isozaki I pushed a few things and I hope you don't mind.

  • Moved is_timm_available and require_timm to proper modules.
  • Added timm as a dependency in our workflows.
  • Decorated the trainer test class with require_timm instead of doing it per test method.

@isamu-isozaki
Contributor Author

@sayakpaul No worries, thanks a bunch for doing that. I forgot the proper way to add modules for the tests 😅

@sayakpaul
Member

Ah, all tests passing. A sight for sore eyes, eh!

@isamu-isozaki
Contributor Author

isamu-isozaki commented Apr 30, 2024

@sayakpaul awesome! I just removed return_loss (and hopefully the tests still pass). I did run the tests on my end.

@sayakpaul
Member

@isamu-isozaki could you resolve the conflicts so that it's ready for merging? We would like to include it in our upcoming release. Sorry for the delay on my end.

@yiyixuxu could you give the changes introduced to the library components a look?

@isamu-isozaki
Contributor Author

@sayakpaul Thanks, I think I resolved the conflicts, but let me run the tests to make sure.

@sayakpaul
Member

Okay, the code quality issues should be easy to fix, I think. But LMK if you run into difficulties. What I would do:

  • Create a fresh Python env.
  • From diffusers root, run pip install -e .[quality].
  • Run make style && make quality.
  • Push the changes.

@isamu-isozaki
Contributor Author

isamu-isozaki commented May 15, 2024

@sayakpaul Thanks a bunch. I think I fixed the ruff format error, but one question: when I run the checks locally, the doc-builder step always fails, even in a fresh environment set up with the steps above. But once I fix everything before it, the CI checks usually pass. It might be a bug on my part, but is that a common issue?
The error is:

```
ruff check examples scripts src tests utils benchmarks setup.py
ruff format --check examples scripts src tests utils benchmarks setup.py
918 files left unchanged
doc-builder style src/diffusers docs/source --max_len 119 --check_only
Traceback (most recent call last):
  File "/home/isamu/miniconda3/envs/diffusers/bin/doc-builder", line 8, in <module>
    sys.exit(main())
  File "/home/isamu/miniconda3/envs/diffusers/lib/python3.10/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
    args.func(args)
  File "/home/isamu/miniconda3/envs/diffusers/lib/python3.10/site-packages/doc_builder/commands/style.py", line 28, in style_command
    raise ValueError(f"{len(changed)} files should be restyled!")
ValueError: 284 files should be restyled!
Makefile:43: recipe for target 'quality' failed
make: *** [quality] Error 1
```

It fails locally, but I think it'll pass here.

@sayakpaul
Member

That's weird. Could be a setup related problem :/

@sayakpaul
Member

Alright merging this now!

@sayakpaul sayakpaul merged commit d27e996 into huggingface:main May 15, 2024
15 checks passed
@sayakpaul
Member

Thanks a lot for shipping this super cool script, @isamu-isozaki. Really appreciate your hard work and patience!

@isamu-isozaki
Contributor Author

@sayakpaul np! No worries at all and thanks for the support!

XSE42 added a commit to XSE42/diffusers3d that referenced this pull request Jun 23, 2024
diffusers commit d27e996
    Adding VQGAN Training script huggingface/diffusers#5483
Successfully merging this pull request may close these issues.

Example script to train a VQ-VAE