-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding VQGAN Training script #5483
Conversation
Once confirmed it works with cifar10 will remove the draft part |
I was able to start training this script. And I removed the einops dependencies. The only additional dependency so far is timm. I plan to run this overnight on cifar with 128 image resolution and then remove the draft from this pr. Also let me know if anyone knows a good VQModel config that's easy to train/fast |
Ok! Training seems to work. Here's a wandb run on cifar 10. In 6gb vram, command to run this is
For the validation images, they will be shown like so for each validation image provided. The left is the input image and the right is the generated image The remaining parts that I can think of are
I did find a bug where global step doesn't seem to go above 3000 but once that is fixed I'll open for review |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
The main logic is done so I think it's ready for review. For the 3000 step bug I'm currently running training to see if it happens again after the fixes. |
Ok! Seems like it was a hardware issue(I think). Got steps 3100. Script should be ready for review. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Hi there, what is the current status of this PR? It seems that everything works well. Will this be merged? |
@sayakpaul I tried fixing by following the @require_torch format by making a @require_timm. Let me know what you think |
src/diffusers/models/vq_model.py
Outdated
return (dec,) | ||
|
||
return DecoderOutput(sample=dec) | ||
if return_loss: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we don't need this special return_loss
flag
I don't think it would break, no?
it is a tuple, we would usually use out[0]
, if it is a DecoderOutput
, usually we do out.sample
; I think just adding the loss to the output should be fine cc @DN6 to confirm if it's non-breaking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, wouldn't it be possible for some people to do out[-1] for tuple? I think that's the only time it'll break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if they do that, I think the error message would be fairly easy to digest but I don't think it will be breaking. WDYT? I like the idea of not introducing return_loss
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll remove return_loss
@isamu-isozaki I pushed a few things and I hope you don't mind.
|
@sayakpaul No worries thanks a bunch for doing that. I did forget the proper way to add modules for the tests 😅 |
Ah all tests passing. Sight to the sore eyes, eh! |
@sayakpaul awesome! I just removed return_loss(and hopefully tests still pass). I did do the tests on my end |
@isamu-isozaki could you resolve the conflicts so that it's ready for merging? We would like to include in our upcoming release. Sorry for the delay on my end. @yiyixuxu could give the changes introduced to the library components a bit? |
@sayakpaul tnx I think I resolved the conflicts but let me do tests to make sure |
Okay the code quality issues should be easy to fix I think. But LMK if you find difficulties. What I would do:
|
@sayakpaul tnx a bunch. I think I fixed the ruff format error but one question I have is when I try locally the doc-builder always fails even in a fresh environment with the above steps. But when I just fix all the tests before that, the checks in the ci usually passes. It might be a bug on my part but is that a common issue?
locally but I think it'll pass here |
That's weird. Could be a setup related problem :/ |
Alright merging this now! |
Thanks a lot for shipping this super cool script, @isamu-isozaki. Really appreciate your hard work and patience! |
@sayakpaul np! No worries at all and thanks for the support! |
diffusers commit d27e996 Adding VQGAN Training script huggingface/diffusers#5483
What does this PR do?
This is a vqgan training script ported from taming-transformers and from lucidrian's muse-maskgit repo here and open-muse. I'm planning to test this on the cifar10 dataset to confirm it works
Some steps missing/need confirmation are
Fixes #4702
Before submitting
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.