
v0.12

@Sxela Sxela released this 05 Sep 13:46
· 1 commit to v0.12-AGPL since this release
a9e1058

Sample Video

New stuff:

  • tiled vae
  • controlnet v1.1
  • controlnet multimodel GUI
  • saving controlnet predictions and rec noise

Changelog:

  • add shuffle, ip2p, lineart, lineart anime controlnets
  • add shuffle controlnet sources
  • add tiled vae
  • switch to controlnet v1.1 repo
  • update controlnet model urls and filenames to v1.1 and new naming convention
  • update existing controlnet modes to v1.1
  • add consistency controls to video export cell
  • add rec noise save/load for compatible settings
  • save rec noise cache to recNoiseCache folder in the project dir
  • save controlnet annotations to controlnetDebug folder in the project dir
  • add controlnet multimodel options to GUI (thanks to Gateway#5208)
  • add controlnet v1.1 annotator options to GUI

Fixes:

  • fix controlnet depth_init for cond_video with no preprocess
  • fix colormatch stylized frame not working with frame range
  • tidy up the colab interface a bit
  • fix dependency errors for uformer
  • fix lineart/anime lineart errors
  • fix local install
  • bring together most of the installs (faster install, even faster restart & run all), only tested on colab
  • fix zoe depth model producing black frames with autocast on
  • fix controlnet inpainting model None and cond_video modes
  • fix flow preview not being shown
  • fix prompt schedules not working in some cases
  • fix captions not updating in rec prompt
  • fix control_sd15_hed error when loading pre-v0.12 settings files
  • make torch v2 install optional
  • make installation skippable for consecutive runs
  • fix torch v2 install again :D ty to Aixsponza#0619
  • fix AttributeError: 'Block' object has no attribute 'drop_path' error for depth controlnet

Tiled VAE

Tiled VAE -> use_tiled_vae

Basically, we split the image into tiles to use less VRAM when encoding the image into and decoding it from latent space. This is the main VRAM bottleneck, as the diffusion itself is well-optimized and already uses less VRAM thanks to xformers. You can now render 1920x1080 on 16 GB colab cards, though this will be really slow :D

Tiled VAE -> num_tiles

By default, the image is split into 4 tiles by setting num_tiles = [2,2], but you can change it to [2,1] or [1,2] for non-square images.

I'd suggest leaving other variables as is.
Has only been tested with the default v1 VAE.
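The tiling idea can be sketched in plain numpy: split the frame into a num_tiles grid, run the expensive step on each tile independently, and stitch the results back. This is an illustrative sketch only; the real notebook tiles the VAE encode/decode itself, and the function name and blending details here are assumptions, not the notebook's actual code.

```python
import numpy as np

def tiled_process(image, num_tiles=(2, 2), fn=lambda t: t):
    """Split an image into a rows x cols grid of tiles, run fn on each
    tile independently, then stitch the results back together.
    Peak memory inside fn now scales with the tile size, not the frame."""
    rows, cols = num_tiles
    h, w = image.shape[:2]
    out = np.empty_like(image)
    ys = np.linspace(0, h, rows + 1, dtype=int)  # tile row boundaries
    xs = np.linspace(0, w, cols + 1, dtype=int)  # tile column boundaries
    for i in range(rows):
        for j in range(cols):
            tile = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = fn(tile)
    return out

# A 1080p frame processed as 4 tiles, num_tiles = [2, 2]:
frame = np.random.rand(1080, 1920, 3).astype(np.float32)
result = tiled_process(frame, num_tiles=(2, 2), fn=lambda t: t * 0.5)
```

For a non-square frame, `num_tiles=(2, 1)` or `(1, 2)` splits along only one axis, matching the `[2,1]` / `[1,2]` options above.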

Controlnet v1.1

Add controlnets from https://huggingface.co/lllyasviel/ControlNet-v1-1

control_sd15_hed -> control_sd15_softedge
control_sd15_hed is now control_sd15_softedge (has hed and pidi detectors)
To select a detector, set control_sd15_softedge_detector to 'HED' or 'PIDI'

control_sd15_normal -> control_sd15_normalbae
control_sd15_normal is now control_sd15_normalbae

control_sd15_depth
Now has midas and zoe detectors
To select a detector, set control_sd15_depth_detector to 'Zoe' or 'Midas'

control_sd15_openpose
Now has pose+hands+face mode
To enable, set control_sd15_openpose_hands_face = True

control_sd15_seg
Now has 3 detectors.
To select a detector, set control_sd15_seg_detector to 'Seg_OFCOCO', 'Seg_OFADE20K', or 'Seg_UFADE20K'

Seg_UFADE20K is the default detector from CN v1.0

control_sd15_scribble
Now has 2 detectors.
To select a detector, set control_sd15_scribble_detector to 'HED' or 'PIDI'
Predicts much wider strokes compared to control_sd15_softedge mode.

control_sd15_lineart_anime, control_sd15_lineart
New modes for creating images from lineart. Each has a coarse prediction mode:
To enable, set control_sd15_lineart_anime_coarse = True or control_sd15_lineart_coarse = True, respectively.
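Pulling the detector options above together, a typical settings-cell block might look like this (the setting names come from this release; the chosen values are just examples, not recommendations):

```python
# Detector choices for the updated v1.1 controlnets:
control_sd15_softedge_detector = 'HED'         # or 'PIDI'
control_sd15_depth_detector = 'Zoe'            # or 'Midas'
control_sd15_openpose_hands_face = True        # pose + hands + face mode
control_sd15_seg_detector = 'Seg_UFADE20K'     # or 'Seg_OFCOCO', 'Seg_OFADE20K'
control_sd15_scribble_detector = 'PIDI'        # or 'HED'
control_sd15_lineart_coarse = False            # coarse lineart prediction
control_sd15_lineart_anime_coarse = False      # coarse anime lineart prediction
```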

control_sd15_inpaint
New mode, works like inpainting model (though a bit differently)
Supports a few mask sources: consistency_mask, None, cond_video
To select a mask source, set control_sd15_inpaint_mask_source to 'consistency_mask', 'cond_video' or None.

None will inpaint the whole image. consistency_mask will inpaint inconsistent areas. cond_video will take a mask from cond_video path.

Tests have shown that it tends to overcook real fast when used solo (probably due to running the unmasked part of the image through the VAE over and over again), but it may work well together with other controlnets.
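For reference, selecting the mask source is a single settings assignment (names and values as listed above):

```python
# Mask source for the inpaint controlnet:
control_sd15_inpaint_mask_source = 'consistency_mask'  # inpaint inconsistent areas
# ... or 'cond_video' (mask taken from the cond_video path),
# ... or None (inpaint the whole image)
```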

control_sd15_shuffle
New mode, works like an image prompt, but better (as it doesn't mess with the main prompt)
Can be used to set the style from a single image with a very simple prompt (haven't tested no prompt, but it may work as well)

Supports a few image sources:
color_video, init, prev_frame, first_frame

To select a source, set control_sd15_shuffle_source to 'color_video', 'init', 'prev_frame', 'first_frame'

color_video - uses color video frames (or a single image) as source
init - uses the current frame's init as source (stylized + warped with consistency mask and flow_blend opacity)
prev_frame - uses the previously stylized frame (stylized, not warped)
first_frame - uses the first stylized frame

You can set the 1st frame source separately as well. For example, if you need to get the 1st frame style from your image, while for the consecutive frames you want to use the resulting stylized images.

To select a source, set control_sd15_shuffle_1st_source to 'color_video', 'init', 'None'
color_video - uses color video frames (or a single image) as source
init - uses the current frame's init as source (raw video frame)
None - skips this controlnet for the 1st frame. For example, if you like the 1st frame you're getting and want to keep its style, but don't want to use an external image as a source.
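As a concrete example, the "take the 1st frame's style from a reference image, then feed the stylized frames back in" setup described above would be (values are illustrative):

```python
# Frame 1: pull style from the color_video reference image/frames.
control_sd15_shuffle_1st_source = 'color_video'
# Consecutive frames: reuse the previously stylized frame as the style source.
control_sd15_shuffle_source = 'prev_frame'
```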

control_sd15_ip2p
New mode, works like instruct pix2pix. Enable it and write prompts like this:

'replace a man with a woman, make the coat red'

save_controlnet_annotations

You can now save controlnet annotator predictions.
To enable it, set save_controlnet_annotations = True

They will be saved to the controlnetDebug folder in the project dir.

reconstructed noise cache

Reconstructed noise will now be cached in the recNoiseCache folder in project dir (usually ./images_out/stable_warpfusion_0.12.2/)

It can be reused if most of the settings haven't been changed (to ensure that the cached noise is compatible with current settings at least to some extent)

You can edit: prompt, neg prompt, cfg scale, steps, style strength.
Changing other settings will recreate the noise from scratch.
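The compatibility check can be pictured as hashing every setting except the editable ones; the helper below is a hypothetical sketch of that idea (function name, key names, and hashing scheme are all assumptions, not the notebook's actual code).

```python
import hashlib
import json

# Settings that may change without invalidating the cached noise,
# per the list above (key names are illustrative).
REUSABLE_KEYS = {'prompt', 'neg_prompt', 'cfg_scale', 'steps', 'style_strength'}

def rec_noise_cache_key(settings: dict) -> str:
    """Hash every setting except the ones that are safe to edit.
    Same key -> the cached reconstructed noise can be reused;
    different key -> the noise is recreated from scratch."""
    relevant = {k: v for k, v in sorted(settings.items())
                if k not in REUSABLE_KEYS}
    return hashlib.sha256(json.dumps(relevant).encode()).hexdigest()
```

Under this scheme, editing the prompt keeps the cache key stable, while changing any other setting (say, flow_blend) produces a new key.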

Controlnet Multimodel GUI

Controlnet Multimodel GUI is now available in the controlnet tab. You can disable models via the enable checkbox; alternatively, set a model's weight to 0, or its active range to 0-0.
Controlnet multimodel settings can be loaded from settings files.
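For illustration, a multimodel config using the disabling options described above might look like this; the exact schema here is a guess and may differ from the notebook's actual structure:

```python
# Hypothetical multimodel config; the GUI edits an equivalent structure.
controlnet_multimodel = {
    'control_sd15_depth':   {'weight': 1, 'start': 0, 'end': 1},  # active
    'control_sd15_ip2p':    {'weight': 0, 'start': 0, 'end': 1},  # disabled via weight 0
    'control_sd15_shuffle': {'weight': 1, 'start': 0, 'end': 0},  # disabled via 0-0 range
}
```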