Releases: Sxela/WarpFusion
v0.14
Changelog:
- allow output size as multiple of 8 (add hack from auto1111 to controlnets)
- add an option to copy audio from the init video to the output video
- extract videoFrames to a folder named after video metadata
- extract flow to a folder named after flow source metadata and resolution
- auto re-create flow on video/resolution change
- add deflicker losses (the effect needs to be tested)
- add gui difficulty settings :D
- fix torch not found message with the installed torch
- safeguard controlnet model dir from empty value
- fix control_inpainting_mask=None mode error
- rename mask_callback to masked_diffusion
- hide extra settings for tiled vae
- set controlnet weight to 1 when turning it on in gui
- replace blend_code and normalize_code for start_code with code_randomness
- save controlnet debug with 6 digit frame numbers
- add safetensors support for vae
- fix markupsafe install error on a local install
- add realesrgan upscaler for video export
- save upscaled video under a different name
- fix flow preview generation out-of-range exception
- fix realesrgan not found error
- fix upscale ratio not being int error
- fix the black screen when using tiled vae (because of divisible by 8)
- fix oom errors during upscaler (offload more, retry failed frame 1 time)
- fix pytorch install
- fix torchmetrics version thx to #tomatoslasher
- fix pillow error
I'd recommend using the default 'divisible by 64' setting because the '8' one is buggy, and even though it won't produce black frames with tiled vae anymore, the tiles may not be consistent across frames.
You may get "Error processing frame" during video with an upscale ratio>1 due to the lack of free VRAM. You may need to lower the thread count to avoid that.
GUI difficulty settings
The GUI now has 3 iconic difficulty settings, affecting the number of visible options.
I'm too young to die. - basic diffusion settings + a little bit of warp
Hey, not too rough. - more warp. The set of settings for most projects
Ultra-Violence. - all available settings.
Auto flow/video frames management
Frames are now extracted to a folder that is named after the video's metadata hash + start_frame, end_frame, extract_nth_frame.
Same thing with the flow, but the flow folder also takes into account the width_height (output resolution)
So you won't need to toggle the force_flow_generation checkbox unless you've interrupted the flow generation before. From now on, changing the video file, start_frame, end_frame, extract_nth_frame or width_height will recreate the flow automatically.
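A rough sketch of how folder names like these could be derived, so that changing any input produces a new folder and triggers re-extraction. The helper names, the MD5 choice, the hashed metadata fields and the name layout are all assumptions for illustration, not the notebook's actual code.

```python
import hashlib
import os

def metadata_hash(name, size_bytes, mtime):
    """Short hash of basic file metadata (which fields get hashed is an assumption)."""
    key = f"{name}|{size_bytes}|{mtime}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8]

def frames_folder(video_hash, start_frame, end_frame, extract_nth_frame):
    """Folder name for extracted frames: video hash plus the extraction settings."""
    return f"{video_hash}_{start_frame}_{end_frame}_{extract_nth_frame}"

def flow_folder(video_hash, start_frame, end_frame, extract_nth_frame, width_height):
    """Flow folder name: same as the frames folder, plus the output resolution."""
    width, height = width_height
    base = frames_folder(video_hash, start_frame, end_frame, extract_nth_frame)
    return f"{base}_{width}x{height}"

def flow_is_missing(root, folder_name):
    """Flow has to be (re)generated when no folder exists for the current settings."""
    return not os.path.isdir(os.path.join(root, folder_name))
```

With a scheme like this, switching width_height from 1280x720 to 1920x1080 yields a different flow folder name, so the old flow is never reused by accident.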
Keep audio
Create video -> Keep_audio
Enable the checkbox to add audio from your init video to the output video. It will be saved together with the non-audio output.
Upscale
Create video -> Upscale settings
upscale_ratio: 1 - no upscale, >1: upscales the output frame
upscale_model: realesrgan model used for upscale
v0.13
Changelog:
- add alternative consistency algo (also faster)
- auto skip install for our docker env
- clean some discodiffusion legacy code (it's been a year :D)
- add controlnet default main model (v1.5)
- add reference controlnet (attention injection)
- add reference mode and source image
- skip flow preview generation if it fails
- downgrade to torch v1.13 for colab hosted env
- save schedules to settings before applying templates
- keep pre-template settings in the GUI
- add gui options to load settings, keep state on rerun/load from previous cells
- fix schedules not kept on GUI rerun
- rename depth_source to cond_image_src to reflect its actual purpose
- fix outer not defined error for reference
- remove torch downgrade for colab
- remove xformers for torch v2/colab
- add sdp attention from AUTOMATIC1111 to replace xformers (for torch v2)
- fix reference controlnet infinite recursion loop
- fix prompt schedules not working with "0"-like keys
New consistency algorithm
The new algo is cleaner and should reduce flicker related to the missed consistency mask.
Consistency is now calculated simultaneously with the flow.
use_legacy_cc:
The alternative consistency algo is on by default. To revert to the older algo, check use_legacy_cc in Generate optical flow and consistency maps cell.
missed_consistency_dilation:
Missed consistency mask "width". 1 - default value
edge_consistency_width:
Edge consistency width. Odd numbers only, default = 11
Reference controlnet (aka attention injection)
By Lvmin Zhang
https://github.com/Mikubill/sd-webui-controlnet
Added attention injection. You can mix attention data from your reference image and the one that's being generated. Runs 2x slower as it basically samples 2 images in parallel (the stylized and reference).
Works with any model, not only controlnet multi, as it's just a hack on attention layers. We still call it controlnet to honor its author's naming decision.
Reference controlnet (attention injection) ->
use_reference: Check to enable
reference_weight: strength of reference image vs current image
reference_source: source of the reference image. Options: ['None', 'stylized', 'init', 'prev_frame','color_video']
None - off
stylized - use current input image
prev_frame - previously stylized frame
init - raw video frame
color_video - frame from color video
reference_mode:
Options: ['Balanced', 'Controlnet', 'Prompt']
Defines what should affect the result more, prompt or reference.
v0.12
New stuff:
- tiled vae
- controlnet v1.1
- controlnet multimodel GUI
- saving controlnet predictions and rec noise
Changelog:
- add shuffle, ip2p, lineart, lineart anime controlnets
- add shuffle controlnet sources
- add tiled vae
- switch to controlnet v1.1 repo
- update controlnet model urls and filenames to v1.1 and new naming convention
- update existing controlnet modes to v1.1
- add consistency controls to video export cell
- add rec noise save\load for compatible settings
- save rec noise cache to recNoiseCache folder in the project dir
- save controlnet annotations to controlnetDebug folder in the project dir
- add controlnet multimodel options to GUI (thanks to Gateway#5208)
- add controlnet v1.1 annotator options to GUI
Fixes:
- fix controlnet depth_init for cond_video with no preprocess
- fix colormatch stylized frame not working with frame range
- tidy up the colab interface a bit
- fix dependency errors for uformer
- fix lineart/anime lineart errors
- fix local install
- bring together most of the installs (faster install, even faster restart & run all), only tested on colab
- fix zoe depth model producing black frames with autocast on
- fix controlnet inpainting model None and cond_video modes
- fix flow preview not being shown
- fix prompt schedules not working in some cases
- fix captions not updating in rec prompt
- fix control_sd15_hed error when loading pre-v0.12 settings files
- make torch v2 install optional
- make installation skippable for consecutive runs
- fix torch v2 install again :D ty to Aixsponza#0619
- fix AttributeError: 'Block' object has no attribute 'drop_path' error for depth controlnet
Tiled VAE
Tiled VAE -> use_tiled_vae
Basically, we split the image into tiles to use less VRAM during encoding and decoding the image into and from the latent space. This is the main VRAM bottleneck, as the diffusion itself is well-optimized and is already using less VRAM thanks to xformers. You can now render 1920x1080 on 16Gb colab cards, though this will be really slow :D
Tiled VAE -> num_tiles
By default, the image is split into 4 tiles by setting num_tiles = [2,2], but you can change it to [2,1] or [1,2] for non-square images.
I'd suggest leaving other variables as is.
Has only been tested with the default v1 VAE.
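To illustrate what num_tiles means, here is a small sketch that computes tile boxes for a given image size. It assumes num_tiles is ordered as [rows, columns] and ignores the overlap/padding a real tiled VAE needs to hide seams. The function name is made up for this example.

```python
def tile_boxes(width, height, num_tiles=(2, 2)):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Assumes num_tiles is (rows, cols); the real setting may be ordered differently.
    """
    rows, cols = num_tiles
    xs = [width * c // cols for c in range(cols + 1)]   # column boundaries
    ys = [height * r // rows for r in range(rows + 1)]  # row boundaries
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]

# Default [2,2] split of a 1920x1080 frame: four 960x540 tiles
boxes = tile_boxes(1920, 1080, (2, 2))
```

Each tile is encoded/decoded on its own, so peak VRAM depends on the tile size rather than the full frame size.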
Controlnet v1.1
Add controlnets from https://huggingface.co/lllyasviel/ControlNet-v1-1
control_sd15_hed -> control_sd15_softedge
control_sd15_hed is now control_sd15_softedge (has hed and pidi detectors)
To select a detector, set control_sd15_softedge_detector to 'HED' or 'PIDI'
control_sd15_normal -> control_sd15_normalbae
control_sd15_normal is now control_sd15_normalbae
control_sd15_depth
Now has midas and zoe detectors
To select a detector, set control_sd15_depth_detector to 'Zoe' or 'Midas'
control_sd15_openpose
Now has pose+hands+face mode
To enable, set control_sd15_openpose_hands_face = True
control_sd15_seg
Now has 3 detectors.
To select a detector, set control_sd15_seg_detector to 'Seg_OFCOCO', 'Seg_OFADE20K', or 'Seg_UFADE20K'
Seg_UFADE20K is the default detector from CN v1.0
control_sd15_scribble
Now has 2 detectors.
To select a detector, set control_sd15_scribble_detector to 'HED' or 'PIDI'
Predicts much wider strokes compared to control_sd15_softedge mode.
control_sd15_lineart_anime, control_sd15_lineart
New modes, for creating images from lineart. Has coarse prediction mode:
To enable, set control_sd15_lineart_anime_coarse = True or control_sd15_lineart_coarse = True, respectively.
control_sd15_inpaint
New mode, works like inpainting model (though a bit differently)
Supports a few mask sources: consistency_mask, None, cond_video
To select a mask source, set control_sd15_inpaint_mask_source to 'consistency_mask', 'cond_video' or None.
None will inpaint the whole image. consistency_mask will inpaint inconsistent areas. cond_video will take a mask from cond_video path.
Tests have shown that it tends to overcook real fast when used solo (probably due to running the unmasked part of the image through VAE over and over again), but may work well together with other controlnets.
control_sd15_shuffle
New mode, works like image prompt, but better (as it doesn't mess with the main prompt)
Can be used to set style from a single image with a very simple prompt (haven't tested no prompt, but it may work as well)
Supports a few image sources:
color_video, init, prev_frame, first_frame
To select a source, set control_sd15_shuffle_source to 'color_video', 'init', 'prev_frame', 'first_frame'
color_video - uses color video frames (or single image) as source
init - uses current frame's init as source (stylized+warped with consistency mask and flow_blend opacity)
prev_frame - uses previously stylized frame (stylized, not warped)
first_frame - first stylized frame
You can set the 1st frame source as well. For example, if you need to get the 1st frame style from your image, and for the consecutive frames you want to use the resulting stylized images.
To select a source, set control_sd15_shuffle_1st_source to 'color_video', 'init', 'None'
color_video: uses color video frames (or single image) as source
init - uses current frame's init as source (raw video frame)
None - skips this controlnet for the 1st frame. For example, if you like the 1st frame you're getting and want to keep its style, but don't want to use an external image as a source.
control_sd15_ip2p
New mode, works like instruct pix2pix. Enable it and write prompts like this:
'replace a man with a woman, make the coat red'
save_controlnet_annotations
You can now save controlnet annotator predictions.
To enable it, set save_controlnet_annotations = True
They will be saved to controlnetDebug folder
reconstructed noise cache
Reconstructed noise will now be cached in the recNoiseCache folder in project dir (usually ./images_out/stable_warpfusion_0.12.2/)
It can be reused if most of the settings haven't been changed (to ensure that the cached noise is compatible with current settings at least to some extent)
You can edit: prompt, neg prompt, cfg scale, steps, style strength.
Changing other settings will recreate the noise from scratch.
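One way to picture the "compatible settings" check is a cache key built from every setting except the editable ones. This is only a sketch: the key names, the hashing scheme and the function name are assumptions, not the notebook's actual cache logic.

```python
import hashlib
import json

# Settings that may change without invalidating cached noise (per the notes above).
# The exact key names here are hypothetical.
EDITABLE_KEYS = {"prompt", "neg_prompt", "cfg_scale", "steps", "style_strength"}

def rec_cache_key(settings):
    """Hash all settings except the editable ones into a short cache key."""
    relevant = {k: v for k, v in settings.items() if k not in EDITABLE_KEYS}
    blob = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]

base = {"prompt": "a village", "cfg_scale": 7, "width_height": [1280, 720], "rec_cfg": 1.5}
edited_prompt = dict(base, prompt="a village by salvador dali")
new_size = dict(base, width_height=[1920, 1080])
```

Here base and edited_prompt map to the same key (cache reused), while new_size maps to a different one (noise recreated from scratch).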
Controlnet Multimodel GUI
Controlnet Multimodel GUI is now available at controlnet tab. You can disable models via enable checkbox.
You can also disable a model by setting its weight to 0, or by setting its active range to 0-0.
Controlnet multimodel settings can be loaded from settings.
v0.11
Changelog:
- add lora support
- add loras schedule parsing from prompts
- add custom path for loras
- add custom path for embeddings
- add torch built-in raft implementation
- fix threads error on video export
- disable guidance for lora
- add compile option for raft @ torch v2 (a100 only)
- force torch downgrade for T4 GPU on colab
- add faces controlnet from https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace
- make gui not reset on run cell (there is still javascript delay before input is saved)
- add custom download folder for controlnets
- fix face controlnet download url
- fix controlnet depth_init for cond_video with no preprocess
Lora Support
Added lora support.
Firstly, download them and place them in a folder. When downloading, mind the base model used for the loras: v1 and v2 loras may not be compatible.
Specify their folder in LORA & embedding paths cell. After you run this cell, a list of detected loras will be printed.
To use loras, add them to your main prompt (preferably at the end; they will be detected and removed from the prompt). Use the following format: <lora:lora_name:lora_weight>
For example:
<lora:urbanSamuraiClothing_urbansamuraiV03:1>
where urbanSamuraiClothing_urbansamuraiV03 is the detected lora name printed in LORA & embedding paths cell.
The full prompt may look like this: {0: ['a beautiful highly detailed cyberpunk mechanical augmented most beautiful (man) ever, cyberpunk 2077, neon, dystopian, hightech, trending on artstation, <lora:urbanSamuraiClothing_urbansamuraiV03:1>']}
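For reference, detecting and stripping tags in this format can be sketched with a regular expression. This is a minimal illustration, not the notebook's own parser. The function name and the cleanup rules are assumptions.

```python
import re

# Matches <lora:name:weight>, e.g. <lora:urbanSamuraiClothing_urbansamuraiV03:1>
LORA_TAG = re.compile(r"<lora:([^:<>]+):([-+]?\d*\.?\d+)>")

def split_loras(prompt):
    """Return (prompt without lora tags, {lora_name: weight})."""
    loras = {name: float(weight) for name, weight in LORA_TAG.findall(prompt)}
    cleaned = LORA_TAG.sub("", prompt)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" ,")  # tidy leftover separators
    return cleaned, loras

text, loras = split_loras(
    "a prompt, <lora:urbanSamuraiClothing_urbansamuraiV03:1> <lora:zahaHadid_v10:0.5>"
)
```

After this call, text holds the plain prompt and loras maps each detected lora name to its weight.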
Scheduling:
You may schedule loras in your prompts. (requires blend_json_schedules enabled for that, otherwise it will use the last weight value without blending)
For example:
{0: ['a prompt, <lora:urbanSamuraiClothing_urbansamuraiV03:1>'],
100: ['a prompt, <lora:urbanSamuraiClothing_urbansamuraiV03:0>']}
will gradually reduce lora weight from 1 to 0 across 100 frames.
You can use multiple loras. For example:
{0: ['a prompt, <lora:urbanSamuraiClothing_urbansamuraiV03:1> <lora:zahaHadid_v10:0>'],
100: ['a prompt, <lora:urbanSamuraiClothing_urbansamuraiV03:0> <lora:zahaHadid_v10:1>']}
will gradually reduce urbanSamuraiClothing_urbansamuraiV03 weight from 1 to 0 across 100 frames, and increase zahaHadid_v10 weight from 0 to 1.
If you don't specify a 0-weight keyframe, the lora will just pop. For example:
{0: ['a prompt'], 100: ['a prompt, <lora:urbanSamuraiClothing_urbansamuraiV03:1>']}
will have no lora for frames 0-99, and urbanSamuraiClothing_urbansamuraiV03 lora with weight 1 at frame 100.
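The scheduling behaviour described above can be sketched as linear interpolation between keyframes. This assumes blending is linear and that a lora is simply absent before its first keyframe. The function name and signature are made up for illustration.

```python
def lora_weight_at(frame, keyframes, blend=True):
    """Weight of one lora at a given frame.

    keyframes: {frame_number: weight}, only for keyframes that mention the lora.
    Returns None when the lora is not active yet (before its first keyframe).
    """
    frames = sorted(keyframes)
    if frame < frames[0]:
        return None  # lora has not appeared yet, so it "pops" in later
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    for lo, hi in zip(frames, frames[1:]):
        if lo <= frame < hi:
            if not blend:
                return keyframes[lo]  # no blending: keep the last seen weight
            t = (frame - lo) / (hi - lo)
            return keyframes[lo] + t * (keyframes[hi] - keyframes[lo])

fade_out = {0: 1.0, 100: 0.0}  # weight goes 1 -> 0 across 100 frames
pop_in = {100: 1.0}            # no 0-weight keyframe at frame 0
```

With fade_out the weight at frame 50 is 0.5. With pop_in the lora stays off until frame 100 and then appears at full weight.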
Init scale and latent scale (guidance) are disabled when loras are used.
Faces ControlNet
Added Faces ControlNet support. To use it set weight above 0 in the cell with other controlnets.
Settings: GUI -> controlnet - max_faces: max faces to detect.
If no faces were detected in a frame, the controlnet will not be used for that frame even if it was enabled.
QOL improvements
The GUI will now keep its values even when the cell is re-run. Keep in mind that it's a javascript app and may still lag, not saving the most recent changes if your system is under load.
Added user paths for loras, embeddings, controlnet models. You can now store all of these in a common (non-warp) folder / google drive.
Added built-in raft implementation instead of the pre-compiled one used before. It will support torch v2. To use it, uncheck use_jit_raft. You can compile the built-in raft model for a 30% speed-up (available only for a100 on google colab / torch v2 on local install)
v0.10
Changelog:
- add predicted noise mode (reconstruction / rec) from this comment AUTOMATIC1111/stable-diffusion-webui#736
- add prompt schedule for rec
- add cfg scale for rec
- add captions support to rec prompt
- add source selector for rec noise
- add v1/v2 support for rec noise
- add single controlnet support for rec noise
- add multi controlnet to rec noise
- add rec steps % option
- add rec noise to gui
- add TemporalNet from https://huggingface.co/CiaraRowles/TemporalNet
- add temporalnet source selector (init/stylized)
- skip temporalnet for 1st frame
- add masked guidance toggle to gui
- add masked diffusion toggle to gui
- add softclamp to gui
- add temporalnet settings to gui
- add controlnet annotator settings to gui
- hide sat_scale (causes black screen)
- hide inpainting model-specific settings
- hide instructpix2pix-specific settings
TemporalNet
TemporalNet is a ControlNet, which is intended to be used for stabilizing img2img series by using the previous frame as input. You can use it together with other ControlNets in MultiControlNet mode.
GUI -> Controlnet - temporalnet_source
It uses the previous frame for better temporal stability. You can either use raw init video or stylized frame as input.
GUI -> Controlnet - temporalnet_skip_1st_frame
It will not work during the 1st frame by default, because it relies on the previous frame. You can force it to be enabled during the 1st frame render by disabling this checkbox, it will then use the 1st raw video frame as its image input.
Reconstructed noise
(similar to img2img alternative script from AUTO1111)
GUI -> diffusion -> reconstructed noise
Diffusion models generate (denoise) images from random noise. During the img2img process, we inject our init image somewhere halfway through the denoising, so we need to add some random noise on top of it for the model to reconstruct it correctly.
This introduces temporal inconsistency due to the randomness of the noise we add.
One of the most elegant solutions to this problem is using reconstructed noise instead of a random one. We take our init image, reverse sample it with our model, and get a very specific noise pattern. Given our model and settings, this noise can be used to exactly reconstruct the init image. This means that if we use this noise during our img2img process, the temporal inconsistency should be much lower than with random noise, as it's strongly related to our init image.
use_predicted_noise - enable the feature. Won't work with fixed_code enabled
Rec Prompt:
reconstruction prompt. You need this prompt to describe the scene without the style you're applying in the main prompt. Can use {caption}
For example, if my main prompt is "a beautiful village by salvador dali", the reconstruction prompt should be "a beautiful village".
rec_cfg - reconstruction cfg_scale. Keep low, between 1-1.9
rec_randomness - add random noise to reconstructed noise. 0 - no random, 1 - full random, no rec noise
rec_source - image used to reconstruct the noise from.
rec_steps_pct - % of the current frame's total steps used for reconstruction. 1 - 100%, most accurate/slowest. 0.5 is okay for starters.
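A toy sketch of how rec_randomness and rec_steps_pct could act. The linear mix is an assumption that just matches the documented endpoints (0 - pure reconstructed noise, 1 - pure random noise). The real blending may well differ. Plain lists stand in for latent tensors and both function names are hypothetical.

```python
import random

def rec_steps(total_steps, rec_steps_pct):
    """Number of steps spent on reconstruction (at least one)."""
    return max(1, int(total_steps * rec_steps_pct))

def mix_noise(rec_noise, rec_randomness, rng):
    """Blend reconstructed noise with fresh gaussian noise (illustrative linear mix)."""
    mixed = []
    for value in rec_noise:
        fresh = rng.gauss(0.0, 1.0)
        mixed.append((1.0 - rec_randomness) * value + rec_randomness * fresh)
    return mixed

rng = random.Random(0)
same = mix_noise([0.1, -0.2], 0.0, rng)     # rec noise kept as is
half = rec_steps(50, 0.5)                   # 25 reconstruction steps
```

Lower rec_randomness keeps the noise tied to the init frame, which is the whole point of the feature; raising it trades that stability for variety.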
GUI updates
Model-specific settings are now only shown when using the respective model version. For example, image_scale and inpainting options are not visible unless you use instructpix2pix or inpainting model.
Added numerous frequently used settings to GUI. Some of the rarely used settings (like masked guidance and masked callback inner tweaks) are still left out of the GUI to keep it relatively clean.
A reminder:
Changes in GUI will not be saved into the notebook, but if you run it with new settings, they will be saved to a settings.txt file as usual.
You can load settings in the misc tab.
You do not need to rerun the GUI cell after changing its settings.
v0.9
Changelog:
- add MultiControlNet
- add MultiControlNet autodownloader/loader
- add MultiControlNet order, weight, start/end steps, internal/external mode
- add MultiControlNet/Annotator cpu offload mode
- add empty negative image condition
- add softcap image range scaler
- update model guidance fn
MultiControlNet
Added MultiControlNet.
Consider using multicontrolnet by default even if you only need 1 model. It adds more options, that are otherwise unavailable in non-multi controlnet mode, like weight or controlnet start/end steps.
How to use:
Init Settings
Go to Load up a stable -> define SD + K functions, load model -> model_version -> control_multi
use_small_controlnet
- True
small_controlnet_model_path
- leave empty
download_control_model
- True
force_download
- Enable if some files appear to be corrupt, disable if everything is ok.
You can then specify a path to your custom v1.x checkpoint, and pick one or more controlnet models via checkboxes below. Those models will be downloaded if they are not available locally. You can redefine the list and order of controlnet models later in stable-settings cell.
Controlnets and their annotators will be automatically loaded, unloaded, downloaded, etc. depending on your settings in stable-settings cell.
Runtime settings
controlnet_multimodel_mode
External or internal. Internal - sums controlnet output values before feeding those into the diffusion model. External - sums the outputs of the diffusion model conditioned on one controlnet at a time. External seems slower but smoother, and uses less VRAM.
External mode:
controlnet1 -> diffusion -> output1
controlnet2 -> diffusion -> output2
weighted sum(output1 + output2) -> final result
Internal mode:
weighted sum(controlnet1 + controlnet2) -> diffusion -> final result
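The difference between the two modes can be shown with a toy stand-in for the diffusion model. This is only an illustration of the data flow in the diagrams above: diffusion_step is a made-up nonlinear function, and normalizing by the weight total in external mode is an assumption.

```python
def diffusion_step(latent, control):
    """Toy stand-in for one controlnet-conditioned denoising step (not a real model)."""
    return latent + control ** 2

def internal_mode(latent, controls, weights):
    """Sum weighted controlnet outputs first, then run the model once."""
    combined = sum(w * c for w, c in zip(weights, controls))
    return diffusion_step(latent, combined)

def external_mode(latent, controls, weights):
    """Run the model once per controlnet, then combine the model outputs."""
    outputs = [diffusion_step(latent, c) for c in controls]
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total

controls = [1.0, 2.0]  # pretend outputs of two controlnets
weights = [1.0, 1.0]
```

External mode calls the model once per controlnet, which fits the "slower" remark above, and because the model is nonlinear the two modes generally give different results.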
controlnet_multimodel settings
This is a dictionary containing a list of controlnet models. Order doesn't really matter, as their results are summed up.
Format example:
controlnet_multimodel = {
"control_sd15_depth":{
"weight":1,
"start":0,
"end":0.8
},
"control_sd15_canny":
{
"weight":0,
"start":0,
"end":1
}
}
weight (only available in internal mode) - weight of the model predictions in the output
start - % of total steps at which this controlnet begins working
end - % of total steps at which this controlnet stops working
This way you can: limit effect of certain models, mix more controlnets than you can fit into VRAM by making sure only a limited number of models runs at a given step.
Controlnet steps are counted relative to total steps.
You have 50 steps, 0.3 style strength. Controlnet start 0.2 end 0.8 will run from step 10 to step 40, overlapping with the actual steps taken at steps 35-40
[||||||||||] 50 steps
[-------|||] 0.3 style strength (effective steps - 0.3x50 = 15)
[--||||||--] - controlnet working range with start = 0.2 and end = 0.8, effective steps from 0.2x50 = 10 to 0.8x50 = 40
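The same arithmetic as a small sketch. It assumes the effective steps are the last style_strength x total steps, as in the diagram. The function names are invented for this example.

```python
def controlnet_step_range(total_steps, start, end):
    """Controlnet range in absolute steps, counted relative to total steps."""
    return round(start * total_steps), round(end * total_steps)

def effective_step_range(total_steps, style_strength):
    """Steps that actually get sampled: the last style_strength * total steps."""
    effective = round(total_steps * style_strength)
    return total_steps - effective, total_steps

def active_steps(total_steps, style_strength, start, end):
    """Overlap between the controlnet range and the steps actually taken."""
    c_start, c_end = controlnet_step_range(total_steps, start, end)
    e_start, e_end = effective_step_range(total_steps, style_strength)
    lo, hi = max(c_start, e_start), min(c_end, e_end)
    return (lo, hi) if lo < hi else None  # None: controlnet never runs

overlap = active_steps(50, 0.3, 0.2, 0.8)
```

For the example above this gives the range 35-40. A controlnet with end = 0.6 would never run at 0.3 style strength, since its range ends at step 30 and sampling only starts at step 35.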
Empty negative image condition
img_zero_uncond
By default image conditioned models use same image for negative conditioning, like if you specified the same text in both positive and negative prompt. (i.e. both positive and negative image conditionings are the same)
You can use empty negative condition by enabling this
Softcap image range scaler
do_softcap
Softly clamp latent excessive values. Reduces feedback loop effect a bit
softcap_thresh
Scale down absolute values above that threshold (latents are being clamped at [-1:1] range, so 0.9 will downscale values above 0.9 to fit into that range, [-1.5:1.5] will be scaled to [-1:1], but only absolute values over 0.9 will be affected)
softcap_q
Percentile to downscale. 1 - downscale the full range including outliers, 0.9 - downscale only 90% of the values above thresh and clamp the remaining 10%.
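A sketch of one possible reading of these settings, using a plain list in place of a latent tensor. The linear rescale of the part above the threshold and the way the percentile is picked are assumptions. The actual softcap function may be implemented differently.

```python
import math

def softcap(values, thresh=0.9, q=1.0, limit=1.0):
    """Squeeze magnitudes above thresh into [thresh, limit]; leave the rest untouched."""
    over = sorted(abs(v) for v in values if abs(v) > thresh)
    if not over:
        return list(values)
    # Magnitude at the q-th percentile of the values above the threshold
    top = over[min(len(over) - 1, max(0, math.ceil(q * len(over)) - 1))]
    if top <= limit:
        return list(values)  # already inside the range, nothing to scale down
    scale = (limit - thresh) / (top - thresh)
    result = []
    for v in values:
        mag = abs(v)
        if mag > thresh:
            mag = min(thresh + (mag - thresh) * scale, limit)  # outliers past top get clamped
            v = math.copysign(mag, v)
        result.append(v)
    return result

capped = softcap([-1.5, 0.5, 0.95, 1.5], thresh=0.9, q=1.0)
```

With q = 1 the whole [-1.5:1.5] range is squeezed into [-1:1], and values at or below 0.9 in magnitude are left as they were.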
A reminder:
Changes in GUI will not be saved into the notebook, but if you run it with new settings, they will be saved to a settings.txt file as usual.
You can load settings in misc tab.
You do not need to rerun the GUI cell after changing its settings.
Local install guide:
https://github.com/Sxela/WarpFusion/blob/main/README.md
Youtube playlist with settings:
https://www.youtube.com/watch?v=wvvcWm4Snmc&list=PL2cEnissQhlCUgjnGrdvYMwUaDkGemLGq
v0.8
Changelog:
New:
- add masked diffusion callback
- add masked latent guidance
- add option to offload model before decoder stage
- add fix noise option for latent guidance
- add noise, noise scale, fixed noise to masked diffusion
- add ControlNet models from https://github.com/lllyasviel/ControlNet
- add ControlNet downloads from https://colab.research.google.com/drive/1VRrDqT6xeETfMsfqYuCGhwdxcC2kLd2P
- add settings for ControlNet: canny filter ranges, detection size for depth/norm and other models
- add vae ckpt load for non-ControlNet models
- add selection by number to compare settings cell
- add noise to guiding image (init scale, latent scale)
- add noise resolution
- add guidance function for init scale
- add fixed seed option
- add separate base model for controlnet support
- add smaller controlnet support
- add invert mask for masked guidance
- add use_scale options to use loss scaler (guidance seems to work faster)
- add instruct pix2pix from https://github.com/timothybrooks/instruct-pix2pix
- add image_scale_schedule and template to support instruct pix2pix
- add frame range to render a selected range of extracted frames only
- add load settings by run number
- add model cpu-gpu offload to free some vram
Fixes:
- fix frame_range starting not from zero not working thanks to Oleg#8668
- add controlnet_preprocessing switch to allow raw input
- fix sampler being locked to euler
- fix image_resolution error for controlnet models
- fix controlnet models not downloading (file not found error)
- fix settings not loading with -1 and empty batch folder
- fix prettytable requirement
- fix blip generating captions for n-th frame even with a different setting
- fix load settings not working for filepath
- fix norm colormatch error
- fix warp latent mode error
- fix prompts not working for loaded settings thanks to Euclidean Plane#1332
- fix prompts not being loaded from saved settings
- fix xformers cell hanging on Overwrite user query
- fix sampler not being loaded
- fix description_tooltip=turbo_frame_skips_steps error
- fix -1 settings not loading in empty folder
- fix -1 settings error
- fix colormatch offset mode first frame error
Separate base model for ControlNet / small controlnet support
You can now specify any v1.x model checkpoint with any of controlnet_v1.5_* model_version. It will assume that the checkpoint is correct and load it as the base of the controlnet model. It will then look for a small controlnet, and download it if it is not found.
Masked diffusion
You can now use masked diffusion to stylize masked areas for more steps compared to the whole image. The mask source for now is the consistency mask.
Stable-settings -> Non-GUI -> mask_callback
mask_callback:
0 - off. 0.5-0.7 are good values. Value is a % of actual diffusion steps being made.
Diffuse the inconsistent area only for this % of the actual steps, then diffuse the whole image. So with 50 steps, 0.5 strength and 0.7 mask_callback you will diffuse the masked area for 50x0.5x0.7 = 17.5, so about 17 steps, and the whole image will be diffused for 50x0.5x(1-0.7) = 7.5, so about 8 more steps to smoothen the transition.
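The step split from this example, as a quick sketch. Rounding the masked part down and giving the remainder to the whole image is an assumption chosen to match the 17 + 8 numbers above. The function name is made up.

```python
def masked_diffusion_steps(steps, style_strength, mask_callback):
    """Return (steps on the masked area only, steps on the whole image)."""
    actual = int(steps * style_strength)   # steps actually sampled
    masked = int(actual * mask_callback)   # masked-area-only part, rounded down
    return masked, actual - masked

masked, whole = masked_diffusion_steps(50, 0.5, 0.7)
```

With mask_callback = 0 the masked part is 0 steps, i.e. the feature is off and the whole image gets all 25 steps.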
cb_noise_upscale_ratio
- noise upscale in masked diffusion callback
cb_add_noise_to_latent
- add noise to latent in masked diffusion callback
cb_use_start_code
- fix noise per frame in masked diffusion callback
cb_fixed_code
- fix noise across all animation in masked diffusion callback (overcooks fast af)
cb_norm_latent
- normalize latent stats
Masked guidance
Stable-settings -> Non-GUI -> masked_guidance
Use mask for init/latent guidance to ignore inconsistencies and only guide based on the consistent areas
guidance_use_start_code
- fix noise per frame for guidance (default - True)
ControlNet
define SD + K functions, load model -> model_version -> control_sd15_*
You can select one of ControlNet models. Be aware that warp settings are vastly different across all of those models.
For example, depth/normal map models work best with style strength 0.9-1.0, lower values tend to just return back the input image. Canny model on the other hand is closer to v1.5 warp settings (style strength 0.5, 0.2 is okay with it)
download_control_model: check to download the checkpoint file for the selected ControlNet model together with its conditioning counterpart (like midas depth estimator for normal map model)
force_download: check to force overwrite existing model file
Extra settings for ControlNet were added to the stable-settings cell.
detect_resolution: size of the image being fed into ControlNet companion models like midas. Keep it the same size as your output image, or as close to it as you can get.
low_threshold and high_threshold: canny filter parameters
You can specify the source for ControlNet guidance just like you do for depth model: depth_source parameter in GUI -> diffuse tab.
You can also use video as a source for ControlNet Conditioning.
Just put it here: cond_video_path
-> Video Input Settings and select GUI -> diffusion -> depth_source -> cond_video
Separate vae support
define SD + K functions, load model -> vae_ckpt
Use this to load a standalone variational autoencoder (vae) file.
Compare settings by number
Propagated the load settings fix to the compare settings cell. You can now specify only run numbers as well.
Additions to latent conditioning
This one is for people familiar with the diffusion process.
Added some options for init scale:
- Add noise corresponding to current diffusion timestep to "ground truth image" that we are comparing our generated result to
- Add noise scaling options so that the noise won't add, well, visible noise to the image :D
- Add criterion selection for init scale loss
Instruct Pix2Pix
Load up a stable -> define SD + K functions, load model -> model_version -> v1_instructpix2pix
gui -> diffusion -> image_scale_schedule
gui -> diffusion -> depth_source
More about it here: https://github.com/timothybrooks/instruct-pix2pix
Firstly, you'll need to download the checkpoint here: http://instruct-pix2pix.eecs.berkeley.edu/instruct-pix2pix-00-22000.ckpt
The settings are different from the usual models. cfg_scale and image_scale are highly interconnected. cfg_scale = 7.5 and image_scale = 1.5 are the defaults. If you wish to change cfg_scale, don't forget to adjust image_scale accordingly, or you will get too close or too far away from your image. Even 0.1 difference matters.
Image conditioning source is defined by depth_source argument (sic!), the default is init, which means the model looks at text and init video frame and tries to combine both. You can try using stylized frame as the source instead, but this may overcook really fast.
You can use custom image conditioning video with instruct pix2pix as well.
Prompting works a bit differently, negative prompts included. Try using instructions, like "turn her head into a pumpkin" instead of the usual mix of keywords.
Frame range
Diffuse! -> frame_range
Allows you to render only a selected range of frames. For example, if you have extracted 100 frames, with frame_range = [25,75] you will only render 50 frames starting with frame 25.
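In list terms this behaves like a slice. The sketch below assumes the end of the range is exclusive, which is inferred from the "50 frames" figure above rather than confirmed. The function name is hypothetical.

```python
def frames_to_render(num_extracted, frame_range):
    """Indices of extracted frames to render; the end index is treated as exclusive."""
    start, end = frame_range
    return list(range(max(0, start), min(end, num_extracted)))

selected = frames_to_render(100, [25, 75])
```

Here selected runs from frame 25 to frame 74, 50 frames in total.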
Load settings by run number
You can now load default settings or load settings via gui by the number of the run, if it's in current batch folder. So if your batchname is stable_warpfusion_0.6.0, you can set default_settings_path to 50 and it will load the settings from batch folder stable_warpfusion_0.6.0, run #50. You can also set it to -1 to load settings from the latest run.
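A sketch of how a run number could be resolved to a settings file. The file naming pattern batchname(run)_settings.txt and the function name are assumptions for illustration. Check your batch folder for the real names.

```python
import re

def resolve_settings(default_settings_path, batch_name, filenames):
    """Map a run number (or -1 for the latest run) to a settings file name.

    filenames: names found in the batch folder. Non-integer values are treated
    as ordinary file paths and returned unchanged.
    """
    if not isinstance(default_settings_path, int):
        return default_settings_path
    # Hypothetical naming scheme: <batch_name>(<run>)_settings.txt
    pattern = re.compile(re.escape(batch_name) + r"\((\d+)\)_settings\.txt$")
    runs = {}
    for name in filenames:
        match = pattern.match(name)
        if match:
            runs[int(match.group(1))] = name
    if not runs:
        return None
    if default_settings_path == -1:
        return runs[max(runs)]  # latest run
    return runs.get(default_settings_path)

files = [
    "stable_warpfusion_0.6.0(49)_settings.txt",
    "stable_warpfusion_0.6.0(50)_settings.txt",
    "stable_warpfusion_0.6.0(51)_settings.txt",
]
picked = resolve_settings(50, "stable_warpfusion_0.6.0", files)
latest = resolve_settings(-1, "stable_warpfusion_0.6.0", files)
```

Returning None for an empty folder or an unknown run number leaves it to the caller to fall back to the notebook defaults.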
Model cpu-gpu offload
Automatically offload diffusion and text encoder models to cpu ram before decoding image from latent space. Should save a bit of VRAM at the cost of some speed. Your feedback is appreciated here.
A reminder:
Changes in GUI will not be saved into the notebook, but if you run it with new settings, they will be saved to a settings.txt file as usual.
You can load settings in misc tab.
You do not need to rerun the GUI cell after changing its settings.
Local install guide:
https://github.com/Sxela/WarpFusion/blob/main/README.md
Youtube playlist with settings:
https://www.youtube.com/watch?v=wvvcWm4Snmc&list=PL2cEnissQhlCUgjnGrdvYMwUaDkGemLGq