This is a fork of the official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.
For details on the pre-trained models in this repository, see the Model Card.
To install this package, clone this repository and then run:
pip install -e .
For detailed usage examples, see the notebooks directory.
- The text2im notebook shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts. The local version of this notebook is text2im.py.
- The inpaint notebook shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt. The local version of this notebook is inpaint.py.
- The clip_guided notebook shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts. The local version of this notebook is clip_guided.py.
The local versions of the notebooks are as close as possible to the original notebooks, which remain unchanged here. Changes to local versions include:
- No need for IPython's "display", since the scripts run outside a notebook
- Individual images are also saved, as well as the image strip (only upscaled images are saved by default)
Additionally, a more commandline-friendly generation script, generate.py, is available. It can be set to use either classifier-free guidance or CLIP guidance.
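For readers unfamiliar with classifier-free guidance, the sketch below shows how a guidance scale typically combines a text-conditioned and an unconditioned noise prediction. It is a minimal illustration, not code from this repository: the function name apply_cf_guidance is hypothetical, and plain Python lists stand in for the model's tensor outputs.

```python
def apply_cf_guidance(cond_eps, uncond_eps, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    Hypothetical helper for illustration only. The inputs are flat lists
    of floats standing in for the model's predicted noise tensors.
    """
    # Start from the unconditional prediction and move along the
    # direction pointing towards the text-conditioned prediction.
    return [u + guidance_scale * (c - u) for c, u in zip(cond_eps, uncond_eps)]


if __name__ == "__main__":
    cond = [0.5, -0.2, 1.0]    # made-up text-conditioned prediction
    uncond = [0.1, 0.0, 0.5]   # made-up unconditioned prediction
    for scale in (0.0, 1.0, 3.0):
        print(scale, apply_cf_guidance(cond, uncond, scale))
```

With a scale of 0 the result equals the unconditioned prediction, with a scale of 1 it equals the conditioned one, and larger values push further in the direction of the prompt.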
To use the generation script, simply run it with a text prompt as an additional commandline parameter:
python generate.py "Painting of an apple"
Example output under the given prompt:
Multiple prompts can be specified, separated via "||". Individual batch items will cycle through the prompts in order. This can be used to evaluate multiple variations of a prompt in the same batch.
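The cycling behaviour described above can be pictured with a short sketch. This is an assumption about how such an assignment might work, not the actual code in generate.py: the helper name assign_prompts is hypothetical, and stripping whitespace around each prompt is an illustrative choice.

```python
def assign_prompts(prompt_arg, batch_size):
    """Split a '||'-separated prompt string and cycle it over a batch.

    Hypothetical helper for illustration; generate.py may differ.
    """
    # Split on the separator and drop surrounding whitespace (assumption).
    prompts = [p.strip() for p in prompt_arg.split("||")]
    # Batch item i receives prompt i modulo the number of prompts.
    return [prompts[i % len(prompts)] for i in range(batch_size)]


if __name__ == "__main__":
    batch = assign_prompts("Painting of an apple || Photo of an apple", 4)
    for index, prompt in enumerate(batch):
        print(index, prompt)
```

Under this scheme, a batch of four with two prompts would hold two images per prompt, alternating between them.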
Parameters for configuring the generation script can be viewed with the -h flag:
> python generate.py -h
usage: GLIDE Text2Image [-h] [-s S] [-gs GS] [-cf] [-tb TB] [-tu TU] [-ut UT] [-ss] [-ni] [-v] [-rc RC] [prompt]

positional arguments:
  prompt      Prompt for image generation. Batch items cycles through multiple prompts separated by ||

optional arguments:
  -h, --help  show this help message and exit
  -s S        Batch size: Higher values generate more images at once while using more RAM
  -gs GS      Guidance scale parameter during generation (Higher values may improve quality, but reduce diversity)
  -cf         Use classifier-free guidance instead of CLIP guidance. CF guidance may yield 'cleaner' images, while
              CLIP guidance may be better at interpreting more complex prompts.
  -tb TB      Timestep value for base model. For faster generation, lower values (e.g. '100') can be used
  -tu TU      Timestep value for upscaler. For faster generation, use 'fast27'
  -ut UT      Temperature value for the upscaler. '1.0' will result in sharper, but potentially noisier/grainier
              images
  -ss         Additionally save the small 64x64 images (before the upscaling step)
  -ni         Don't save individual images (after the upscaling step)
  -v          Verbose mode: print additional runtime information
  -rc RC      Amount of different random prompts to use when no prompt is given

Text2Image generation using GLIDE, with classifier-free or CLIP guidance.
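To show how the flags combine, here is a rough argparse sketch that mirrors the interface listed in the help output. It is a reconstruction for illustration, not the parser used in generate.py: the function name build_parser, the argument types, and the absence of default values are all assumptions, and the help strings are abbreviated.

```python
import argparse


def build_parser():
    """Build a parser resembling the help output (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="GLIDE Text2Image",
        epilog="Text2Image generation using GLIDE, with classifier-free or CLIP guidance.",
    )
    # Optional positional prompt; several prompts are separated by "||".
    parser.add_argument("prompt", nargs="?", help="Prompt for image generation")
    # Argument types below are assumptions; the real script may differ.
    parser.add_argument("-s", type=int, help="Batch size")
    parser.add_argument("-gs", type=float, help="Guidance scale")
    parser.add_argument("-cf", action="store_true",
                        help="Use classifier-free guidance instead of CLIP guidance")
    parser.add_argument("-tb", type=str, help="Timestep value for base model")
    parser.add_argument("-tu", type=str, help="Timestep value for upscaler")
    parser.add_argument("-ut", type=float, help="Temperature value for the upscaler")
    parser.add_argument("-ss", action="store_true", help="Also save the small 64x64 images")
    parser.add_argument("-ni", action="store_true", help="Don't save individual images")
    parser.add_argument("-v", action="store_true", help="Verbose mode")
    parser.add_argument("-rc", type=int, help="Number of random prompts when none is given")
    return parser


if __name__ == "__main__":
    # Equivalent to a faster classifier-free run over two prompts:
    #   python generate.py "Painting of an apple||Photo of an apple" -cf -tb 100 -tu fast27 -s 4
    args = build_parser().parse_args(
        ["Painting of an apple||Photo of an apple",
         "-cf", "-tb", "100", "-tu", "fast27", "-s", "4"]
    )
    print(args)
```

The example invocation combines the two speed-related suggestions from the help text (a lower base timestep value and 'fast27' for the upscaler) with classifier-free guidance.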