Image generation documentation #193
Changes from 7 commits
4c29116
3161bf8
0d1063a
5d0ad5f
52f74ef
a755ed4
70b4471
d13a5fc
823907c
2ee3a67
01eef0f
3853f91
eae7cb5
9cfaf60
02bfef4
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
## Overview | ||
|
||
Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver. | ||
Review comment: Link to reference docs for
Review comment: This sentence is monotonous, with the phrase "Image generation" used three times. Suggest splitting this up into the customer benefit first, followed by how it achieves it (maybe two sentences). |
||
|
||
#### Rulesets | ||
Review comment: Should this be an H3? |
||
|
||
[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) that support prompt weighting to steer the image generation model away from undesirable features described by negative prompts. | ||
Review comment: wieghting -> weighting
Review comment: Again, lead with customer benefit/usage to anchor the value for the reader. e.g., "Customers use Negative Rulesets to influence the model to avoid undesirable results, for example by specifying X Y Z."
Review comment: Good call, updated.
Review comment: Also may want to run this through a spell check. I discovered that I am unable to spell "undesirable" without a lot of help.
Review comment: This is what I get for trying VSCode. Back to PyCharm! |
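As a rough illustration of how positive and negative rulesets might be folded into a request, the sketch below flattens rules into weighted prompt entries. It does not use Griptape itself: `Rule`, `Ruleset`, `build_weighted_prompts`, and the `1.0` / `-1.0` weights are hypothetical stand-ins chosen for the example, and the actual engines and drivers may combine prompts differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    # Hypothetical stand-in for a single rule; only the text is modeled.
    value: str


@dataclass
class Ruleset:
    # Hypothetical stand-in for a named collection of rules.
    name: str
    rules: List[Rule] = field(default_factory=list)


def build_weighted_prompts(prompts, rulesets=(), negative_rulesets=()):
    """Return (text, weight) pairs: positive text gets 1.0, negative text -1.0."""
    positive = list(prompts) + [r.value for rs in rulesets for r in rs.rules]
    negative = [r.value for rs in negative_rulesets for r in rs.rules]
    weighted = [(", ".join(positive), 1.0)]
    if negative:
        # Only emit a negative entry when there is something to avoid.
        weighted.append((", ".join(negative), -1.0))
    return weighted


positive_ruleset = Ruleset("positive", [Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset("negative", [Rule("distorted")])

weighted_prompts = build_weighted_prompts(
    ["a dog riding a skateboard"],
    rulesets=[positive_ruleset],
    negative_rulesets=[negative_ruleset],
)
print(weighted_prompts)
```

Keeping the negative text in its own entry, rather than appending "no distorted features" to the main prompt, is what lets a driver with prompt weighting treat it as something to steer away from.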
||
|
||
### Prompt Image Generation Engine | ||
|
||
This image generation engine facilitates generating images from text prompts. | ||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.rules import Rule, Ruleset | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import AmazonBedrockImageGenerationDriver, \ | ||
BedrockStableDiffusionImageGenerationModelDriver | ||
from griptape.tools import PromptImageGenerationClient | ||
|
||
|
||
# Define positive and negative rulesets. | ||
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")]) | ||
negative_ruleset = Ruleset(rules=[Rule("distorted")]) | ||
SavagePencil marked this conversation as resolved.
|
||
|
||
# Create a driver configured to use Stable Diffusion via Bedrock. | ||
driver = AmazonBedrockImageGenerationDriver( | ||
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(), | ||
model="stability.stable-diffusion-xl-v0", | ||
) | ||
|
||
# Create an engine configured to use the driver. | ||
engine = PromptImageGenerationEngine( | ||
rulesets=[positive_ruleset], | ||
negative_rulesets=[negative_ruleset], | ||
Review comment: This is a lot of code, which means a lot to maintain if we make refactors or upstream changes. Are we able to automate testing it? Should we pare it down to only a handful of lines?
Review comment: We currently do automate testing this, see
Review comment: @andrewfrench can you try creating a
Review comment: Done! The LLM looks happy to pull from there. |
||
image_generation_driver=driver, | ||
) | ||
Review comment: Show running the Engine. |
||
|
||
# Create a tool configured to use the engine. | ||
tool = PromptImageGenerationClient( | ||
image_generation_engine=engine, | ||
) | ||
Review comment: I don't think we need to show Tool creation here since we have a dedicated section for Tools. |
||
``` | ||
|
||
### Variation Image Generation Engine | ||
|
||
This image generation engine facilitates generating variations of an input image according to a text prompt. | ||
Review comment: Could we pare this down to just the deltas? I had to re-read it a few times to note that there were some class name changes. |
||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.rules import Rule, Ruleset | ||
from griptape.engines import VariationImageGenerationEngine | ||
from griptape.drivers import AmazonBedrockImageGenerationDriver, \ | ||
BedrockStableDiffusionImageGenerationModelDriver | ||
from griptape.tools import VariationImageGenerationClient | ||
|
||
|
||
# Define positive and negative rulesets. | ||
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")]) | ||
negative_ruleset = Ruleset(rules=[Rule("distorted")]) | ||
|
||
# Create a driver configured to use Stable Diffusion via Bedrock. | ||
driver = AmazonBedrockImageGenerationDriver( | ||
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(), | ||
model="stability.stable-diffusion-xl-v0", | ||
) | ||
|
||
# Create an engine configured to use the driver. | ||
engine = VariationImageGenerationEngine( | ||
rulesets=[positive_ruleset], | ||
negative_rulesets=[negative_ruleset], | ||
image_generation_driver=driver, | ||
) | ||
|
||
# Create a tool configured to use the engine. | ||
tool = VariationImageGenerationClient( | ||
image_generation_engine=engine, | ||
) | ||
Review comment: Same points. |
||
``` | ||
|
||
### Inpainting Image Generation Engine | ||
|
||
This image generation engine facilitates image inpainting, or modifying an input image according to a text prompt within the bounds of a mask defined by a mask image. | ||
|
||
Review comment: Can we make this a more concrete explanation? I don't know what the benefit here is. |
||
```python | ||
from griptape.structures import Agent | ||
from griptape.rules import Rule, Ruleset | ||
from griptape.engines import InpaintingImageGenerationEngine | ||
from griptape.drivers import AmazonBedrockImageGenerationDriver, \ | ||
BedrockStableDiffusionImageGenerationModelDriver | ||
from griptape.tools import InpaintingImageGenerationClient | ||
|
||
|
||
# Define positive and negative rulesets. | ||
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")]) | ||
negative_ruleset = Ruleset(rules=[Rule("distorted")]) | ||
|
||
# Create a driver configured to use Stable Diffusion via Bedrock. | ||
driver = AmazonBedrockImageGenerationDriver( | ||
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(), | ||
model="stability.stable-diffusion-xl-v0", | ||
) | ||
|
||
# Create an engine configured to use the driver. | ||
engine = InpaintingImageGenerationEngine( | ||
rulesets=[positive_ruleset], | ||
negative_rulesets=[negative_ruleset], | ||
image_generation_driver=driver, | ||
) | ||
|
||
# Create a tool configured to use the engine. | ||
tool = InpaintingImageGenerationClient( | ||
image_generation_engine=engine, | ||
) | ||
Review comment: Same points. |
||
``` | ||
|
||
### Outpainting Image Generation Engine | ||
|
||
This image generation engine facilitates image outpainting, or modifying an input image according to a text prompt outside the bounds of a mask defined by a mask image. | ||
Review comment: Ditto. |
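To make the inpainting versus outpainting distinction concrete, here is a minimal sketch on tiny grids of "pixels". It is not Griptape code: `composite` is a hypothetical helper, and it assumes a binary mask where `1` marks the masked region. Real models and drivers may use a different mask convention (for example, black versus white pixels in the mask image).

```python
def composite(original, generated, mask, mode):
    """Blend two same-sized pixel grids according to a binary mask.

    mode="inpaint" takes generated pixels where the mask is 1 (inside the mask).
    mode="outpaint" takes generated pixels where the mask is 0 (outside the mask).
    """
    if mode not in ("inpaint", "outpaint"):
        raise ValueError(f"unknown mode: {mode}")
    replace_when = 1 if mode == "inpaint" else 0
    return [
        [g if m == replace_when else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [["o", "o", "o"], ["o", "o", "o"]]
generated = [["g", "g", "g"], ["g", "g", "g"]]
mask = [[0, 1, 0], [0, 1, 1]]

inpainted = composite(original, generated, mask, "inpaint")
outpainted = composite(original, generated, mask, "outpaint")
print(inpainted)
print(outpainted)
```

In practice, inpainting suits edits such as replacing an object while leaving the rest of the picture untouched, while outpainting suits keeping a subject fixed and regenerating or extending its surroundings.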
||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.rules import Rule, Ruleset | ||
from griptape.engines import OutpaintingImageGenerationEngine | ||
from griptape.drivers import AmazonBedrockImageGenerationDriver, \ | ||
BedrockStableDiffusionImageGenerationModelDriver | ||
from griptape.tools import OutpaintingImageGenerationClient | ||
|
||
|
||
# Define positive and negative rulesets. | ||
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")]) | ||
negative_ruleset = Ruleset(rules=[Rule("distorted")]) | ||
|
||
# Create a driver configured to use Stable Diffusion via Bedrock. | ||
driver = AmazonBedrockImageGenerationDriver( | ||
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(), | ||
model="stability.stable-diffusion-xl-v0", | ||
) | ||
|
||
# Create an engine configured to use the driver. | ||
engine = OutpaintingImageGenerationEngine( | ||
rulesets=[positive_ruleset], | ||
negative_rulesets=[negative_ruleset], | ||
image_generation_driver=driver, | ||
) | ||
|
||
# Create a tool configured to use the engine. | ||
tool = OutpaintingImageGenerationClient( | ||
image_generation_engine=engine, | ||
) | ||
Review comment: Same points. |
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -104,3 +104,25 @@ WebLoader().load_collection( | |
["https://www.griptape.ai", "https://docs.griptape.ai"] | ||
) | ||
``` | ||
|
||
## Image Loader | ||
|
||
The Image Loader is used to load an image from the filesystem, returning an ImageArtifact. | ||
|
||
```python | ||
from griptape.loaders import ImageLoader | ||
|
||
image_artifact = ImageLoader().load("my_image.png") | ||
|
||
image_artifacts = ImageLoader().load_collection(["image_1.png", "image_2.png"]) | ||
``` | ||
|
||
By default, the Image Loader will ensure all images are in `png` format. If an image in another format (for example, `jpg`) is loaded, it will be reformatted to `png`. Other formats are supported through the `format` field. | ||
|
||
```python | ||
from griptape.loaders import ImageLoader | ||
|
||
|
||
# Image data in Image Artifact will be in JPG format | ||
image_artifact_jpg = ImageLoader(format="JPG").load("my_image.png") | ||
SavagePencil marked this conversation as resolved.
Review comment: Since this is the override behavior, can we include another line that loads it "normal-like"?
Review comment: The default example is above. |
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,178 @@ | ||
## Overview | ||
|
||
Image generation drivers are used by [image generation engines](../data/image-generation-engines.md) to build and execute API calls to image generation models. | ||
Review comment: Link to reference docs for Image Generation Drivers. |
||
|
||
Use a driver to build an engine, then pass it to a tool for use by an [Agent](../structures/agents.md): | ||
Review comment: Capitalization. |
||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import OpenAiDalleImageGenerationDriver | ||
from griptape.tools import PromptImageGenerationClient, FileManager | ||
|
||
driver = OpenAiDalleImageGenerationDriver( | ||
model="dall-e-3", | ||
Review comment: Open Q: since Dall-E 3 requires a separate monthly subscription, would it be more accessible to start with Dall-E 2?
Review comment: These examples aren't prescriptive, but I updated this to |
||
) | ||
|
||
engine = PromptImageGenerationEngine(image_generation_driver=driver) | ||
|
||
agent = Agent(tools=[ | ||
PromptImageGenerationClient(image_generation_engine=engine), | ||
FileManager(), | ||
]) | ||
|
||
agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.") | ||
``` | ||
|
||
### Amazon Bedrock | ||
|
||
The Amazon Bedrock image generation driver provides multi-model access to image generation models hosted by Amazon Bedrock. This driver manages the API calls to the Bedrock API, while the specific model drivers below format the API requests and parse the responses. | ||
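The split described above, where one driver owns the API call and a model driver owns the request and response shapes, can be sketched in plain Python. Every name here (`MultiModelDriver`, `EchoModelDriver`, `FakeClient`, the `text_prompts` payload shape, and the method names) is hypothetical and chosen for illustration. This is not the Griptape or Bedrock API, and no network call is made.

```python
import json


class EchoModelDriver:
    """Hypothetical model driver: formats requests and parses responses."""

    def text_to_image_request(self, prompts, negative_prompts=None):
        text_prompts = [{"text": p, "weight": 1.0} for p in prompts]
        text_prompts += [
            {"text": p, "weight": -1.0} for p in (negative_prompts or [])
        ]
        return {"text_prompts": text_prompts}

    def parse_response(self, response):
        return response["image"]


class FakeClient:
    """Stand-in for a hosted-model client; returns a fake image payload."""

    def invoke(self, model, body):
        count = len(json.loads(body)["text_prompts"])
        return {"image": f"<image from {model} using {count} prompts>"}


class MultiModelDriver:
    """Hypothetical driver: owns the API call, delegates the payload shape."""

    def __init__(self, model, model_driver, client):
        self.model = model
        self.model_driver = model_driver
        self.client = client

    def generate(self, prompts, negative_prompts=None):
        request = self.model_driver.text_to_image_request(prompts, negative_prompts)
        response = self.client.invoke(self.model, json.dumps(request))
        return self.model_driver.parse_response(response)


driver = MultiModelDriver("example-model", EchoModelDriver(), FakeClient())
result = driver.generate(["a dog riding a skateboard"], ["distorted"])
print(result)
```

With this shape, supporting another hosted model only requires a new model driver; the code that talks to the hosting service stays unchanged.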
|
||
#### Bedrock Stable Diffusion Model Driver | ||
|
||
The Bedrock Stable Diffusion model driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This model driver supports configurations specific to Stable Diffusion, like style presets, clip guidance presets, sampler, and more. | ||
Review comment: nit: |
||
|
||
This model driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation. | ||
|
||
Review comment: Do we want to illustrate the negative prompts in action? Perhaps one run without, one with?
Review comment: Added an example including negative rules. |
||
```python | ||
from griptape.structures import Agent | ||
from griptape.tools import PromptImageGenerationClient, FileManager | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import AmazonBedrockImageGenerationDriver, \ | ||
BedrockStableDiffusionImageGenerationModelDriver | ||
|
||
model_driver = BedrockStableDiffusionImageGenerationModelDriver( | ||
style_preset="pixel-art", | ||
steps=50, | ||
) | ||
|
||
driver = AmazonBedrockImageGenerationDriver( | ||
image_generation_model_driver=model_driver, | ||
) | ||
|
||
engine = PromptImageGenerationEngine(image_generation_driver=driver) | ||
|
||
agent = Agent(tools=[ | ||
PromptImageGenerationClient(image_generation_engine=engine), | ||
FileManager(), | ||
]) | ||
|
||
agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.") | ||
``` | ||
|
||
#### Amazon Bedrock Titan Image Generator Model Driver | ||
|
||
The Amazon Bedrock Titan Image Generator model driver provides support for Titan Image Generator models hosted by Amazon Bedrock. This model driver supports configurations specific to Titan Image Generator, like `quality`, `seed`, and `cfg_scale`. | ||
|
||
This model driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation. | ||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.tools import PromptImageGenerationClient, FileManager | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import AmazonBedrockImageGenerationDriver, \ | ||
BedrockTitanImageGeneratorImageGenerationModelDriver | ||
|
||
model_driver = BedrockTitanImageGeneratorImageGenerationModelDriver( | ||
quality="hd", | ||
) | ||
|
||
driver = AmazonBedrockImageGenerationDriver( | ||
image_generation_model_driver=model_driver, | ||
) | ||
|
||
engine = PromptImageGenerationEngine(image_generation_driver=driver) | ||
|
||
agent = Agent(tools=[ | ||
PromptImageGenerationClient(image_generation_engine=engine), | ||
FileManager(), | ||
]) | ||
|
||
agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.") | ||
``` | ||
|
||
### Azure OpenAI DALL-E | ||
|
||
The Azure OpenAI DALL-E image generation driver provides access to OpenAI DALL-E models hosted by Azure. In addition to the configurations provided by the underlying OpenAI DALL-E driver, the Azure OpenAI DALL-E driver allows configuration of Azure-specific deployment values. | ||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.tools import PromptImageGenerationClient, FileManager | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import AzureOpenAiDalleImageGenerationDriver | ||
|
||
driver = AzureOpenAiDalleImageGenerationDriver( | ||
model="dall-e-3", | ||
azure_deployment="my-azure-deployment", | ||
azure_endpoint="https://example-endpoint.openai.azure.com", | ||
Review comment: Load from environment variables. |
||
) | ||
|
||
engine = PromptImageGenerationEngine(image_generation_driver=driver) | ||
|
||
agent = Agent(tools=[ | ||
PromptImageGenerationClient(image_generation_engine=engine), | ||
FileManager(), | ||
]) | ||
|
||
agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.") | ||
``` | ||
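One way to follow the suggestion to load Azure values from environment variables is sketched below using only the standard library. The helper `load_azure_settings` and the variable names `AZURE_OPENAI_DALL_E_DEPLOYMENT` and `AZURE_OPENAI_DALL_E_ENDPOINT` are assumptions made for this example, not names defined by Griptape or Azure. The returned keys mirror the `azure_deployment` and `azure_endpoint` arguments used in the example above.

```python
import os


def load_azure_settings(env=None):
    """Collect Azure deployment values from environment variables.

    The variable names are hypothetical; substitute whatever your
    deployment actually defines.
    """
    env = os.environ if env is None else env
    required = {
        "azure_deployment": "AZURE_OPENAI_DALL_E_DEPLOYMENT",
        "azure_endpoint": "AZURE_OPENAI_DALL_E_ENDPOINT",
    }
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of passing None to a driver.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in required.items()}


# A plain dict stands in for os.environ so the example runs anywhere.
settings = load_azure_settings({
    "AZURE_OPENAI_DALL_E_DEPLOYMENT": "my-azure-deployment",
    "AZURE_OPENAI_DALL_E_ENDPOINT": "https://example-endpoint.openai.azure.com",
})
print(settings)
```

The resulting dictionary could then be unpacked into the driver constructor as keyword arguments alongside `model`, which keeps deployment-specific values out of the documentation example and out of source control.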
|
||
### Leonardo.Ai | ||
|
||
The Leonardo image generation driver enables image generation using models hosted by [Leonardo.Ai](https://leonardo.ai/). | ||
|
||
The Leonardo image generation driver supports configurations like model selection, image size, specifying a generation seed, and generation steps. For details on supported configuration parameters, see [Leonardo.Ai's image generation documentation](https://docs.leonardo.ai/reference/creategeneration). | ||
|
||
This driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation. | ||
|
||
```python | ||
import os | ||
|
||
from griptape.structures import Agent | ||
from griptape.tools import PromptImageGenerationClient, FileManager | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import LeonardoImageGenerationDriver | ||
|
||
driver = LeonardoImageGenerationDriver( | ||
model="6bef9f1b-29cb-40c7-b9df-32b51c1f67d3", | ||
Review comment: Load from environment variable. |
||
api_key=os.getenv("LEONARDO_API_KEY"), | ||
Review comment: Add to .github/workflows/integration-tests.yml vars. |
||
image_width=512, | ||
image_height=1024, | ||
) | ||
|
||
engine = PromptImageGenerationEngine(image_generation_driver=driver) | ||
|
||
agent = Agent(tools=[ | ||
PromptImageGenerationClient(image_generation_engine=engine), | ||
FileManager(), | ||
]) | ||
|
||
agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.") | ||
``` | ||
|
||
### OpenAI DALL-E | ||
|
||
The OpenAI DALL-E image generation driver enables image generation using OpenAI DALL-E models. Like other OpenAI drivers, the image generation driver will implicitly load an API key in the `OPENAI_API_KEY` environment variable if one is not explicitly provided. | ||
|
||
The OpenAI DALL-E driver supports image generation configurations like style presets, image quality preference, and image size. For details on supported configuration values, see the [OpenAI documentation](https://platform.openai.com/docs/guides/images/introduction). | ||
|
||
```python | ||
from griptape.structures import Agent | ||
from griptape.tools import PromptImageGenerationClient, FileManager | ||
from griptape.engines import PromptImageGenerationEngine | ||
from griptape.drivers import OpenAiDalleImageGenerationDriver | ||
|
||
driver = OpenAiDalleImageGenerationDriver( | ||
model="dall-e-2", | ||
image_size="512x512", | ||
) | ||
|
||
engine = PromptImageGenerationEngine(image_generation_driver=driver) | ||
|
||
agent = Agent(tools=[ | ||
PromptImageGenerationClient(image_generation_engine=engine), | ||
FileManager(), | ||
]) | ||
|
||
agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.") | ||
``` |
Review comment: Capitalize Griptape things like Engines, Drivers, Tasks, Tools, Rulesets throughout docs.