This repository has been archived by the owner on Apr 16, 2024. It is now read-only.

Image generation documentation #193

Merged
merged 15 commits into from
Jan 11, 2024
147 changes: 147 additions & 0 deletions docs/griptape-framework/data/image-generation-engines.md
@@ -0,0 +1,147 @@
## Overview

Image generation Engines let Tasks and Tools generate images through a single, consistent interface. Each Engine defines a `run` method that accepts the inputs for its generation mode, combines those inputs with any available Rulesets, and passes the request to the configured [image generation Driver](../structures/image-generation-drivers.md).
**Member:** Capitalize Griptape things like Engines, Drivers, Tasks, Tools, Rulesets throughout docs.

**Member:** Link to reference docs for Image Generation Engines

**Member:** this sentence is monotonous with use of the phrase "Image generation" used three times. Suggest splitting this up into the customer benefit first, followed by how it achieves it (maybe two sentences).
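The Engine-to-Driver relationship described in the overview can be sketched with hypothetical stand-in classes. These are illustrative only, not Griptape APIs: the Engine merges the caller's prompt with rules, then delegates generation to whatever Driver it was configured with.

```python
# Illustrative stand-ins only -- NOT Griptape classes. They sketch how an
# Engine combines inputs with rules and delegates to a configured Driver.
class FakeDriver:
    def generate(self, prompts):
        # A real Driver would build and execute an API call to a hosted
        # image generation model here.
        return f"image generated from: {', '.join(prompts)}"


class SketchEngine:
    def __init__(self, driver, rules=None):
        self.driver = driver
        self.rules = rules or []

    def run(self, prompt):
        # Combine the caller's prompt with any Ruleset rules, then pass
        # the request to the configured Driver.
        return self.driver.generate([prompt, *self.rules])


engine = SketchEngine(driver=FakeDriver(), rules=["realistic"])
print(engine.run("a dog on a skateboard"))
# → image generated from: a dog on a skateboard, realistic
```

Swapping in a different Driver changes which model serves the request without touching Engine-level code, which is the point of the indirection.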


### Rulesets
**Member:** Should this be an H3?


[Rulesets](../structures/rulesets.md) provided to image generation Engines are combined with prompts, providing further instruction to the image generation model. In addition to typical Rulesets, image generation Engines support Negative Rulesets. Use Negative Rulesets to influence the model to avoid undesirable features described by negative prompts; they are supported by [image generation Drivers](../structures/image-generation-drivers.md) that provide prompt weighting.
**Contributor:** wieghting -> weighting; undesireable -> undesirable

**Member:** Again, lead with customer benefit/usage to anchor the value for the reader. e.g., "Customers use Negative Rulesets to influence the model to avoid undesirable results, for example by specifying X Y Z."

**Member Author:** Good call, updated.

**Member:** Also may want to run this through a spell check. I discovered that I am unable to spell "undesirable" without a lot of help.

**Member Author:** This is what I get for trying VSCode. Back to PyCharm!
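To make the weighting concrete, here is an illustrative (non-Griptape) sketch of how a prompt plus positive and negative rules might be flattened into weighted prompt entries, similar in shape to what prompt-weighting models accept; the function name and entry shape are assumptions for illustration:

```python
def build_weighted_prompts(prompt, rules, negative_rules):
    """Flatten a prompt plus positive/negative rules into weighted
    entries; negative entries carry a negative weight so the model
    steers away from them."""
    entries = [{"text": prompt, "weight": 1.0}]
    entries += [{"text": rule, "weight": 1.0} for rule in rules]
    entries += [{"text": rule, "weight": -1.0} for rule in negative_rules]
    return entries


prompts = build_weighted_prompts(
    "a watercolor dog", ["high quality"], ["distorted"]
)
print(prompts)
# → [{'text': 'a watercolor dog', 'weight': 1.0},
#    {'text': 'high quality', 'weight': 1.0},
#    {'text': 'distorted', 'weight': -1.0}]
```

Drivers without prompt-weighting support simply have nowhere to send the negatively weighted entries, which is why Negative Rulesets only take effect with Drivers that support them.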


### Prompt Image Generation Engine

This image generation Engine facilitates generating images from text prompts.

```python
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
BedrockStableDiffusionImageGenerationModelDriver
from griptape.rules import Rule, Ruleset
from griptape.tools import PromptImageGenerationClient


# Define positive and negative rulesets.
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset(rules=[Rule("distorted")])

# Create a driver configured to use Stable Diffusion via Bedrock.
driver = AmazonBedrockImageGenerationDriver(
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
model="stability.stable-diffusion-xl-v0",
)

# Create an engine configured to use the driver.
engine = PromptImageGenerationEngine(
rulesets=[positive_ruleset],
negative_rulesets=[negative_ruleset],
# Review (Member): this is a lot of code, which means a lot to maintain if we
# make refactors or upstream changes. Are we able to automate testing it?
# Should we pare it down to only a handful of lines?

# Review (Member Author): We currently do automate testing this, see
# tests/integration/test_code_snippets.py. Unfortunately that means we need
# the boilerplate dependency instantiation because this is real code that
# gets executed.

# Review (Member): @andrewfrench can you try creating a tests/assets/
# directory to see if the code snippets can pull resources from there?

# Review (Member Author): done! The LLM looks happy to pull from there.

image_generation_driver=driver,
)
# Review (Member): Show running the Engine.


# Create a tool configured to use the engine.
tool = PromptImageGenerationClient(
image_generation_engine=engine,
)
# Review (Member): I don't think we need to show Tool creation here since we
# have a dedicated section for Tools.

```

### Variation Image Generation Engine

This image generation Engine facilitates generating variations of an input image according to a text prompt.
**Member:** could we pare this down to just the deltas? I had to re-read it a few times to note that there were some class name changes


```python
from griptape.engines import VariationImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
BedrockStableDiffusionImageGenerationModelDriver
from griptape.rules import Rule, Ruleset
from griptape.tools import VariationImageGenerationClient


# Define positive and negative rulesets.
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset(rules=[Rule("distorted")])

# Create a driver configured to use Stable Diffusion via Bedrock.
driver = AmazonBedrockImageGenerationDriver(
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
model="stability.stable-diffusion-xl-v0",
)

# Create an engine configured to use the driver.
engine = VariationImageGenerationEngine(
rulesets=[positive_ruleset],
negative_rulesets=[negative_ruleset],
image_generation_driver=driver,
)

# Create a tool configured to use the engine.
tool = VariationImageGenerationClient(
image_generation_engine=engine,
)
# Review (Member): Same points.

```

### Inpainting Image Generation Engine

This image generation Engine facilitates image inpainting: modifying an input image, according to a text prompt, within the bounds of a mask defined by a mask image. Inpainting can be used to replace or repair a masked region of an image, for example removing an unwanted object, while leaving the rest of the image untouched.

**Member:** can we make this a more concrete explanation? I don't know what the benefit here is.

```python
from griptape.engines import InpaintingImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
BedrockStableDiffusionImageGenerationModelDriver
from griptape.rules import Rule, Ruleset
from griptape.tools import InpaintingImageGenerationClient


# Define positive and negative rulesets.
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset(rules=[Rule("distorted")])

# Create a driver configured to use Stable Diffusion via Bedrock.
driver = AmazonBedrockImageGenerationDriver(
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
model="stability.stable-diffusion-xl-v0",
)

# Create an engine configured to use the driver.
engine = InpaintingImageGenerationEngine(
rulesets=[positive_ruleset],
negative_rulesets=[negative_ruleset],
image_generation_driver=driver,
)

# Create a tool configured to use the engine.
tool = InpaintingImageGenerationClient(
image_generation_engine=engine,
)
# Review (Member): Same points

```

### Outpainting Image Generation Engine

This image generation Engine facilitates image outpainting: modifying an input image, according to a text prompt, outside the bounds of a mask defined by a mask image. Outpainting can be used to extend an image beyond its original borders, for example widening a scene's background.
**Member:** ditto


```python
from griptape.engines import OutpaintingImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
BedrockStableDiffusionImageGenerationModelDriver
from griptape.rules import Rule, Ruleset
from griptape.tools import OutpaintingImageGenerationClient


# Define positive and negative rulesets.
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset(rules=[Rule("distorted")])

# Create a driver configured to use Stable Diffusion via Bedrock.
driver = AmazonBedrockImageGenerationDriver(
image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
model="stability.stable-diffusion-xl-v0",
)

# Create an engine configured to use the driver.
engine = OutpaintingImageGenerationEngine(
rulesets=[positive_ruleset],
negative_rulesets=[negative_ruleset],
image_generation_driver=driver,
)

# Create a tool configured to use the engine.
tool = OutpaintingImageGenerationClient(
image_generation_engine=engine,
)
# Review (Member): Same points.

```
22 changes: 22 additions & 0 deletions docs/griptape-framework/data/loaders.md
@@ -104,3 +104,25 @@ WebLoader().load_collection(
["https://www.griptape.ai", "https://docs.griptape.ai"]
)
```

## Image Loader

The Image Loader is used to load an image from the filesystem, returning an `ImageArtifact`.

```python
from griptape.loaders import ImageLoader

image_artifact = ImageLoader().load("my_image.png")

image_artifacts = ImageLoader().load_collection(["image_1.png", "image_2.png"])
```

By default, the Image Loader will ensure all images are in `png` format. If an image in another format (for example, `jpg`) is loaded, it will be reformatted to `png`. Other formats are supported through the `format` field.

```python
from griptape.loaders import ImageLoader


# Image data in Image Artifact will be in JPG format
image_artifact_jpg = ImageLoader(format="JPG").load("my_image.png")
# Review (Member): since this is the override behavior, can we include
# another line that loads it "normal-like"

# Review (Member Author): The default example is above

```
178 changes: 178 additions & 0 deletions docs/griptape-framework/structures/image-generation-drivers.md
@@ -0,0 +1,178 @@
## Overview

Image generation drivers are used by [image generation engines](../data/image-generation-engines.md) to build and execute API calls to image generation models.
**Member:** Link to reference docs for Image Generation Drivers


Use a Driver to build an Engine, then pass it to a Tool for use by an [Agent](../structures/agents.md):
**Member:** Capitalization


```python
from griptape.structures import Agent
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import OpenAiDalleImageGenerationDriver
from griptape.tools import PromptImageGenerationClient, FileManager

driver = OpenAiDalleImageGenerationDriver(
model="dall-e-2",
# Review (Member): Open Q: since Dall-E 3 requires a separate monthly
# subscription, would it be more accessible to start with Dall-E 2?

# Review (Member Author): These examples aren't prescriptive, but I updated
# this to dall-e-2 because the Azure driver using our deployment requires
# dall-e-3 and the downgrade here will save us a bit when running
# integration tests.

)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
PromptImageGenerationClient(image_generation_engine=engine),
FileManager(),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.")
```

### Amazon Bedrock

The Amazon Bedrock image generation driver provides multi-model access to image generation models hosted by Amazon Bedrock. This driver manages the API calls to the Bedrock API, while the specific model drivers below format the API requests and parse the responses.
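That split can be sketched with hypothetical stand-in classes (illustrative only, not the Griptape implementations): the model driver owns request formatting and response parsing, while the Bedrock driver owns the API call itself. The response shape mirrors the Stable Diffusion `artifacts` list, but the call is faked here.

```python
import json


class SketchStableDiffusionModelDriver:
    # Formats a Stable Diffusion-shaped request body and parses the
    # model-specific response (illustrative shapes only).
    def build_request(self, prompt):
        return json.dumps({"text_prompts": [{"text": prompt}]})

    def parse_response(self, body):
        return body["artifacts"][0]["base64"]


class SketchBedrockDriver:
    def __init__(self, model_driver):
        self.model_driver = model_driver

    def generate(self, prompt):
        request = self.model_driver.build_request(prompt)
        # A real driver would send `request` to the Bedrock runtime API;
        # here the response is faked so the sketch is self-contained.
        response = {"artifacts": [{"base64": "aW1hZ2U="}]}
        return self.model_driver.parse_response(response)


bedrock_driver = SketchBedrockDriver(SketchStableDiffusionModelDriver())
print(bedrock_driver.generate("a pixel-art dog"))  # → aW1hZ2U=
```

Because only the model driver knows each model's request and response format, adding support for a new Bedrock-hosted model means writing a new model driver, not a new API client.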

#### Bedrock Stable Diffusion Model Driver

The Bedrock Stable Diffusion model driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This model driver supports configurations specific to Stable Diffusion, like style presets, clip guidance presets, and sampler.
**Contributor (@cjkindel, Jan 10, 2024):** nit: `, and more` may be unnecessary after already qualifying list as incomplete with `like ...`


This model driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively weighted prompts describing features or characteristics to avoid in the resulting generation.

**Member:** do we want to illustrate the negative prompts in action? Perhaps one run without, one with?

**Member Author:** Added an example including negative rules

```python
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient, FileManager
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
BedrockStableDiffusionImageGenerationModelDriver

model_driver = BedrockStableDiffusionImageGenerationModelDriver(
style_preset="pixel-art",
steps=50,
)

driver = AmazonBedrockImageGenerationDriver(
image_generation_model_driver=model_driver,
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
PromptImageGenerationClient(image_generation_engine=engine),
FileManager(),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.")
```

#### Amazon Bedrock Titan Image Generator Model Driver

The Amazon Bedrock Titan Image Generator model driver provides support for Titan Image Generator models hosted by Amazon Bedrock. This model driver supports configurations specific to Titan Image Generator, like quality, seed, and cfg_scale.

This model driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively weighted prompts describing features or characteristics to avoid in the resulting generation.

```python
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient, FileManager
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
BedrockTitanImageGeneratorImageGenerationModelDriver

model_driver = BedrockTitanImageGeneratorImageGenerationModelDriver(
quality="hd",
)

driver = AmazonBedrockImageGenerationDriver(
image_generation_model_driver=model_driver,
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
PromptImageGenerationClient(image_generation_engine=engine),
FileManager(),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.")
```

### Azure OpenAI DALL-E

The Azure OpenAI DALL-E image generation driver provides access to OpenAI DALL-E models hosted by Azure. In addition to the configurations provided by the underlying OpenAI DALL-E driver, the Azure OpenAI DALL-E driver allows configuration of Azure-specific deployment values.

```python
import os

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient, FileManager
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AzureOpenAiDalleImageGenerationDriver

# Environment variable names are examples; use the names defined for
# your own deployment.
driver = AzureOpenAiDalleImageGenerationDriver(
model="dall-e-3",
azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
# Review (Member): Load from environment variables.

)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
PromptImageGenerationClient(image_generation_engine=engine),
FileManager(),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.")
```

### Leonardo.Ai

The Leonardo image generation driver enables image generation using models hosted by [Leonardo.ai](https://leonardo.ai/).

The Leonardo image generation driver supports configurations like model selection, image size, specifying a generation seed, and generation steps. For details on supported configuration parameters, see [Leonardo.Ai's image generation documentation](https://docs.leonardo.ai/reference/creategeneration).

This driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively weighted prompts describing features or characteristics to avoid in the resulting generation.

```python
import os

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient, FileManager
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import LeonardoImageGenerationDriver

driver = LeonardoImageGenerationDriver(
model=os.environ["LEONARDO_MODEL_ID"],
# Review (Member): Load from environment variable

api_key=os.getenv("LEONARDO_API_KEY"),
# Review (Member): Add to .github/workflows/integration-tests.yml vars.

image_width=512,
image_height=1024,
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
PromptImageGenerationClient(image_generation_engine=engine),
FileManager(),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.")
```

### OpenAI DALL-E

The OpenAI DALL-E image generation driver enables image generation using OpenAI DALL-E models. Like other OpenAI drivers, the image generation driver will implicitly load an API key from the `OPENAI_API_KEY` environment variable if one is not explicitly provided.

The OpenAI DALL-E driver supports image generation configurations like style presets, image quality preference, and image size. For details on supported configuration values, see the [OpenAI documentation](https://platform.openai.com/docs/guides/images/introduction).
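The implicit key loading follows a common pattern: an explicit argument wins, otherwise the environment variable is consulted. A simplified, framework-free sketch of that pattern (the function name is hypothetical, not the Griptape implementation):

```python
import os


def resolve_api_key(explicit_key=None):
    # Prefer an explicitly provided key; otherwise fall back to the
    # OPENAI_API_KEY environment variable.
    key = explicit_key or os.environ.get("OPENAI_API_KEY")
    if key is None:
        raise ValueError("No API key provided and OPENAI_API_KEY is unset")
    return key


os.environ["OPENAI_API_KEY"] = "sk-example"
print(resolve_api_key())           # → sk-example
print(resolve_api_key("sk-mine"))  # → sk-mine
```

Keeping keys in the environment rather than in source avoids committing secrets and lets the same snippet run unchanged across machines.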

```python
from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient, FileManager
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import OpenAiDalleImageGenerationDriver

driver = OpenAiDalleImageGenerationDriver(
model="dall-e-2",
image_size="512x512",
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
PromptImageGenerationClient(image_generation_engine=engine),
FileManager(),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard. Save the image as rad-dog.png.")
```