This repository has been archived by the owner on Apr 16, 2024. It is now read-only.

Image generation documentation #193

Merged

merged 15 commits into dev from french/240110_image-generation on Jan 11, 2024

Conversation

andrewfrench
Member

@andrewfrench andrewfrench commented Jan 10, 2024

This will be ready to review/merge once pending image generation PRs are merged.

Add docs for:

  • Image generation drivers
  • Image generation model drivers
  • Image generation engine
  • Image generation tasks (generation, variation, inpainting, outpainting)
  • Image generation tool
  • Image loader

Resolves #182


📚 Documentation preview 📚: https://griptape--193.org.readthedocs.build/en/193/

@andrewfrench andrewfrench requested review from collindutter and a team January 10, 2024 21:23
@andrewfrench andrewfrench marked this pull request as ready for review January 10, 2024 21:23

#### Rulesets

[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) with support for prompt wieghting and used to influence the image generation model to avoid undesireable features described by negative prompts.
Contributor

wieghting -> weighting
undesireable -> undesirable


#### Bedrock Stable Diffusion Model Driver

The Bedrock Stable Diffusion model driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This model driver supports configurations specific to Stable Diffusion, like style presets, clip guidance presets, sampler, and more.
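
For illustration, a minimal sketch of what configuring those Stable Diffusion options might look like; the field names (`style_preset`, `sampler`, `cfg_scale`) are assumptions mirroring the Stability API and are not taken from this PR:

```python
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver

# Stable Diffusion-specific options are set on the model driver (field names assumed),
# which is then passed to the Bedrock driver alongside the model id.
model_driver = BedrockStableDiffusionImageGenerationModelDriver(
    style_preset="photographic",
    sampler="K_DPMPP_2M",
    cfg_scale=7,
)

driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=model_driver,
    model="stability.stable-diffusion-xl-v0",
)
```
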
Contributor

@cjkindel cjkindel Jan 10, 2024

nit: `, and more` may be unnecessary after already qualifying the list as incomplete with `like ...`

To generate an image, use one of the following Image Generation Tasks. All Image Generation Tasks accept an Image Generation Engine configured to use an [Image Generation Driver](./image-generation-drivers.md).

All successful Image Generation Tasks will always output an [Image Artifact](). Each task can be configured to additionally write the generated image to disk by providing either the `output_file` or `output_dir` field. The `output_file` field supports file names in the current directory (`my_image.png`), relative directory prefixes (`images/my_image.png`), or absolute paths (`/usr/var/my_image.png`). By setting `output_dir`, the task will generate a file name and place the image in the requested directory.
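
For illustration, a minimal sketch of a task writing its output to a directory, assuming a `PromptImageGenerationTask` that takes the prompt as its first argument and accepts the `image_generation_engine` and `output_dir` fields described above (the task's exact signature is not shown in this PR):

```python
from griptape.drivers import OpenAiDalleImageGenerationDriver
from griptape.engines import PromptImageGenerationEngine
from griptape.structures import Pipeline
from griptape.tasks import PromptImageGenerationTask

# Engine and driver configured as in the PR's other examples.
engine = PromptImageGenerationEngine(
    image_generation_driver=OpenAiDalleImageGenerationDriver(model="dall-e-2"),
)

# With output_dir set, the task generates a file name and writes the image to images/.
task = PromptImageGenerationTask(
    "Generate a watercolor painting of a mountain lake",
    image_generation_engine=engine,
    output_dir="images/",
)

pipeline = Pipeline()
pipeline.add_task(task)
pipeline.run()
```
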
Contributor

Intentionally blank URL for Image Artifact?

Member Author

No! Good catch, updated.

@@ -0,0 +1,147 @@
## Overview

Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.
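
As a rough sketch of that `run` method in use (the method's exact signature is not shown here; `prompts` as a list of strings and an `ImageArtifact` return value carrying raw bytes in `.value` are assumptions):

```python
from griptape.drivers import OpenAiDalleImageGenerationDriver
from griptape.engines import PromptImageGenerationEngine

engine = PromptImageGenerationEngine(
    image_generation_driver=OpenAiDalleImageGenerationDriver(model="dall-e-2"),
)

# The engine combines the prompt with any configured rulesets and forwards
# the request to its driver, returning an ImageArtifact with the image bytes.
image_artifact = engine.run(prompts=["A photograph of a mountain lake at sunrise"])

with open("mountain_lake.png", "wb") as f:
    f.write(image_artifact.value)
```
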
Member

Capitalize Griptape things like Engines, Drivers, Tasks, Tools, Rulesets throughout docs.


Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.

#### Rulesets
Member

Should this be an H3?

@@ -0,0 +1,147 @@
## Overview

Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.
Member

Link to reference docs for Image Generation Engines

Comment on lines 32 to 36
engine = PromptImageGenerationEngine(
    rulesets=[positive_ruleset],
    negative_rulesets=[negative_ruleset],
    image_generation_driver=driver,
)
Member

Show running the Engine.

Comment on lines 38 to 41
# Create a tool configured to use the engine.
tool = PromptImageGenerationClient(
    image_generation_engine=engine,
)
Member

I don't think we need to show Tool creation here since we have a dedicated section for Tools.

## Image Generation Tasks

To generate an image, use one of the following Image Generation Tasks. All Image Generation Tasks accept an Image Generation Engine configured to use an [Image Generation Driver](./image-generation-drivers.md).
Member

Link to reference docs for Image Generation Task

### Prompt Image Generation Task

The Prompt Image Generation Task generates an image from a text prompt.
Member

Reference doc

# Create an agent and provide the tool to it.
agent = Agent(tools=[tool])

agent.run("Inpaint a lake to the image at mountain.png using the mask at mask.png.")
Member

External dependency

# Create an agent and provide the tool to it.
agent = Agent(tools=[tool])

agent.run("Outpaint a forest to the image at mountain.png using the mask at mask.png.")
Member

External dependency

# Create an agent and provide the tool to it.
agent = Agent(tools=[tool])

agent.run("Generate a variation of the image located at mountain.png.")
Member

External dependency

Member

@SavagePencil SavagePencil left a comment

There is a LOT of code here we'd have to maintain moving forward. Is there a way to minimize that?

@@ -0,0 +1,147 @@
## Overview

Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.
Member

This sentence is monotonous, with the phrase "image generation" used three times. Suggest splitting it up into the customer benefit first, followed by how it achieves it (maybe two sentences).


#### Rulesets

[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) with support for prompt wieghting and used to influence the image generation model to avoid undesireable features described by negative prompts.
Member

Again, lead with customer benefit/usage to anchor the value for the reader. e.g., "Customers use Negative Rulesets to influence the model to avoid undesirable results, for example by specifying X Y Z.".

Member Author

Good call, updated.


#### Rulesets

[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) with support for prompt wieghting and used to influence the image generation model to avoid undesireable features described by negative prompts.
Member

Also may want to run this through a spell check. I discovered that I am unable to spell "undesirable" without a lot of help.

Member Author

This is what I get for trying VSCode. Back to PyCharm!

Comment on lines 13 to 34
```python
from griptape.structures import Agent
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver
from griptape.rules import Rule, Ruleset
from griptape.tools import PromptImageGenerationClient


# Define positive and negative rulesets.
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset(rules=[Rule("distorted")])

# Create a driver configured to use Stable Diffusion via Bedrock.
driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
    model="stability.stable-diffusion-xl-v0",
)

# Create an engine configured to use the driver.
engine = PromptImageGenerationEngine(
    rulesets=[positive_ruleset],
    negative_rulesets=[negative_ruleset],
Member

this is a lot of code, which means a lot to maintain if we make refactors or upstream changes. Are we able to automate testing it? Should we pare it down to only a handful of lines?

Member Author

We do currently automate testing for this; see `tests/integration/test_code_snippets.py`. Unfortunately, that means we need the boilerplate dependency instantiation because this is real code that gets executed.

Member

@andrewfrench can you try creating a tests/assets/ directory to see if the code snippets can pull resources from there?

Member Author

done! The LLM looks happy to pull from there.


### Outpainting Image Generation Engine

This image generation engine facilitates image outpainting, or modifying an input image according to a text prompt outside the bounds of a mask defined by a mask image.
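
A minimal sketch of driving this engine directly, assuming its `run` method accepts prompts plus image and mask artifacts, and that `ImageLoader` can be called as shown later in this PR (both are assumptions):

```python
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver
from griptape.engines import OutpaintingImageGenerationEngine
from griptape.loaders import ImageLoader

engine = OutpaintingImageGenerationEngine(
    image_generation_driver=AmazonBedrockImageGenerationDriver(
        image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
        model="stability.stable-diffusion-xl-v0",
    ),
)

# Load the input image and its mask; generation is applied outside the mask bounds.
image_artifact = ImageLoader().load("mountain.png")
mask_artifact = ImageLoader().load("mask.png")

outpainted = engine.run(
    prompts=["A dense forest surrounding the mountain"],
    image=image_artifact,
    mask=mask_artifact,
)
```
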
Member

ditto

Comment on lines 126 to 127
# Image data in Image Artifact will be in JPG format
image_artifact_jpg = ImageLoader(format="JPG").load("my_image.png")
Member

Since this is the override behavior, can we include another line that loads it "normal-like"?

Member Author

The default example is above

from griptape.tools import PromptImageGenerationClient, FileManager

driver = OpenAiDalleImageGenerationDriver(
    model="dall-e-3",
Member

Open Q: since Dall-E 3 requires a separate monthly subscription, would it be more accessible to start with Dall-E 2?

Member Author

These examples aren't prescriptive, but I updated this to dall-e-2 because the Azure driver using our deployment requires dall-e-3 and the downgrade here will save us a bit when running integration tests.

Comment on lines 35 to 36
This model driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
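
A short sketch of that end to end, assuming negative rulesets are attached to the engine as in the PR's main example (the rule text here is purely illustrative):

```python
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver
from griptape.engines import PromptImageGenerationEngine
from griptape.rules import Rule, Ruleset

engine = PromptImageGenerationEngine(
    negative_rulesets=[Ruleset(rules=[Rule("blurry"), Rule("watermark")])],
    image_generation_driver=AmazonBedrockImageGenerationDriver(
        image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
        model="stability.stable-diffusion-xl-v0",
    ),
)

# The negative rules are passed to Stable Diffusion as negatively-weighted
# prompts, steering the model away from those characteristics.
image_artifact = engine.run(prompts=["A portrait photograph, studio lighting"])
```
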

Member

do we want to illustrate the negative prompts in action? Perhaps one run without, one with?

Member Author

Added an example including negative rules

To generate an image, use one of the following Image Generation Tasks. All Image Generation Tasks accept an Image Generation Engine configured to use an [Image Generation Driver](./image-generation-drivers.md).

All successful Image Generation Tasks will always output an [Image Artifact](). Each task can be configured to additionally write the generated image to disk by providing either the `output_file` or `output_dir` field. The `output_file` field supports file names in the current directory (`my_image.png`), relative directory prefixes (`images/my_image.png`), or absolute paths (`/usr/var/my_image.png`). By setting `output_dir`, the task will generate a file name and place the image in the requested directory.
Member

missing URL?

@collindutter
Member

There is a LOT of code here we'd have to maintain moving forward. Is there a way to minimize that?

@SavagePencil I think we should encourage lots of examples in our docs as long as they are testable with the integration tests.

Comment on lines 107 to 109
    model="dall-e-3",
    azure_deployment="my-azure-deployment",
    azure_endpoint="https://example-endpoint.openai.azure.com",
Member

Load from environment variables.

from griptape.drivers import LeonardoImageGenerationDriver

driver = LeonardoImageGenerationDriver(
    model="6bef9f1b-29cb-40c7-b9df-32b51c1f67d3",
Member

Load from environment variable


driver = LeonardoImageGenerationDriver(
    model="6bef9f1b-29cb-40c7-b9df-32b51c1f67d3",
    api_key=os.getenv("LEONARDO_API_KEY"),
Member

Add to .github/workflows/integration-tests.yml vars.

@collindutter collindutter dismissed SavagePencil’s stale review January 11, 2024 17:56

Re-reviewed on a call, good to merge.

@collindutter collindutter merged commit 97e7d20 into dev Jan 11, 2024
@collindutter collindutter deleted the french/240110_image-generation branch January 11, 2024 17:56
Successfully merging this pull request may close these issues.

Add documentation for ImageGenerationTask