This repository has been archived by the owner on Apr 16, 2024. It is now read-only.

Image generation documentation #193

Merged

merged 15 commits into dev from french/240110_image-generation on Jan 11, 2024

Conversation

andrewfrench
Member

@andrewfrench andrewfrench commented Jan 10, 2024

This will be ready to review/merge once pending image generation PRs are merged.

Add docs for:

  • Image generation drivers
  • Image generation model drivers
  • Image generation engine
  • Image generation tasks (generation, variation, inpainting, outpainting)
  • Image generation tool
  • Image loader

Resolves #182


📚 Documentation preview 📚: https://griptape--193.org.readthedocs.build/en/193/

@andrewfrench andrewfrench requested review from collindutter and a team January 10, 2024 21:23
@andrewfrench andrewfrench marked this pull request as ready for review January 10, 2024 21:23

#### Rulesets

[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) with support for prompt wieghting and used to influence the image generation model to avoid undesireable features described by negative prompts.
Contributor

wieghting -> weighting
undesireable -> undesirable


#### Bedrock Stable Diffusion Model Driver

The Bedrock Stable Diffusion model driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This model driver supports configurations specific to Stable Diffusion, like style presets, clip guidance presets, sampler, and more.
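
For illustration, a minimal sketch of what configuring those Stable Diffusion options might look like; the field names (`style_preset`, `sampler`, `cfg_scale`) are assumptions mirroring the Stability API and are not taken from this PR:

```python
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver

# Stable Diffusion-specific options are set on the model driver (field names assumed),
# which is then passed to the Bedrock driver alongside the model id.
model_driver = BedrockStableDiffusionImageGenerationModelDriver(
    style_preset="photographic",
    sampler="K_DPMPP_2M",
    cfg_scale=7,
)

driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=model_driver,
    model="stability.stable-diffusion-xl-v0",
)
```
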
Contributor

@cjkindel cjkindel Jan 10, 2024

nit: `, and more` may be unnecessary after already qualifying the list as incomplete with `like ...`

To generate an image, use one of the following Image Generation Tasks. All Image Generation Tasks accept an Image Generation Engine configured to use an [Image Generation Driver](./image-generation-drivers.md).

All successful Image Generation Tasks will always output an [Image Artifact](). Each task can be configured to additionally write the generated image to disk by providing either the `output_file` or `output_dir` field. The `output_file` field supports file names in the current directory (`my_image.png`), relative directory prefixes (`images/my_image.png`), or absolute paths (`/usr/var/my_image.png`). By setting `output_dir`, the task will generate a file name and place the image in the requested directory.
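
For illustration, a minimal sketch of a task writing its output to a directory, assuming a `PromptImageGenerationTask` that takes the prompt as its first argument and accepts the `image_generation_engine` and `output_dir` fields described above (the task's exact signature is not shown in this PR):

```python
from griptape.drivers import OpenAiDalleImageGenerationDriver
from griptape.engines import PromptImageGenerationEngine
from griptape.structures import Pipeline
from griptape.tasks import PromptImageGenerationTask

# Engine and driver configured as in the PR's other examples.
engine = PromptImageGenerationEngine(
    image_generation_driver=OpenAiDalleImageGenerationDriver(model="dall-e-2"),
)

# With output_dir set, the task generates a file name and writes the image to images/.
task = PromptImageGenerationTask(
    "Generate a watercolor painting of a mountain lake",
    image_generation_engine=engine,
    output_dir="images/",
)

pipeline = Pipeline()
pipeline.add_task(task)
pipeline.run()
```
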
Contributor

Intentionally blank URL for Image Artifact?

Member Author

No! Good catch, updated.

@@ -0,0 +1,147 @@
## Overview

Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.
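
As a rough sketch of that `run` method in use (the method's exact signature is not shown here; `prompts` as a list of strings and an `ImageArtifact` return value carrying raw bytes in `.value` are assumptions):

```python
from griptape.drivers import OpenAiDalleImageGenerationDriver
from griptape.engines import PromptImageGenerationEngine

engine = PromptImageGenerationEngine(
    image_generation_driver=OpenAiDalleImageGenerationDriver(model="dall-e-2"),
)

# The engine combines the prompt with any configured rulesets and forwards
# the request to its driver, returning an ImageArtifact with the image bytes.
image_artifact = engine.run(prompts=["A photograph of a mountain lake at sunrise"])

with open("mountain_lake.png", "wb") as f:
    f.write(image_artifact.value)
```
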
Member

Capitalize Griptape things like Engines, Drivers, Tasks, Tools, Rulesets throughout docs.


Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.

#### Rulesets
Member

Should this be an H3?

@@ -0,0 +1,147 @@
## Overview

Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.
Member

Link to reference docs for Image Generation Engines

Comment on lines 32 to 36
engine = PromptImageGenerationEngine(
    rulesets=[positive_ruleset],
    negative_rulesets=[negative_ruleset],
    image_generation_driver=driver,
)
Member

Show running the Engine.

Comment on lines 38 to 41
# Create a tool configured to use the engine.
tool = PromptImageGenerationClient(
    image_generation_engine=engine,
)
Member

I don't think we need to show Tool creation here since we have a dedicated section for Tools.

## Image Generation Tasks

To generate an image, use one of the following Image Generation Tasks. All Image Generation Tasks accept an Image Generation Engine configured to use an [Image Generation Driver](./image-generation-drivers.md).
Member

Link to reference docs for Image Generation Task

### Prompt Image Generation Task

The Prompt Image Generation Task generates an image from a text prompt.
Member

Reference doc

# Create an agent and provide the tool to it.
agent = Agent(tools=[tool])

agent.run("Inpaint a lake to the image at mountain.png using the mask at mask.png.")
Member

External dependency

# Create an agent and provide the tool to it.
agent = Agent(tools=[tool])

agent.run("Outpaint a forest to the image at mountain.png using the mask at mask.png.")
Member

External dependency

# Create an agent and provide the tool to it.
agent = Agent(tools=[tool])

agent.run("Generate a variation of the image located at mountain.png.")
Member

External dependency

Member

@SavagePencil SavagePencil left a comment

There is a LOT of code here we'd have to maintain moving forward. Is there a way to minimize that?

@@ -0,0 +1,147 @@
## Overview

Image generation engines facilitate the use of [image generation drivers](../structures/image-generation-drivers.md) by image generation tasks and tools. Each image generation engine defines a `run` method that accepts the inputs necessary for each image generation mode, combines these inputs with any available rulesets, and provides the request to the configured image generation driver.
Member

This sentence is monotonous, with the phrase "image generation" used three times. Suggest splitting it up into the customer benefit first, followed by how it achieves it (maybe two sentences).


#### Rulesets

[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) with support for prompt wieghting and used to influence the image generation model to avoid undesireable features described by negative prompts.
Member

Again, lead with customer benefit/usage to anchor the value for the reader. e.g., "Customers use Negative Rulesets to influence the model to avoid undesirable results, for example by specifying X Y Z.".

Member Author

Good call, updated.


#### Rulesets

[Rulesets](../structures/rulesets.md) provided to image generation engines are combined with prompts, providing further instruction to image generation models. In addition to typical Rulesets, image generation engines support Negative Rulesets. Negative Rulesets are used by [image generation drivers](../structures/image-generation-drivers.md) with support for prompt wieghting and used to influence the image generation model to avoid undesireable features described by negative prompts.
Member

Also may want to run this through a spell check. I discovered that I am unable to spell "undesirable" without a lot of help.

Member Author

This is what I get for trying VSCode. Back to PyCharm!

Comment on lines 13 to 34
```python
from griptape.structures import Agent
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver
from griptape.rules import Rule, Ruleset
from griptape.tools import PromptImageGenerationClient


# Define positive and negative rulesets.
positive_ruleset = Ruleset(rules=[Rule("realistic"), Rule("high quality")])
negative_ruleset = Ruleset(rules=[Rule("distorted")])

# Create a driver configured to use Stable Diffusion via Bedrock.
driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
    model="stability.stable-diffusion-xl-v0",
)

# Create an engine configured to use the driver.
engine = PromptImageGenerationEngine(
    rulesets=[positive_ruleset],
    negative_rulesets=[negative_ruleset],
Member

this is a lot of code, which means a lot to maintain if we make refactors or upstream changes. Are we able to automate testing it? Should we pare it down to only a handful of lines?

Member Author

We do currently automate testing for this; see `tests/integration/test_code_snippets.py`. Unfortunately, that means we need the boilerplate dependency instantiation because this is real code that gets executed.

Member

@andrewfrench can you try creating a tests/assets/ directory to see if the code snippets can pull resources from there?

Member Author

done! The LLM looks happy to pull from there.


### Outpainting Image Generation Engine

This image generation engine facilitates image outpainting, or modifying an input image according to a text prompt outside the bounds of a mask defined by a mask image.
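
A minimal sketch of driving this engine directly, assuming its `run` method accepts prompts plus image and mask artifacts, and that `ImageLoader` can be called as shown later in this PR (both are assumptions):

```python
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver
from griptape.engines import OutpaintingImageGenerationEngine
from griptape.loaders import ImageLoader

engine = OutpaintingImageGenerationEngine(
    image_generation_driver=AmazonBedrockImageGenerationDriver(
        image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
        model="stability.stable-diffusion-xl-v0",
    ),
)

# Load the input image and its mask; generation is applied outside the mask bounds.
image_artifact = ImageLoader().load("mountain.png")
mask_artifact = ImageLoader().load("mask.png")

outpainted = engine.run(
    prompts=["A dense forest surrounding the mountain"],
    image=image_artifact,
    mask=mask_artifact,
)
```
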
Member

ditto

Comment on lines 126 to 127
# Image data in Image Artifact will be in JPG format
image_artifact_jpg = ImageLoader(format="JPG").load("my_image.png")
Member

Since this is the override behavior, can we include another line that loads it "normal-like"?

Member Author

The default example is above

from griptape.tools import PromptImageGenerationClient, FileManager

driver = OpenAiDalleImageGenerationDriver(
    model="dall-e-3",
Member

Open Q: since Dall-E 3 requires a separate monthly subscription, would it be more accessible to start with Dall-E 2?

Member Author

These examples aren't prescriptive, but I updated this to dall-e-2 because the Azure driver using our deployment requires dall-e-3 and the downgrade here will save us a bit when running integration tests.

Comment on lines 35 to 36
This model driver supports negative prompts. When provided (for example, when used with an [image generation engine](../data/image-generation-engines.md) configured with negative rulesets), the image generation request will include negatively-weighted prompts describing features or characteristics to avoid in the resulting generation.
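
A short sketch of that end to end, assuming negative rulesets are attached to the engine as in the PR's main example (the rule text here is purely illustrative):

```python
from griptape.drivers import AmazonBedrockImageGenerationDriver, \
    BedrockStableDiffusionImageGenerationModelDriver
from griptape.engines import PromptImageGenerationEngine
from griptape.rules import Rule, Ruleset

engine = PromptImageGenerationEngine(
    negative_rulesets=[Ruleset(rules=[Rule("blurry"), Rule("watermark")])],
    image_generation_driver=AmazonBedrockImageGenerationDriver(
        image_generation_model_driver=BedrockStableDiffusionImageGenerationModelDriver(),
        model="stability.stable-diffusion-xl-v0",
    ),
)

# The negative rules are passed to Stable Diffusion as negatively-weighted
# prompts, steering the model away from those characteristics.
image_artifact = engine.run(prompts=["A portrait photograph, studio lighting"])
```
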

Member

do we want to illustrate the negative prompts in action? Perhaps one run without, one with?

Member Author

Added an example including negative rules

To generate an image, use one of the following Image Generation Tasks. All Image Generation Tasks accept an Image Generation Engine configured to use an [Image Generation Driver](./image-generation-drivers.md).

All successful Image Generation Tasks will always output an [Image Artifact](). Each task can be configured to additionally write the generated image to disk by providing either the `output_file` or `output_dir` field. The `output_file` field supports file names in the current directory (`my_image.png`), relative directory prefixes (`images/my_image.png`), or absolute paths (`/usr/var/my_image.png`). By setting `output_dir`, the task will generate a file name and place the image in the requested directory.
Member

missing URL?

@collindutter
Member

There is a LOT of code here we'd have to maintain moving forward. Is there a way to minimize that?

@SavagePencil I think we should encourage lots of examples in our docs as long as they are testable with the integration tests.

Comment on lines 107 to 109
    model="dall-e-3",
    azure_deployment="my-azure-deployment",
    azure_endpoint="https://example-endpoint.openai.azure.com",
Member

Load from environment variables.

from griptape.drivers import LeonardoImageGenerationDriver

driver = LeonardoImageGenerationDriver(
    model="6bef9f1b-29cb-40c7-b9df-32b51c1f67d3",
Member

Load from environment variable


driver = LeonardoImageGenerationDriver(
    model="6bef9f1b-29cb-40c7-b9df-32b51c1f67d3",
    api_key=os.getenv("LEONARDO_API_KEY"),
Member

Add to .github/workflows/integration-tests.yml vars.

@collindutter collindutter dismissed SavagePencil’s stale review January 11, 2024 17:56

Re-reviewed on a call, good to merge.

@collindutter collindutter merged commit 97e7d20 into dev Jan 11, 2024
@collindutter collindutter deleted the french/240110_image-generation branch January 11, 2024 17:56
Successfully merging this pull request may close these issues.

Add documentation for ImageGenerationTask