Update sd-v2-infinite-zoom to genai usage (#2731)
CVS-161647
aleksandr-mokrov authored Feb 7, 2025
1 parent e3eef01 commit 422fa4a
Showing 1 changed file with 65 additions and 20 deletions.
@@ -26,7 +26,7 @@
"* The model comes with a new refined depth architecture capable of preserving context from prior generation layers in an image-to-image setting. This structure preservation helps generate images that preserving forms and shadow of objects, but with different content.\n",
"* The model comes with an updated inpainting module built upon the previous model. This text-guided inpainting makes switching out parts in the image easier than before.\n",
"\n",
"This notebook demonstrates how to download the model from the Hugging Face Hub and converted to OpenVINO IR format with [Optimum Intel](https://huggingface.co/docs/optimum/intel/inference#stable-diffusion). And how to use the model to generate sequence of images for infinite zoom video effect.\n",
"This notebook demonstrates how to download the model from the Hugging Face Hub and convert to OpenVINO IR format with the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library. And how to use the model to generate sequence of images for infinite zoom video effect using [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai) that provides easy-to-use API.\n",
"\n",
"\n",
"<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/stable-diffusion-v2/stable-diffusion-v2-infinite-zoom.ipynb\" />\n"
@@ -103,7 +103,8 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install -q \"diffusers>=0.14.0\" \"transformers>=4.25.1\" \"gradio>=4.19\" \"openvino>=2024.2.0\" \"torch>=2.1\" Pillow opencv-python \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu"
"%pip install -q -U \"openvino>=2025.0\" \"openvino-genai>=2025.0\"\n",
"%pip install -q \"diffusers>=0.14.0\" \"transformers>=4.25.1\" \"gradio>=4.19\" \"torch>=2.1\" Pillow opencv-python \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu"
]
},
{
@@ -115,9 +116,27 @@
"## Load Stable Diffusion Inpaint pipeline using Optimum Intel\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"We will load optimized Stable Diffusion model from the Hugging Face Hub and create pipeline to run an inference with OpenVINO Runtime by [Optimum Intel](https://huggingface.co/docs/optimum/intel/inference#stable-diffusion). \n",
"[stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) is available for downloading via the [HuggingFace hub](https://huggingface.co/models). We will use optimum-cli interface for exporting it into OpenVINO Intermediate Representation (IR) format.\n",
"\n",
"For running the Stable Diffusion model with Optimum Intel, we will use the optimum.intel.OVStableDiffusionInpaintPipeline class, which represents the inference pipeline. OVStableDiffusionInpaintPipeline initialized by the from_pretrained method. It supports on-the-fly conversion models from PyTorch using the export=True parameter. A converted model can be saved on disk using the save_pretrained method for the next running. \n",
" Optimum CLI interface for converting models supports export to OpenVINO (supported starting optimum-intel 1.12 version).\n",
"General command format:\n",
"\n",
"```bash\n",
"optimum-cli export openvino --model <model_id_or_path> --task <task> <output_dir>\n",
"```\n",
"\n",
"where `task` is the task to export the model for, if not specified, the task will be auto-inferred based on the model.\n",
"\n",
"You can find a mapping between tasks and model classes in Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).\n",
"\n",
"Additionally, you can specify weights compression `--weight-format` for the model compression. Please note, that for INT8/INT4, it is necessary to install nncf.\n",
"\n",
"Full list of supported arguments available via `--help`\n",
"For more details and examples of usage, please check [optimum documentation](https://huggingface.co/docs/optimum/intel/inference#export).\n",
"\n",
"\n",
"For running the Stable Diffusion model, we will use [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai) that provides easy-to-use API for running text generation. Firstly we will create pipeline with `InpaintingPipeline`. You can see more details in [Image Python Generation Pipeline Example](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2025/0/samples/python/image_generation#run-inpainting-pipeline).\n",
"Then we run the `generate` method and get the image tokens and then convert them into the image using `Image.fromarray` from PIL. Also we convert the input images to `ov.Tensor` using `image_to_tensor` function. \n",
"\n",
"Select device from dropdown list for running inference using OpenVINO."
]
@@ -138,6 +157,10 @@
" )\n",
" open(\"notebook_utils.py\", \"w\").write(r.text)\n",
"\n",
"if not Path(\"cmd_helper.py\").exists():\n",
" r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py\")\n",
" open(\"cmd_helper.py\", \"w\").write(r.text)\n",
"\n",
"# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
"from notebook_utils import collect_telemetry\n",
"\n",
@@ -157,21 +180,28 @@
"metadata": {},
"outputs": [],
"source": [
"from optimum.intel.openvino import OVStableDiffusionInpaintPipeline\n",
"from pathlib import Path\n",
"import openvino as ov\n",
"\n",
"from cmd_helper import optimum_cli\n",
"\n",
"DEVICE = device.value\n",
"\n",
"MODEL_ID = \"stabilityai/stable-diffusion-2-inpainting\"\n",
"MODEL_DIR = Path(\"sd2_inpainting\")\n",
"\n",
"if not MODEL_DIR.exists():\n",
" ov_pipe = OVStableDiffusionInpaintPipeline.from_pretrained(MODEL_ID, export=True, device=DEVICE, compile=False)\n",
" ov_pipe.save_pretrained(MODEL_DIR)\n",
"else:\n",
" ov_pipe = OVStableDiffusionInpaintPipeline.from_pretrained(MODEL_DIR, device=DEVICE, compile=False)\n",
"optimum_cli(MODEL_ID, MODEL_DIR, additional_args={\"weight-format\": \"fp16\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a424af25",
"metadata": {},
"outputs": [],
"source": [
"import openvino_genai as ov_genai\n",
"\n",
"ov_pipe.compile()"
"\n",
"pipe = ov_genai.InpaintingPipeline(MODEL_DIR, device.value)"
]
},
{
@@ -184,7 +214,7 @@
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"For achieving zoom effect, we will use inpainting to expand images beyond their original borders.\n",
"We run our `OVStableDiffusionInpaintPipeline` in the loop, where each next frame will add edges to previous. The frame generation process illustrated on diagram below:\n",
"We run our `InpaintingPipeline` in the loop, where each next frame will add edges to previous. The frame generation process illustrated on diagram below:\n",
"\n",
"![frame generation)](https://user-images.githubusercontent.com/29454499/228739686-436f2759-4c79-42a2-a70f-959fb226834c.png)\n",
"\n",
@@ -208,11 +238,18 @@
"from typing import List, Union\n",
"\n",
"import PIL\n",
"from PIL import Image\n",
"import cv2\n",
"from tqdm import trange\n",
"import numpy as np\n",
"\n",
"\n",
"def image_to_tensor(image: Image) -> ov.Tensor:\n",
" pic = image.convert(\"RGB\")\n",
" image_data = np.array(pic.getdata()).reshape(1, pic.size[1], pic.size[0], 3).astype(np.uint8)\n",
" return ov.Tensor(image_data)\n",
"\n",
"\n",
"def generate_video(\n",
" pipe,\n",
" prompt: Union[str, List[str]],\n",
@@ -251,14 +288,19 @@
" mask_image = np.array(current_image)[:, :, 3]\n",
" mask_image = PIL.Image.fromarray(255 - mask_image).convert(\"RGB\")\n",
" current_image = current_image.convert(\"RGB\")\n",
" init_images = pipe(\n",
" current_image = image_to_tensor(current_image)\n",
" mask_image = image_to_tensor(mask_image)\n",
" image_tensors = pipe.generate(\n",
" prompt=prompt,\n",
" negative_prompt=negative_prompt,\n",
" image=current_image,\n",
" guidance_scale=guidance_scale,\n",
" mask_image=mask_image,\n",
" num_inference_steps=num_inference_steps,\n",
" ).images\n",
" )\n",
" init_images = []\n",
" for image_tensor in image_tensors.data:\n",
" init_images.append(PIL.Image.fromarray(image_tensor))\n",
"\n",
" image_grid(init_images, rows=1, cols=1)\n",
"\n",
@@ -284,15 +326,17 @@
"\n",
" # inpainting step\n",
" current_image = current_image.convert(\"RGB\")\n",
" images = pipe(\n",
" current_image = image_to_tensor(current_image)\n",
" mask_image = image_to_tensor(mask_image)\n",
" image_tensor = pipe.generate(\n",
" prompt=prompt,\n",
" negative_prompt=negative_prompt,\n",
" image=current_image,\n",
" guidance_scale=guidance_scale,\n",
" mask_image=mask_image,\n",
" num_inference_steps=num_inference_steps,\n",
" ).images\n",
" current_image = images[0]\n",
" )\n",
" current_image = PIL.Image.fromarray(image_tensor.data[0])\n",
" current_image.paste(prev_image, mask=prev_image)\n",
"\n",
" # interpolation steps bewteen 2 inpainted images (=sequential zoom and crop)\n",
@@ -321,6 +365,7 @@
" fps = 30\n",
" save_path = video_file_name + \".mp4\"\n",
" write_video(save_path, all_frames, fps, reversed_order=zoom_in)\n",
"\n",
" return save_path"
]
},
@@ -453,7 +498,7 @@
"\n",
"from gradio_helper import make_demo_zoom_video\n",
"\n",
"demo = make_demo_zoom_video(ov_pipe, generate_video)\n",
"demo = make_demo_zoom_video(pipe, generate_video)\n",
"\n",
"try:\n",
" demo.queue().launch()\n",