This repository provides the code for the paper:
Controllable Unsupervised Event-Based Video Generation (accepted as an ICIP oral and invited to the WACV workshop)
Yaping Zhao, Pei Zhang, Chutian Wang, Edmund Y. Lam
Paper link: https://ieeexplore.ieee.org/abstract/document/10647468
All pre-trained weights should be downloaded to the checkpoints/
directory, including the weights of Stable Diffusion v1.5 and of ControlNet conditioned on Canny edges.
The file flownet.pkl
contains the weights of RIFE.
The final file tree should look like:
checkpoints
├── stable-diffusion-v1-5
├── sd-controlnet-canny
├── flownet.pkl
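Before running inference, it may help to confirm that the checkpoint layout matches the tree above. A minimal sketch of such a check (the helper name check_checkpoints is ours, not part of the repo):

```python
import os

# Expected entries under checkpoints/, as listed in the file tree above.
EXPECTED = ["stable-diffusion-v1-5", "sd-controlnet-canny", "flownet.pkl"]

def check_checkpoints(root="checkpoints"):
    """Return the expected entries that are missing from the checkpoints directory."""
    return [name for name in EXPECTED if not os.path.exists(os.path.join(root, name))]

if __name__ == "__main__":
    missing = check_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```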
conda create -n cube python=3.10
conda activate cube
pip install -r requirements.txt
xformers
is recommended to save memory and reduce running time.
Event streams used in our experiments are provided in the data/event
folder.
To extract edges from the event data, simply run:
python edge_extraction.py
The extracted edges can be found in the data/edge
folder.
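Conceptually, events already mark pixels where intensity changes, so a simple baseline for edge extraction is to accumulate event counts per pixel and threshold them. A NumPy-only sketch of that idea (an illustration under our own assumptions, not the repo's actual edge_extraction.py implementation):

```python
import numpy as np

def events_to_edge_map(events, height, width, threshold=1):
    """Accumulate events (x, y, t, polarity) into a per-pixel count image,
    then threshold the counts to obtain a binary edge map (0 or 255)."""
    counts = np.zeros((height, width), dtype=np.int32)
    for x, y, _, _ in events:
        counts[y, x] += 1
    return (counts >= threshold).astype(np.uint8) * 255

# Example: a few synthetic events along a vertical line at x=2.
events = [(2, 0, 0.0, 1), (2, 1, 0.1, -1), (2, 2, 0.2, 1)]
edge = events_to_edge_map(events, height=4, width=4)
```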
To generate videos as shown in our paper, simply run:
sh inference_moonwalk.sh
sh inference_violin.sh
sh inference_sofa.sh
sh inference_man.sh
sh inference_girl.sh
To run your own experiment on text-to-video generation, modify the bash script, e.g., inference_moonwalk.sh
:
python inference.py \
--prompt "James bond does the moonwalk on the desert." \
--condition "canny" \
--video_path "data/moonwalk.mp4" \
--output_path "outputs/" \
--video_length 15 \
--smoother_steps 19 20 \
--width 512 \
--height 512
#--is_long_video
where --video_length
is the length of the synthesized video, --condition
specifies the type of structure sequence,
--smoother_steps
determines the timesteps at which smoothing is performed, and --is_long_video
enables efficient long-video synthesis.
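The flags above can be wired up with argparse; a minimal sketch of what the argument parsing in inference.py might look like (the repo's exact defaults and help strings may differ):

```python
import argparse

def build_parser():
    # Mirrors the flags used in the inference_*.sh scripts.
    parser = argparse.ArgumentParser(description="Event-conditioned video generation")
    parser.add_argument("--prompt", type=str, required=True, help="text prompt")
    parser.add_argument("--condition", type=str, default="canny",
                        help="type of structure sequence")
    parser.add_argument("--video_path", type=str, required=True,
                        help="path to the input video")
    parser.add_argument("--output_path", type=str, default="outputs/")
    parser.add_argument("--video_length", type=int, default=15,
                        help="length of the synthesized video")
    parser.add_argument("--smoother_steps", type=int, nargs="+", default=[19, 20],
                        help="timesteps at which smoothing is performed")
    parser.add_argument("--width", type=int, default=512)
    parser.add_argument("--height", type=int, default=512)
    parser.add_argument("--is_long_video", action="store_true",
                        help="enable efficient long-video synthesis")
    return parser
```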
The generated videos can be found in the outputs
folder.
Results (video previews omitted here; each example shows the event stream, the extracted edge map, and generations for the prompts below):
- "James bond does the moonwalk on the desert." / "An astronaut does the moonwalk on the moon." / "Iron man does the moonwalk on the road."
- "An old man wearing a glass, cartoon." / "An old man wearing a glass, laughing." / "An old man wearing a glass, oil painting."
- "A blue sofa in a house." / "A green sofa in a house." / "A modern sofa in a house."
- "A girl with golden hair, crying." / "A girl with golden hair, smiling." / "A girl with long hair, movie style."
Cite our paper if you find it interesting!
@INPROCEEDINGS{zhao2024controllable,
  author={Zhao, Yaping and Zhang, Pei and Wang, Chutian and Lam, Edmund Y.},
  booktitle={IEEE International Conference on Image Processing (ICIP)},
  title={Controllable Unsupervised Event-Based Video Generation},
  year={2024},
  pages={2278-2284},
  doi={10.1109/ICIP51287.2024.10647468}}
This code is built upon ControlVideo.