
CUBE

In this repository, we provide the code for the paper:

Controllable Unsupervised Event-Based Video Generation (accepted as an ICIP oral and invited to the WACV workshop)

Yaping Zhao, Pei Zhang, Chutian Wang, Edmund Y. Lam

paper link: https://ieeexplore.ieee.org/abstract/document/10647468

Usage

1. Download Weights

All pre-trained weights should be downloaded to the checkpoints/ directory, including the pre-trained weights of Stable Diffusion v1.5 and of ControlNet conditioned on Canny edges. The flownet.pkl file contains the weights of RIFE. The final file tree should look like this:

checkpoints
├── stable-diffusion-v1-5
├── sd-controlnet-canny
└── flownet.pkl
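
The Stable Diffusion and ControlNet weights can be fetched with huggingface_hub. The repository IDs below are assumptions based on the widely used public Hugging Face releases, so substitute the ones you actually use if they differ; flownet.pkl (the RIFE weights) is distributed separately.

from huggingface_hub import snapshot_download

# Repo IDs are assumptions based on the common public releases.
snapshot_download(repo_id="runwayml/stable-diffusion-v1-5",
                  local_dir="checkpoints/stable-diffusion-v1-5")
snapshot_download(repo_id="lllyasviel/sd-controlnet-canny",
                  local_dir="checkpoints/sd-controlnet-canny")
# flownet.pkl must be downloaded separately, e.g. from the RIFE release.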

2. Requirements

conda create -n cube python=3.10
conda activate cube
pip install -r requirements.txt

xformers is recommended to reduce memory usage and running time.
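
It can usually be installed into the same environment with pip, assuming a prebuilt wheel exists for your PyTorch/CUDA combination:

pip install xformers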

3. Inference

3.1. Edge Extraction

Event streams used in our experiments are provided in the data/event folder.

To extract edges from the event data, simply run:

python edge_extraction.py

The extracted edges can be found in the data/edge folder.
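
edge_extraction.py implements the full pipeline. As a minimal sketch of the underlying idea, assuming the events are stored as an (N, 4) array of (timestamp, x, y, polarity) rows (the file name, array layout, and 512x512 resolution below are hypothetical):

import os
import cv2
import numpy as np

# Minimal sketch: accumulate events into a 2D frame, then run Canny.
# File name, (t, x, y, p) layout, and resolution are assumptions.
events = np.load("data/event/example.npy")
frame = np.zeros((512, 512), dtype=np.uint8)
xs = np.clip(events[:, 1].astype(int), 0, 511)
ys = np.clip(events[:, 2].astype(int), 0, 511)
frame[ys, xs] = 255                  # mark pixels that fired events
edges = cv2.Canny(frame, 100, 200)   # standard Canny thresholds
os.makedirs("data/edge", exist_ok=True)
cv2.imwrite("data/edge/example.png", edges)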

3.2. Video Generation

To generate videos as shown in our paper, simply run:

sh inference_moonwalk.sh
sh inference_violin.sh
sh inference_sofa.sh
sh inference_man.sh
sh inference_girl.sh

To run your own text-to-video generation experiment, modify the bash script, e.g., inference_moonwalk.sh:

python inference.py \
    --prompt "James bond does the moonwalk on the desert." \
    --condition "canny" \
    --video_path "data/moonwalk.mp4" \
    --output_path "outputs/" \
    --video_length 15 \
    --smoother_steps 19 20 \
    --width 512 \
    --height 512
    #--is_long_video

where --video_length is the length of the synthesized video, --condition specifies the type of structure sequence, --smoother_steps determines the timesteps at which smoothing is performed, and --is_long_video enables efficient long-video synthesis.
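
For example, to enable efficient long-video synthesis, uncomment the flag and increase the video length (the value 60 below is illustrative, not from the paper):

python inference.py \
    --prompt "James bond does the moonwalk on the desert." \
    --condition "canny" \
    --video_path "data/moonwalk.mp4" \
    --output_path "outputs/" \
    --video_length 60 \
    --smoother_steps 19 20 \
    --width 512 \
    --height 512 \
    --is_long_video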

The generated videos can be found in the outputs folder.

Visualizations

Each example below pairs the input event stream and its extracted edge map with videos generated from three different prompts:

- "James bond does the moonwalk on the desert." / "An astronaut does the moonwalk on the moon." / "Iron man does the moonwalk on the road."
- "An old man wearing a glass, cartoon." / "An old man wearing a glass, laughing." / "An old man wearing a glass, oil painting."
- "A blue sofa in a house." / "A green sofa in a house." / "A modern sofa in a house."
- "A girl with golden hair, crying." / "A girl with golden hair, smiling." / "A girl with long hair, movie style."

Citation

Cite our paper if you find it interesting!

@INPROCEEDINGS{zhao2024controllable,
  author={Zhao, Yaping and Zhang, Pei and Wang, Chutian and Lam, Edmund Y.},
  booktitle={IEEE International Conference on Image Processing (ICIP)}, 
  title={Controllable Unsupervised Event-Based Video Generation}, 
  year={2024},
  pages={2278-2284},
  doi={10.1109/ICIP51287.2024.10647468}}

This code is implemented based on ControlVideo.
