NeuroClips is a novel framework for fMRI-to-video decoding (NeurIPS 2024 Oral). If you like our project, please give us a star ⭐.
- Dec. 3, 2024. Full code released.
- Nov. 30, 2024. Pre-processing code and dataset released.
- Sep. 26, 2024. Accepted by NeurIPS 2024 for Oral Presentation.
- May 24, 2024. Project release.
We use the public cc2017 (Wen) dataset from this. You can download it and follow the official preprocessing steps to process only the fMRI data. Use only movie_fmri_data_processing.m and movie_fmri_reproducibility.m, and note that we select more voxels (Bonferroni correction, P < 0.05) than the original selection (Bonferroni correction, P < 0.01).
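As a rough illustration of why the looser threshold keeps more voxels, here is a minimal Python sketch of Bonferroni-corrected selection. It is not the logic of the official MATLAB scripts; the function name `bonferroni_select` and the p-values are made up for the example.

```python
def bonferroni_select(p_values, alpha):
    """Return indices of tests whose p-value survives Bonferroni correction."""
    # Bonferroni: compare each p-value against alpha divided by the number of tests.
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical per-voxel p-values, for illustration only (not real fMRI statistics).
p_values = [1e-7, 3e-3, 8e-3, 2e-2, 0.4]

strict = bonferroni_select(p_values, alpha=0.01)  # corrected threshold: 0.002
loose = bonferroni_select(p_values, alpha=0.05)   # corrected threshold: 0.01

print("P < 0.01:", strict)
print("P < 0.05:", loose)
```

Every voxel that passes at P < 0.01 also passes at P < 0.05, so the looser setting can only add voxels.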
We also offer our pre-processed fMRI data and frames sampled from videos for training in NeuroClips, and you can directly download them from .
You can run python src/caption.py to generate the captions.
We recommend using two separate virtual environments to avoid package version conflicts: one for NeuroClips training and for inference of keyframes and blurry videos, and another for the pre-trained T2V diffusion model.
For NeuroClips:
. src/setup.sh
For the pre-trained AnimateDiff model, you can follow this:
conda create -n animatediff python==3.10
conda activate animatediff
cd AnimateDiff
pip install -r requirements.txt
We suggest training the backbone first and then the prior to obtain a better Semantic Reconstructor (SR).
conda activate neuroclips
python src/train_SR.py --subj 1 --batch_size 240 --num_epochs 30 --mixup_pct 1.0 --max_lr 1e-4 --use_text
python src/train_SR.py --subj 1 --batch_size 64 --num_epochs 150 --mixup_pct 0.0 --max_lr 3e-4 --use_prior --use_text
python src/train_PR.py --subj 1 --batch_size 40 --mixup_pct 0.0 --num_epochs 80
python src/recon_keyframe.py --subj 1
After the keyframes are generated, you can use BLIP-2 to caption them: python src/caption.py
python src/recon_blurry.py --subj 1
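The NeuroClips-side commands above can be collected into a small dry-run script, which makes the order of the stages explicit. This is only a sketch: `build_pipeline` is a hypothetical helper, the flags are copied from the commands above, and passing a subject ID other than 1 is an assumption. Nothing is executed; the commands are only printed.

```python
import shlex


def build_pipeline(subj=1):
    """Return the NeuroClips-side commands in the order listed above."""
    s = str(subj)
    return [
        # Semantic Reconstructor: backbone first, then the prior.
        ["python", "src/train_SR.py", "--subj", s, "--batch_size", "240",
         "--num_epochs", "30", "--mixup_pct", "1.0", "--max_lr", "1e-4",
         "--use_text"],
        ["python", "src/train_SR.py", "--subj", s, "--batch_size", "64",
         "--num_epochs", "150", "--mixup_pct", "0.0", "--max_lr", "3e-4",
         "--use_prior", "--use_text"],
        # Perceptual Reconstructor.
        ["python", "src/train_PR.py", "--subj", s, "--batch_size", "40",
         "--mixup_pct", "0.0", "--num_epochs", "80"],
        # Keyframe reconstruction, keyframe captioning, blurry video.
        ["python", "src/recon_keyframe.py", "--subj", s],
        ["python", "src/caption.py"],
        ["python", "src/recon_blurry.py", "--subj", s],
    ]


# Dry run: print each command instead of executing it.
for cmd in build_pipeline(subj=1):
    print(shlex.join(cmd))
```

To actually run the stages, each list could be passed to `subprocess.run(cmd, check=True)` inside the activated neuroclips environment.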
After preparing all the inputs, you can reconstruct the video. Any pre-trained T2V or V2V model can be used. Here we use the pre-trained T2V model AnimateDiff, specifically SparseCtrl for first-frame guidance.
conda activate animatediff
cd AnimateDiff
python -m scripts.neuroclips --config configs/NeuroClips/control.yaml
The pre-trained weights you need to prepare are listed here.
GT | Ours | GT | Ours | GT | Ours |
With the help of NeuroClips’ SR, we explored the generation of longer videos for the first time. Because long video generation is still an immature field, we chose a straightforward fusion strategy that requires no additional GPU training. During inference, we compare the semantics of the two keyframes reconstructed from two neighboring fMRI samples (here we simply check whether they belong to the same object class, e.g., both are jellyfish). If they are semantically similar, we replace the keyframe of the latter fMRI sample with the tail frame of the video reconstructed from the former sample, and use it as the first frame when generating the latter sample's video.
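The fusion strategy described above can be sketched in a few lines of Python. This is a simplified illustration, not the released implementation: `generate_long_video` and `toy_clip` are hypothetical names, the class labels stand in for whatever classifier decides that two keyframes show the same object, and `generate_clip` stands in for the T2V model with first-frame guidance.

```python
def generate_long_video(keyframes, labels, generate_clip):
    """Chain per-fMRI clips into one longer video.

    keyframes: one reconstructed keyframe per fMRI sample.
    labels: one object-class label per keyframe (assumed classifier output).
    generate_clip: callable(first_frame) -> list of frames (stand-in for T2V).
    """
    video, prev_clip, prev_label = [], None, None
    for keyframe, label in zip(keyframes, labels):
        if prev_clip is not None and label == prev_label:
            # Same object class as the previous sample: continue from the
            # tail frame of the previous clip instead of the new keyframe.
            first_frame = prev_clip[-1]
        else:
            first_frame = keyframe
        clip = generate_clip(first_frame)
        video.extend(clip)
        prev_clip, prev_label = clip, label
    return video


def toy_clip(first_frame, n_frames=3):
    """Toy generator: frames are strings derived from the first frame."""
    return [first_frame] + [f"{first_frame}>{i}" for i in range(1, n_frames)]


video = generate_long_video(
    keyframes=["kf_a", "kf_b", "kf_c"],
    labels=["jellyfish", "jellyfish", "moon"],
    generate_clip=toy_clip,
)
print(video)
```

In this toy run the second clip starts from the last frame of the first clip because both samples are labeled jellyfish, while the third clip starts from its own keyframe. The text does not say whether the duplicated boundary frame is dropped; the sketch keeps it.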
Overall, the failure cases fall into two categories: the reconstructed semantics are not accurate enough, or a scene transition degrades the generated results.
In the CC2017 dataset, the video clips in the testing movie differ from those in the training movie, and some object categories never appear in the training set at all. However, thanks to NeuroClips' Perceptual Reconstructor, we can still reconstruct the video at the level of low-level vision.
GT | Ours | GT | Ours |
Due to the low temporal resolution of fMRI (i.e., 2 s), a single fMRI segment may span two video scenes. This leads to semantic confusion in the reconstructed video, or even to a fusion of semantic and perceptual content: in the following example, a jellyfish scene transitions to the moon, and the reconstruction ends up as a jellyfish on a black background.
GT | Ours | GT | Ours |
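To make the scene-transition failure mode concrete, the following Python sketch flags which 2 s fMRI segments contain a scene cut. It is an illustration under simplifying assumptions: the function name and the cut times are hypothetical, segments are assumed to be aligned to the start of the video, and the hemodynamic delay is ignored.

```python
TR_SECONDS = 2.0  # temporal resolution of fMRI stated above


def segments_with_scene_cut(scene_cut_times, video_length, tr=TR_SECONDS):
    """Return indices of fMRI segments whose time window contains a scene cut."""
    n_segments = int(video_length // tr)
    flagged = set()
    for t in scene_cut_times:
        idx = int(t // tr)
        # A cut exactly on a segment boundary does not mix two scenes.
        if idx < n_segments and t % tr != 0:
            flagged.add(idx)
    return sorted(flagged)


# Hypothetical cut times in seconds, for illustration only.
cuts = [3.0, 8.0, 9.5]
print(segments_with_scene_cut(cuts, video_length=12.0))
```

Segments flagged this way are the ones where a single fMRI sample mixes two scenes, as in the jellyfish-to-moon example.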
@article{gong2024neuroclips,
title={NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction},
author={Gong, Zixuan and Bao, Guangyin and Zhang, Qi and Wan, Zhongwei and Miao, Duoqian and Wang, Shoujin and Zhu, Lei and Wang, Changwei and Xu, Rongtao and Hu, Liang and others},
journal={arXiv preprint arXiv:2410.19452},
year={2024}
}
We sincerely thank the following authors. NeuroClips builds on their excellent open-source projects and impressive ideas.
T2V diffusion: https://github.com/guoyww/AnimateDiff
Excellent Backbone: https://github.com/MedARC-AI/MindEyeV2
Temporal Design: https://arxiv.org/abs/2304.08818
Keyframe Captioning: https://github.com/salesforce/LAVIS/tree/main/projects/blip2
Dataset and pre-processing code: https://purr.purdue.edu/publications/2809