Official PyTorch implementation of MOVAD, paper accepted to ICASSP 2024.

We propose MOVAD, a brand new architecture for online (frame-level) video anomaly detection.

MOVAD Architecture

Authors: Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati.

IMP Lab - Dipartimento di Ingegneria e Architettura

University of Parma, Italy

Video Anomaly Detection with MOVAD

Abstract

The ability to understand the surrounding scene is of paramount importance for Autonomous Vehicles (AVs).

This paper presents a system that works in an online fashion, responding immediately when anomalies arise around the AV, using only the videos captured by a dash-mounted camera.

Our architecture, called MOVAD, relies on two main modules: a Short-Term Memory Module, implemented by a Video Swin Transformer (VST), which extracts information related to the ongoing action, and a Long-Term Memory Module, injected inside the classifier, which also considers remote past information and action context through a Long Short-Term Memory (LSTM) network.
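To make the online, two-module flow concrete, here is a minimal sketch in plain Python. It is not the code in this repository: `online_scores`, `toy_short_term` and `toy_long_term` are hypothetical names, and the toy functions only stand in for the VST and the LSTM classifier. The assumption is that the short-term module sees a sliding window of the most recent frames, while the long-term module carries a state from one frame to the next.

```python
from collections import deque


def online_scores(frames, short_term, long_term, clip_len=4):
    """Yield one anomaly score per incoming frame once the window is full.

    short_term: callable(clip) -> features (stand-in for the VST).
    long_term:  callable(features, state) -> (score, new_state)
                (stand-in for the LSTM-based classifier).
    """
    window = deque(maxlen=clip_len)  # keeps only the most recent frames
    state = None                     # long-term memory carried across frames
    for frame in frames:
        window.append(frame)
        if len(window) < clip_len:
            continue  # not enough frames yet to form a clip
        features = short_term(list(window))
        score, state = long_term(features, state)
        yield score


# Toy stand-ins (assumptions for illustration only).
def toy_short_term(clip):
    return sum(clip) / len(clip)


def toy_long_term(features, state):
    # Exponential moving average as a crude recurrent memory.
    state = features if state is None else 0.5 * state + 0.5 * features
    return state, state


scores = list(online_scores([0, 0, 0, 0, 4, 4], toy_short_term, toy_long_term))
print(scores)
```

Only past and current frames are used at each step, which is the property that makes the loop online.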

The strengths of MOVAD lie not only in its excellent performance, but also in its straightforward and modular architecture, trained in an end-to-end fashion with only RGB frames and with as few assumptions as possible, which makes it easy to implement and play with.

We evaluated the performance of our method on the Detection of Traffic Anomaly (DoTA) dataset, a challenging collection of dash-mounted camera videos of accidents.

After an extensive ablation study, MOVAD is able to reach an AUC score of 82.17%, surpassing the current state-of-the-art by $+2.87$ AUC.
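For readers unfamiliar with the metric, the sketch below shows how a frame-level AUC can be computed from per-frame labels and scores. It is an illustration of the standard pairwise definition, not the evaluation code of this repository; `frame_level_auc` is a hypothetical helper, and the quadratic loop is chosen for clarity rather than speed.

```python
def frame_level_auc(labels, scores):
    """Area under the ROC curve from per-frame labels (0/1) and scores.

    Equals the probability that a randomly chosen anomalous frame
    receives a higher score than a randomly chosen normal frame,
    counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one frame of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four (anomalous, normal) pairs are ranked correctly.
auc = frame_level_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```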

Usage

Installation

$ git clone --branch movad_vad https://github.com/IMPLabUniPr/movad.git
$ cd movad
$ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_base_patch244_window1677_sthv2.pth -O pretrained/swin_base_patch244_window1677_sthv2.pth
$ conda env create -n movad_env --file environment.yml
$ conda activate movad_env

Download DoTA dataset

Please download the dataset from the official website and save it inside the data/dota directory.

You should obtain the following structure:

data/dota
├── annotations
│   ├── 0qfbmt4G8Rw_000306.json
│   ├── 0qfbmt4G8Rw_000435.json
│   ├── 0qfbmt4G8Rw_000602.json
│   ...
├── frames
│   ├── 0qfbmt4G8Rw_000072
│   ├── 0qfbmt4G8Rw_000306
│   ├── 0qfbmt4G8Rw_000435
│   ...
└── metadata
    ├── metadata_train.json
    ├── metadata_val.json
    ├── train_split.txt
    └── val_split.txt
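Before training, it can help to confirm that the dataset was unpacked as shown above. The following is a minimal sketch, not part of the repository: `check_dota_layout` is a hypothetical helper, and it assumes that every annotation file has a frames folder with the same name, which the tree above suggests but does not guarantee.

```python
from pathlib import Path

EXPECTED_METADATA = (
    "metadata_train.json",
    "metadata_val.json",
    "train_split.txt",
    "val_split.txt",
)


def check_dota_layout(root):
    """Return a list of problems found under `root` (empty if none)."""
    root = Path(root)
    problems = []
    for sub in ("annotations", "frames", "metadata"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}")
    for name in EXPECTED_METADATA:
        if not (root / "metadata" / name).is_file():
            problems.append(f"missing file: metadata/{name}")
    # Assumption: annotations/<clip>.json pairs with frames/<clip>/.
    ann_dir = root / "annotations"
    if ann_dir.is_dir():
        for ann in sorted(ann_dir.glob("*.json")):
            if not (root / "frames" / ann.stem).is_dir():
                problems.append(f"no frames folder for annotation: {ann.stem}")
    return problems


if __name__ == "__main__":
    for problem in check_dota_layout("data/dota"):
        print(problem)
```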

Download models pretrained on DoTA dataset

Open the Release v1.0 page and download the .pt (pretrained model) and .pkl (results) files. Unzip them inside the output directory to obtain the following directory structure:

output/
├── v4_1
│   ├── checkpoints
│   │   └── model-640.pt
│   └── eval
│       └── results-640.pkl
└── v4_2
    ├── checkpoints
    │   └── model-690.pt
    └── eval
        └── results-690.pkl

Train

python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase train --epochs 1000 --epoch -1

Eval

python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase test --epoch 690
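The epoch passed to `--epoch` matches the number in the checkpoint file name (for example `model-690.pt`). As a convenience, the sketch below finds the highest available epoch in an output directory and assembles the evaluation command shown above. `latest_epoch` and `eval_command` are hypothetical helpers, not part of the repository, and they assume the `checkpoints/model-<epoch>.pt` naming from the directory tree.

```python
import re
from pathlib import Path


def latest_epoch(output_dir):
    """Return the highest epoch among checkpoints/model-<epoch>.pt, or None."""
    pattern = re.compile(r"model-(\d+)\.pt")
    ckpt_dir = Path(output_dir) / "checkpoints"
    epochs = []
    if ckpt_dir.is_dir():
        for path in ckpt_dir.iterdir():
            match = pattern.fullmatch(path.name)
            if match:
                epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None


def eval_command(config, output_dir, epoch):
    """Build the evaluation command as an argument list for subprocess.run."""
    return [
        "python", "src/main.py",
        "--config", config,
        "--output", output_dir,
        "--phase", "test",
        "--epoch", str(epoch),
    ]


if __name__ == "__main__":
    epoch = latest_epoch("output/v4_2/")
    if epoch is None:
        print("no checkpoint found in output/v4_2/checkpoints")
    else:
        print(" ".join(eval_command("cfgs/v4_2.yml", "output/v4_2/", epoch)))
```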

Play: generate video

python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase play --epoch 690

Results

Table 1

Memory modules effectiveness.

# Short-term Long-term AUC Conf
1 66.53 conf
2 X 74.46 conf
3 X 68.76 conf
4 X X 79.21 conf

Figure 2

Short-term memory module.

Name Conf
NF 1 conf
NF 2 conf
NF 3 conf
NF 4 conf
NF 5 conf

Figure 3

Long-term memory module.

Name Conf
w/out LSTM conf
LSTM (1 cell) conf
LSTM (2 cells) conf
LSTM (3 cells) conf
LSTM (4 cells) conf

Figure 4

Video clip length (VCL).

Name Conf
4 frames conf
8 frames conf
12 frames conf
16 frames conf

Table 2

Comparison with the state of the art.

# Method Input AUC Conf
9 Ours (MOVAD) RGB (320x240) 80.09 conf
10 Ours (MOVAD) RGB (640x480) 82.17 conf

License

See GPL v2 License.

Acknowledgement

This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy.

Citation

If you find our work useful in your research, please cite:

@inproceedings{rossi2024memory,
  title={Memory-augmented Online Video Anomaly Detection},
  author={Rossi, Leonardo and Bernuzzi, Vittorio and Fontanini, Tomaso and Bertozzi, Massimo and Prati, Andrea},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6590--6594},
  year={2024},
  organization={IEEE}
}