Our proposed model is a state-of-the-art deepfake detection architecture that integrates hierarchical feature fusion with a multi-stream design. By combining Vision Transformers, ResNet, and specialized modules such as YOLOv8 and XceptionNet, it excels at detecting high-quality, realistic deepfakes produced by modern generative AI techniques. The model captures both spatial and semantic structure while remaining explainable through Grad-CAM visualizations.
- Hierarchical Feature Fusion: Combines Vision Transformer (ViT) and ResNet50 outputs for enriched representations.
- Multi-Stream Architecture: Incorporates facial features, context, and edge-aware details via YOLOv8, Sobel filters, and XceptionNet.
- Explainability: Employs Grad-CAM to produce class-discriminative heatmaps for output decisions.
- Calibration: Uses Platt Scaling to refine output probabilities, reducing model overconfidence.
- Ensemble Decision Making: Aggregates multi-stream predictions for robust and accurate results.
The model architecture comprises two main modules.

Module One:
- Hierarchical fusion of ViT and ResNet50 for feature extraction (a minimal fusion sketch follows this list).
- Calibration using Platt scaling to ensure well-calibrated outputs.
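As a rough illustration of the fusion idea, the sketch below concatenates the pooled embeddings of a ViT-B/16 and a ResNet50 and classifies from the joint vector. The torchvision backbones, input size, and concatenation scheme are stand-ins; the actual hierarchical fusion in the notebooks may combine intermediate layers differently.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, resnet50

class FusionHead(nn.Module):
    """Minimal two-backbone fusion: concatenate ViT-B/16 and ResNet50
    embeddings and classify. Illustrative only; the paper's hierarchical
    fusion may combine intermediate layers rather than final embeddings."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.vit = vit_b_16(weights="IMAGENET1K_V1")
        self.vit.heads = nn.Identity()   # expose the 768-d CLS embedding
        self.cnn = resnet50(weights="IMAGENET1K_V1")
        self.cnn.fc = nn.Identity()      # expose the 2048-d pooled features
        self.classifier = nn.Linear(768 + 2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.vit(x), self.cnn(x)], dim=1)
        return self.classifier(feats)

model = FusionHead()
logits = model(torch.randn(1, 3, 224, 224))  # both backbones expect 224x224
```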
Module Two:
- Multi-stream feature extraction using YOLOv8, Sobel filters, and XceptionNet (an edge-stream sketch follows this list).
- Grad-CAM for visualization and explainability.
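The edge-aware stream can be illustrated with a plain Sobel gradient-magnitude map; this is a generic OpenCV sketch, not the repository's exact preprocessing. Blending artifacts around manipulated regions often show up as edge inconsistencies, which is what this stream is meant to surface.

```python
import cv2
import numpy as np

def sobel_edge_map(image_path: str) -> np.ndarray:
    """Compute a Sobel gradient-magnitude map as the edge-aware stream input."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # Rescale to an 8-bit image for downstream models/visualization.
    return cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```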
Key results:
- Calibration: achieved an 8% decrease in Expected Calibration Error (ECE) on the training set and a 13% decrease on the validation set.
- Explainability: Grad-CAM heatmaps demonstrate accurate localization of manipulated regions.
- Ensemble Performance: outperforms baseline models in detecting deepfakes while maintaining computational efficiency.
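ECE here refers to the standard binned gap between confidence and accuracy. A minimal NumPy version for a binary detector (illustrative, not the repository's evaluation code):

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 15) -> float:
    """Binned ECE: weighted mean |accuracy - confidence| per confidence bin.
    `probs` are predicted probabilities for the positive (fake) class."""
    confidences = np.where(probs >= 0.5, probs, 1.0 - probs)  # confidence in the prediction
    predictions = (probs >= 0.5).astype(int)
    accuracies = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```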
- Download the datasets (notebook cells; lines prefixed with `!` run as shell commands):

```python
!pip install gdown
from pathlib import PosixPath

# WildRF
!gdown --id 1A0xoL44Yg68ixd-FuIJn2VC4vdZ6M2gn -c
!unzip -q -n WildRF.zip

# CollabDiff
!gdown --id 1GpGvkxQ7leXqCnfnEAsgY_DXFnJwIbO4 -c
!unzip -q -n CollabDiff.zip
```
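Once unzipped, the data can be wrapped in a standard loader. The layout below (a class-per-folder `train/` split under `WildRF/`) is an assumption for illustration; adjust the paths to the actual archive structure.

```python
from pathlib import PosixPath
from torchvision import datasets, transforms

# Hypothetical layout: WildRF/train/{real,fake}/*.jpg -- adjust to the
# actual tree produced by the unzip step above.
root = PosixPath("WildRF")
tfm = transforms.Compose([
    transforms.Resize((224, 224)),  # input size expected by ViT-B/16 and ResNet50
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder(root / "train", transform=tfm)
```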
- Clone the repository:

```bash
git clone https://github.com/taco-group/HFMF.git
cd HFMF
```
- Running Instructions:

Follow the steps below to run the full pipeline and produce the final output for HFMF.

- Open `ViTb16_finetuned.ipynb` to get the fine-tuned weights for the ViT and ResNet models.
- Save the weights from this notebook and use them in `Module1_feature_extraction.ipynb` to obtain the final weights for Module 1 (a save/load sketch follows).
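The hand-off between the two notebooks amounts to saving and reloading `state_dict`s. A minimal sketch, with file names that are illustrative rather than fixed by the repo:

```python
import torch
from torchvision.models import vit_b_16, resnet50

vit_model, resnet_model = vit_b_16(), resnet50()
# ... fine-tuning happens in ViTb16_finetuned.ipynb ...

# Persist the fine-tuned backbones (illustrative file names).
torch.save(vit_model.state_dict(), "vit_b16_finetuned.pth")
torch.save(resnet_model.state_dict(), "resnet50_finetuned.pth")

# In Module1_feature_extraction.ipynb, restore them before feature extraction.
vit_model.load_state_dict(torch.load("vit_b16_finetuned.pth", map_location="cpu"))
resnet_model.load_state_dict(torch.load("resnet50_finetuned.pth", map_location="cpu"))
```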
- Use `DNN_M1_WildRF.ipynb` to get the logits.
- The notebook `Module 1.ipynb` is a refined version of this process, so you may use it for further improvements.
- Once the logits are obtained, calibrate them using the `calibration.py` script to adjust the outputs for the final model (see the Platt scaling sketch below).
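Platt scaling fits a one-dimensional logistic regression on the raw logits. The sketch below shows the idea with hypothetical `.npy` file names; the actual interface of `calibration.py` may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical file names -- calibration.py's actual I/O may differ.
val_logits = np.load("module1_val_logits.npy")  # shape (N,), raw logits
val_labels = np.load("val_labels.npy")          # shape (N,), labels in {0, 1}

# Platt scaling = fitting a 1-D logistic regression on the logits.
platt = LogisticRegression()
platt.fit(val_logits.reshape(-1, 1), val_labels)

test_logits = np.load("module1_test_logits.npy")
calibrated_probs = platt.predict_proba(test_logits.reshape(-1, 1))[:, 1]
```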
- Use `Module2.ipynb` to integrate the calibrated logits with other models, including:
  - XceptionNet
  - YOLOv8
  - Sobel filter
  - Grad-CAM (explainable AI)
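For reference, Grad-CAM weights each feature map by its average gradient and takes a ReLU-ed weighted sum. Below is a standalone sketch on a ResNet50 backbone; the notebook may instead rely on a library such as `pytorch-grad-cam`:

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

# Hook the last convolutional block (the target layer is a design choice).
target = model.layer4[-1]
target.register_forward_hook(fwd_hook)
target.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a face crop
scores = model(x)
scores[0, scores.argmax()].backward()  # backprop the top-class score

# Channel weights = global-average-pooled gradients;
# heatmap = ReLU of the weighted activation sum, normalized to [0, 1].
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * activations["value"]).sum(dim=1)).squeeze()
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```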
- Use `Ensemble.ipynb` to get the final ensemble of Modules 1 and 2. This integration generates the final output for the HFMF task (a simple averaging sketch follows).
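How the two modules are combined is defined in `Ensemble.ipynb`; as a placeholder for the idea, a plain probability average looks like this (weights, file names, and the label convention are assumptions):

```python
import numpy as np

# Illustrative aggregation -- Ensemble.ipynb's actual weighting may differ.
module1_probs = np.load("module1_calibrated_probs.npy")
module2_probs = np.load("module2_probs.npy")

ensemble_probs = 0.5 * module1_probs + 0.5 * module2_probs
predictions = (ensemble_probs >= 0.5).astype(int)  # 1 = deepfake (assumed)
```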
Google Drive folder: https://drive.google.com/drive/folders/1Ek7z7qaqwVf2aYMMRzi14-BWxSTeef7w?usp=sharing