Our proposed model is a state-of-the-art deepfake detection architecture that integrates hierarchical feature fusion with a multi-stream design. By combining Vision Transformers, ResNet, and specialized modules such as YOLOv8 and XceptionNet, it excels at detecting high-quality, realistic deepfakes produced by modern generative AI techniques. The model captures both spatial and semantic structure while remaining explainable through Grad-CAM visualizations.
- Hierarchical Feature Fusion: Combines Vision Transformer (ViT) and ResNet50 outputs for enriched representations.
- Multi-Stream Architecture: Incorporates facial features, context, and edge-aware details via YOLOv8, Sobel filters, and XceptionNet.
- Explainability: Employs Grad-CAM to produce class-discriminative heatmaps for output decisions.
- Calibration: Uses Platt Scaling to refine output probabilities, reducing model overconfidence.
- Ensemble Decision Making: Aggregates multi-stream predictions for robust and accurate results.
The model architecture comprises two main modules.

Module One:
- Hierarchical fusion of ViT and ResNet50 for feature extraction (a minimal fusion sketch follows this list).
- Calibration using Platt scaling to ensure well-calibrated outputs.
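As a rough illustration of the fusion idea, the sketch below concatenates the pooled embeddings of a ViT-B/16 and a ResNet50 and classifies from the joint vector. The torchvision backbones, input size, and concatenation scheme are stand-ins; the actual hierarchical fusion in the notebooks may combine intermediate layers differently.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, resnet50

class FusionHead(nn.Module):
    """Minimal two-backbone fusion: concatenate ViT-B/16 and ResNet50
    embeddings and classify. Illustrative only; the paper's hierarchical
    fusion may combine intermediate layers rather than final embeddings."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.vit = vit_b_16(weights="IMAGENET1K_V1")
        self.vit.heads = nn.Identity()   # expose the 768-d CLS embedding
        self.cnn = resnet50(weights="IMAGENET1K_V1")
        self.cnn.fc = nn.Identity()      # expose the 2048-d pooled features
        self.classifier = nn.Linear(768 + 2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.vit(x), self.cnn(x)], dim=1)
        return self.classifier(feats)

model = FusionHead()
logits = model(torch.randn(1, 3, 224, 224))  # both backbones expect 224x224
```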
Module Two:
- Multi-stream feature extraction using YOLOv8, Sobel filters, and XceptionNet (an edge-stream sketch follows this list).
- Grad-CAM for visualization and explainability.
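The edge-aware stream can be illustrated with a plain Sobel gradient-magnitude map; this is a generic OpenCV sketch, not the repository's exact preprocessing. Blending artifacts around manipulated regions often show up as edge inconsistencies, which is what this stream is meant to surface.

```python
import cv2
import numpy as np

def sobel_edge_map(image_path: str) -> np.ndarray:
    """Compute a Sobel gradient-magnitude map as the edge-aware stream input."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # Rescale to an 8-bit image for downstream models/visualization.
    return cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```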
Key results:
- Calibration: achieved an 8% decrease in Expected Calibration Error (ECE) on the training set and a 13% decrease on the validation set.
- Explainability: Grad-CAM heatmaps demonstrate accurate localization of manipulated regions.
- Ensemble Performance: outperforms baseline models in detecting deepfakes while maintaining computational efficiency.
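ECE here refers to the standard binned gap between confidence and accuracy. A minimal NumPy version for a binary detector (illustrative, not the repository's evaluation code):

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 15) -> float:
    """Binned ECE: weighted mean |accuracy - confidence| per confidence bin.
    `probs` are predicted probabilities for the positive (fake) class."""
    confidences = np.where(probs >= 0.5, probs, 1.0 - probs)  # confidence in the prediction
    predictions = (probs >= 0.5).astype(int)
    accuracies = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```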
- Download the datasets (notebook cells; lines prefixed with `!` run as shell commands):

```python
!pip install gdown
from pathlib import PosixPath

# WildRF
!gdown --id 1A0xoL44Yg68ixd-FuIJn2VC4vdZ6M2gn -c
!unzip -q -n WildRF.zip

# CollabDiff
!gdown --id 1GpGvkxQ7leXqCnfnEAsgY_DXFnJwIbO4 -c
!unzip -q -n CollabDiff.zip
```
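Once unzipped, the data can be wrapped in a standard loader. The layout below (a class-per-folder `train/` split under `WildRF/`) is an assumption for illustration; adjust the paths to the actual archive structure.

```python
from pathlib import PosixPath
from torchvision import datasets, transforms

# Hypothetical layout: WildRF/train/{real,fake}/*.jpg -- adjust to the
# actual tree produced by the unzip step above.
root = PosixPath("WildRF")
tfm = transforms.Compose([
    transforms.Resize((224, 224)),  # input size expected by ViT-B/16 and ResNet50
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder(root / "train", transform=tfm)
```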
- Clone the repository:

```bash
git clone https://github.com/taco-group/HFMF.git
cd HFMF
```
- Running Instructions:

Follow the steps below to run the full pipeline and produce the final output for HFMF.

- Open `ViTb16_finetuned.ipynb` to get the fine-tuned weights for the ViT and ResNet models.
- Save the weights from this notebook and use them in `Module1_feature_extraction.ipynb` to obtain the final weights for Module 1 (a save/load sketch follows).
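The hand-off between the two notebooks amounts to saving and reloading `state_dict`s. A minimal sketch, with file names that are illustrative rather than fixed by the repo:

```python
import torch
from torchvision.models import vit_b_16, resnet50

vit_model, resnet_model = vit_b_16(), resnet50()
# ... fine-tuning happens in ViTb16_finetuned.ipynb ...

# Persist the fine-tuned backbones (illustrative file names).
torch.save(vit_model.state_dict(), "vit_b16_finetuned.pth")
torch.save(resnet_model.state_dict(), "resnet50_finetuned.pth")

# In Module1_feature_extraction.ipynb, restore them before feature extraction.
vit_model.load_state_dict(torch.load("vit_b16_finetuned.pth", map_location="cpu"))
resnet_model.load_state_dict(torch.load("resnet50_finetuned.pth", map_location="cpu"))
```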
- Use `DNN_M1_WildRF.ipynb` to get the logits.
- The notebook `Module 1.ipynb` is a refined version of this process, so you may use it for further improvements.
- Once the logits are obtained, calibrate them using the `calibration.py` script to adjust the outputs for the final model (see the Platt scaling sketch below).
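Platt scaling fits a one-dimensional logistic regression on the raw logits. The sketch below shows the idea with hypothetical `.npy` file names; the actual interface of `calibration.py` may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical file names -- calibration.py's actual I/O may differ.
val_logits = np.load("module1_val_logits.npy")  # shape (N,), raw logits
val_labels = np.load("val_labels.npy")          # shape (N,), labels in {0, 1}

# Platt scaling = fitting a 1-D logistic regression on the logits.
platt = LogisticRegression()
platt.fit(val_logits.reshape(-1, 1), val_labels)

test_logits = np.load("module1_test_logits.npy")
calibrated_probs = platt.predict_proba(test_logits.reshape(-1, 1))[:, 1]
```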
- Use `Module2.ipynb` to integrate the calibrated logits with other models, including:
  - XceptionNet
  - YOLOv8
  - Sobel filter
  - Grad-CAM (explainable AI)
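For reference, Grad-CAM weights each feature map by its average gradient and takes a ReLU-ed weighted sum. Below is a standalone sketch on a ResNet50 backbone; the notebook may instead rely on a library such as `pytorch-grad-cam`:

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

# Hook the last convolutional block (the target layer is a design choice).
target = model.layer4[-1]
target.register_forward_hook(fwd_hook)
target.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a face crop
scores = model(x)
scores[0, scores.argmax()].backward()  # backprop the top-class score

# Channel weights = global-average-pooled gradients;
# heatmap = ReLU of the weighted activation sum, normalized to [0, 1].
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * activations["value"]).sum(dim=1)).squeeze()
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```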
- Use `Ensemble.ipynb` to get the final ensemble of Modules 1 and 2. This integration generates the final output for the HFMF task (a simple averaging sketch follows).
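How the two modules are combined is defined in `Ensemble.ipynb`; as a placeholder for the idea, a plain probability average looks like this (weights, file names, and the label convention are assumptions):

```python
import numpy as np

# Illustrative aggregation -- Ensemble.ipynb's actual weighting may differ.
module1_probs = np.load("module1_calibrated_probs.npy")
module2_probs = np.load("module2_probs.npy")

ensemble_probs = 0.5 * module1_probs + 0.5 * module2_probs
predictions = (ensemble_probs >= 0.5).astype(int)  # 1 = deepfake (assumed)
```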
Google Drive folder: https://drive.google.com/drive/folders/1Ek7z7qaqwVf2aYMMRzi14-BWxSTeef7w?usp=sharing