Training and testing pipeline for ransomware classification based on screenshots of the splash screens or ransom notes (https://arxiv.org/pdf/1908.06750.pdf).

A King’s Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation

Requires an NVIDIA GPU, Python 3, CUDA, cuDNN, PyTorch 1.2, and OpenCV.
Other libraries, such as visdom, are optional. To use visdom to plot curves and display results, pass the --display argument on the command line.

[Figures: General Pipeline; Custom Network Architecture]

Method:

"Newly emerging variants of ransomware pose an ever-growing threat to computer systems governing every aspect of modern life through the handling and analysis of big data. While various recent security-based approaches have focused on detecting and classifying ransomware at the network or system level, easy-to-use post-infection ransomware classification for the lay user has not been attempted before. In this paper, we investigate the possibility of classifying the ransomware a system is infected with simply based on a screenshot of the splash screen or the ransom note captured using a consumer camera commonly found in any modern mobile device. To train and evaluate our system, we create a sample dataset of the splash screens of 50 well-known ransomware variants. In our dataset, only a single training image is available per ransomware. Instead of creating a large training dataset of ransomware screenshots, we simulate screenshot capture conditions via carefully designed data augmentation techniques, enabling simple and efficient one-shot learning. Moreover, using model uncertainty obtained via Bayesian approximation, we ensure special input cases such as unrelated non-ransomware images and previously-unseen ransomware variants are correctly identified for special handling and not mis-classified. Extensive experimental evaluation demonstrates the efficacy of our work, with accuracy levels of up to 93.6% for ransomware classification."

[Atapour-Abarghouei, Bonner and McGough, 2019]



Instructions to train the model:

  • First, clone this repository:
$ git clone https://github.com/atapour/ransomware-classification.git
$ cd ransomware-classification
  • Next, download the dataset used to train and evaluate the model.
  • The script "download_data.sh" downloads the data and automatically checks the integrity of the downloaded file using an MD5 checksum. To download the dataset, run the following commands:
$ chmod +x ./download_data.sh
$ ./download_data.sh
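
The integrity check mentioned above can also be repeated by hand. The following is a minimal sketch of an MD5 comparison in Python, not the logic of download_data.sh itself; the function names are hypothetical, and the expected digest must come from the script or the dataset host, since it is not reproduced here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches `expected_md5` (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling verify_download with the path of the downloaded archive and the published digest returns False if the file was truncated or corrupted in transit, in which case the download should be repeated.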

[Figure: Example of the data used to train and evaluate the approach]

  • The training code can use visdom to display training results and plots. To enable this, start visdom, navigate to http://localhost:8097, and add the --display argument to the training command.

  • To train the model, run the following command:

$ python train.py <experiment_name> --data_root=./dataset/train --aug rotate contrast brightness occlusion regularblur defocusblur motionblur perspective gray colorjitter noise --input_size=256 --arch=inception
  • All training arguments are defined in train_arguments.py. Refer to that file for further information.
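
To illustrate how a command line like the one above is typically parsed, here is a hedged argparse sketch. It is not a copy of train_arguments.py: the flag names and augmentation names are taken from the training command, while the defaults, the `choices` restriction, and the `build_parser` helper are assumptions made for illustration.

```python
import argparse

# Augmentation names as they appear in the training command above.
AUGMENTATIONS = [
    "rotate", "contrast", "brightness", "occlusion", "regularblur",
    "defocusblur", "motionblur", "perspective", "gray", "colorjitter", "noise",
]


def build_parser():
    """Build a hypothetical parser mirroring the flags used in the README."""
    parser = argparse.ArgumentParser(description="Hypothetical training arguments")
    parser.add_argument("experiment_name")
    # Defaults below are assumptions, not the repository's actual defaults.
    parser.add_argument("--data_root", default="./dataset/train")
    # nargs="+" lets --aug accept a space-separated list of augmentation names.
    parser.add_argument("--aug", nargs="+", choices=AUGMENTATIONS, default=[])
    parser.add_argument("--input_size", type=int, default=256)
    parser.add_argument("--arch", default="inception")
    parser.add_argument("--display", action="store_true")
    return parser


args = build_parser().parse_args(
    ["my_experiment", "--aug", "rotate", "noise", "--input_size=256", "--arch=inception"]
)
print(args.experiment_name, args.aug, args.input_size, args.arch, args.display)
```

Because --aug takes a list, augmentations can be switched on or off individually by editing the names that follow the flag.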

Instructions to test the model:

  • To make testing easy, we provide pre-trained network weights in pretrained_weights/densenet201.pth, which produce high-accuracy classification results on the test set.

  • To test the approach with a densenet201 architecture that was pre-trained on ImageNet and trained with the full augmentation protocol, run the following command:

$ python test.py --pos_root=./dataset/test --test_checkpoint_path=./pretrained_weights/densenet201.pth --input_size=256 --pretrained --arch=densenet201


  • To test the uncertainty estimation capabilities of the approach, which employ Monte Carlo dropout, run the script uncertainty.py:
$ python uncertainty.py --pos_root=./dataset/test --neg_root=./dataset/negative --test_checkpoint_path=path/to/checkpoint --input_size=128 --arch=AmirNet_DO
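
The idea behind Monte Carlo dropout can be sketched without any deep learning library: keep dropout active at test time, run several stochastic forward passes, average the class probabilities, and use a measure such as predictive entropy as the uncertainty. The toy example below is an assumption-laden illustration in plain Python, not the code in uncertainty.py; the linear model, its weights, the inputs, and all function names are made up.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy linear classifier with dropout left ON."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob, rescale the rest.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, passes=50, drop_prob=0.2, seed=0):
    """Average probabilities over `passes` stochastic passes.

    Returns (mean_probs, predictive_entropy); higher entropy means less certainty.
    """
    rng = random.Random(seed)
    runs = [stochastic_forward(features, weights, drop_prob, rng) for _ in range(passes)]
    mean = [sum(col) / passes for col in zip(*runs)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Toy 3-class model with 4 input features (made-up numbers).
weights = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 2.0]]
confident_input = [3.0, 0.0, 0.0, 0.0]
ambiguous_input = [1.0, 1.0, 0.5, 0.5]
for name, x in [("confident", confident_input), ("ambiguous", ambiguous_input)]:
    probs, entropy = mc_dropout_predict(x, weights)
    print(name, [round(p, 3) for p in probs], round(entropy, 3))
```

An input whose entropy exceeds a chosen threshold could then be flagged for special handling instead of being assigned a ransomware class. The threshold is a design choice that this sketch does not fix.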


This work is created as part of the project published in the following:

Reference:

A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation (A. Atapour-Abarghouei, S. Bonner and A.S. McGough), in the Proceedings of the IEEE Int. Conf. Big Data, 2019. [pdf]


@article{atapour2019kings,
  title={A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation},
  author={Atapour-Abarghouei, Amir and Bonner, Stephen and McGough, Andrew Stephen},
  journal={arXiv preprint arXiv:1908.06750},
  year={2019}
}