Detecting and Classifying Android Malware using Deep Learning Techniques
- Creating a large CSV file with all the features and categories
- Creating multiple data files for Benign vs. a malware category
- Selecting features for a dataset (in progress)
- Visualizing some features in the data
Since our models haven't been performing well, I decided to complete a Sanity Check notebook, demonstrating all of the techniques we're employing here and trying to find any failures in our methods.
- One issue I found was with how the data was being stratified: we were using `train_test_split` from sklearn, which, as it turns out, does not stratify by default. I've fixed this in the Adware vs. Benign notebook and the rest (a minimal sketch of the fix is below). Despite this, performance is still low.
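For reference, a minimal sketch of the fix, using synthetic data for illustration (the variable names, sizes, and split proportions are assumptions, not the exact ones in the notebooks):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for our feature matrix and labels
X = np.random.rand(1000, 9)                              # 9 flow features per sample
y = np.random.choice([0, 1], size=1000, p=[0.55, 0.45])  # imbalanced benign/malware labels

# Passing the labels to `stratify=` preserves the class proportions in both
# splits; without it, train_test_split does NOT stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```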
X vs Benign
Below are the experiments we want to run for the paper. Each experiment should be run on each of the frameworks listed under it (fastai, Keras-TensorFlow, Keras-Theano).
The metrics we want to collect for all of these experiments are the Accuracy, Loss, True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), and False Negative Rate (FNR). Depending on the platform the experiments are run on (fastai, Keras), there are different ways of acquiring these metrics; notes on how to do so are detailed below.
```python
from tensorflow.keras.metrics import (
    BinaryAccuracy, TruePositives, TrueNegatives, FalsePositives, FalseNegatives
)

# Initializing the metrics objects
accuracy = BinaryAccuracy()
tp = TruePositives()
tn = TrueNegatives()
fp = FalsePositives()
fn = FalseNegatives()
metrics = [accuracy, tp, tn, fp, fn]

# Adding the metrics to the model's compile method
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=metrics)
```
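Keras reports raw counts rather than rates, so the TPR/TNR/FPR/FNR have to be derived afterwards. A minimal sketch of that post-processing (the counts in the example call are placeholders, not real results):

```python
def rates_from_counts(tp, tn, fp, fn):
    """Derive the rate metrics from the raw counts Keras collects."""
    tpr = tp / (tp + fn)  # True Positive Rate (sensitivity)
    tnr = tn / (tn + fp)  # True Negative Rate (specificity)
    fpr = fp / (fp + tn)  # False Positive Rate
    fnr = fn / (fn + tp)  # False Negative Rate
    return tpr, tnr, fpr, fnr

# Example with placeholder counts
print(rates_from_counts(tp=80, tn=90, fp=10, fn=20))
```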
```python
from fastai.metrics import accuracy, Recall, Precision

# Set up the metrics we want to collect. I wanted TP, TN, FP, FN, but those
# weren't available; Recall and Precision are still extremely helpful for
# evaluating the model.
metrics = [accuracy, Recall(), Precision()]

# Create the learner
```
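As a point of reference, a minimal sketch of what creating the learner with these metrics could look like, assuming fastai v2's tabular API (the CSV name, column names, layer sizes, and training schedule are hypothetical, not the ones in our notebooks):

```python
from fastai.tabular.all import TabularDataLoaders, tabular_learner, Normalize
from fastai.metrics import accuracy, Recall, Precision

# Hypothetical flow-feature CSV with a binary 'Label' column
dls = TabularDataLoaders.from_csv(
    'flows.csv',
    y_names='Label',
    cont_names=['Flow IAT Max', 'Flow IAT Min', 'FIN Flag Count'],
    procs=[Normalize],
)

# Tabular neural net with the metrics defined above
learn = tabular_learner(dls, layers=[200, 100],
                        metrics=[accuracy, Recall(), Precision()])
learn.fit_one_cycle(5, lr_max=1e-3)
```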
### To-Do List
- [x] Create a line graph demonstrating how **loss** changes as we change the learning rate and a scatter plot demonstrating the differences in performance (**accuracy** or **loss**) for each different optimizer.
- Screenshot is in /ScreeShots/sgd.png
- Loss was improved by using the mean_squared_error loss function. With other optimizers such as Adam and Adadelta, the accuracy stayed the same, i.e. ~53%.
- Various learning rates (e.g., 0.1 and 0.01) were tried; after some research, 0.001 was found to be the optimal one.
- Using the SGD optimizer with the mean_squared_error loss function on Keras-TensorFlow and Keras-Theano produced similar results (accuracy ~53%, loss ~24). A minimal sketch of this setup follows the To-Do list.
- Binary Classification Problem: Is a sample malicious or benign traffic? (Completed before the previous To-Do, so I may have to rerun these with a new optimizer or learning rate)
- fastai
- Keras-TensorFlow
- Keras-Theano
- Multi-Classification Problem #1: Can we differentiate between benign, adware, scareware, etc... traffic?
- fastai
- Keras-TensorFlow
- Keras-Theano
- Multi-Classification Problem #2: Can we differentiate between the different species of each type of malicious traffic? (Ex. Gooligan vs ... vs Shuanet for Adware)
- Adware
- fastai
- Keras-TensorFlow
- Keras-Theano
- Ransomware
- fastai
- Keras-TensorFlow
- Keras-Theano
- Scareware
- fastai
- Keras-TensorFlow
- Keras-Theano
- SMSmalware
- fastai
- Keras-TensorFlow
- Keras-Theano
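A minimal sketch of the optimizer/learning-rate setup referenced in the notes above, assuming the Keras Sequential API (the input dimension of 9 matches our reduced feature set, but the layer widths are otherwise illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Small fully connected net over the nine selected flow features
model = Sequential([
    Dense(32, activation='relu', input_shape=(9,)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid'),   # binary output: malicious vs. benign
])

# The settings from the notes: SGD with lr=0.001 and mean_squared_error loss.
# Swapping in other optimizers (Adam, Adadelta) or learning rates (0.1, 0.01)
# is a one-line change here, which is how such a comparison can be run.
model.compile(optimizer=SGD(learning_rate=0.001),
              loss='mean_squared_error',
              metrics=['accuracy'])
```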
- Adware
Place acquired graphs and data here (or point us to the file with the data)
While detecting Android malware with deep learning and other machine learning techniques, employing both static and dynamic analysis of the malware, seems to be a solved academic problem (Z. Yuan et al., X. Su et al., Abdelmonim Naway and Yuancheng LI, Karbab et al.), there is little published work that applies machine learning techniques to network traffic specifically to detect Android malware. The CICMalAnal2017 dataset is one of the only datasets containing real, up-to-date network traffic from malicious and benign Android applications. The goal of this project is to employ deep learning techniques, in conjunction with the CICMalAnal2017 dataset, to accurately identify the intent of a given application from its collected network traffic data.
The dataset used for this project is described by the Canadian Institute for Cybersecurity at the University of New Brunswick on their dataset page; the link at the bottom of that description can be used to download the dataset. Alternatively, the provided dl-data.sh script may be used (however, the download link it uses needs occasional updating; the script works as of May 2020).
Since this is a significant dataset (roughly 300 MB zipped), the download takes a while. Go enjoy a coffee while you wait.
As described in Lashkari et al., only nine of the 80+ provided attributes are used to achieve high accuracy with simpler machine learning algorithms. For computational and temporal simplicity, only these nine attributes are kept for the analysis conducted here. Below, the nine attributes from the paper are matched to the corresponding attribute names in the dataset:
- Maximum flow packet length (Flow IAT Max)
- Minimum flow packet length (Flow IAT Min)
- Backward variance data bytes (Bwd Packet Length Std)*
- Flow FIN F 17 (FIN Flag Count)
- Flow forward bytes (Fwd IAT Total)
- Flow backward bytes (Bwd IAT Total)
- Maximum Idle (Idle Max)
- Initial window forward (Init_Win_bytes_forward)
- Minimum segment size forward (min_seg_size_forward)
* (The variance attribute could not be found in the dataset, so the standard deviation is used instead since it is closely related.)
Since the analysis is focused on determining the type of traffic (malicious/benign) given a sample, attributes such as IP addresses and port numbers are dropped from the dataset. These have obvious uses in approaches such as black/whitelists; however, that is not the contribution of this project. NaN values are also dropped if present. A minimal sketch of this preprocessing is shown below.
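A minimal sketch of this feature selection and cleanup with pandas (the CSV path and the 'Label' column name are assumptions):

```python
import pandas as pd

# The nine attributes kept for the analysis, as named in the dataset
FEATURES = [
    'Flow IAT Max', 'Flow IAT Min', 'Bwd Packet Length Std',
    'FIN Flag Count', 'Fwd IAT Total', 'Bwd IAT Total',
    'Idle Max', 'Init_Win_bytes_forward', 'min_seg_size_forward',
]

df = pd.read_csv('all_flows.csv')   # hypothetical combined CSV
df = df[FEATURES + ['Label']]       # drop IPs, ports, and everything else
df = df.dropna()                    # drop NaN values if present
```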
The composition of the dataset is shown in the table below:
Type | Number of Instances |
---|---|
Benign | 1,210,210 |
Malware | 982,212 |
Adware | 424,147 |
Broken down further, we have a clearer idea of the makeup.
Type | Number of Instances |
---|---|
Benign | 1,210,210 |
Adware | 424,147 |
Scareware | 401,165 |
Ransomware | 348,943 |
SMSmalware | 229,275 |
Additionally, the table below lists each malware type together with the species within it and their instance counts.
Malware Type | Species | Number of Instances |
---|---|---|
ADWARE | DOWGIN | 39,682 |
ADWARE | EWIND | 43,374 |
ADWARE | FEIWO | 56,632 |
ADWARE | GOOLIGAN | 93,772 |
ADWARE | KEMOGE | 38,771 |
ADWARE | KOODOUS | 32,547 |
ADWARE | MOBIDASH | 31,034 |
ADWARE | SELFMITE | 13,029 |
ADWARE | SHUANET | 39,271 |
ADWARE | YOUMI | 36,035 |
RANSOMWARE | CHARGER | 39,551 |
RANSOMWARE | JISUT | 25,672 |
RANSOMWARE | KOLER | 44,555 |
RANSOMWARE | LOCKERPIN | 25,307 |
RANSOMWARE | PLETOR | 4,715 |
RANSOMWARE | PORNDROID | 46,082 |
RANSOMWARE | RANSOMBO | 39,859 |
RANSOMWARE | SIMPLOCKER | 36,340 |
RANSOMWARE | SVPENG | 54,161 |
RANSOMWARE | WANNALOCKER | 32,701 |
SCAREWARE | ANDROIDDEFENDER | 56,440 |
SCAREWARE | ANDROIDSPY | 25,414 |
SCAREWARE | AVFORANDROID | 42,448 |
SCAREWARE | AVPASS | 40,776 |
SCAREWARE | FAKEAPP | 34,676 |
SCAREWARE | FAKEAPPAL | 44,563 |
SCAREWARE | FAKEAV | 40,089 |
SCAREWARE | FAKEJOBOFFER | 30,683 |
SCAREWARE | FAKETAOBAO | 33,299 |
SCAREWARE | PENETHO | 21,631 |
SCAREWARE | VIRUSSHIELD | 23,716 |
SCAREWARE | (Unlabeled) | 7,430 |
SMSMALWARE | BEANBOT | 12,371 |
SMSMALWARE | BIIGE | 33,678 |
SMSMALWARE | FAKEINST | 15,026 |
SMSMALWARE | FAKENOTIFY | 22,197 |
SMSMALWARE | FAKEMART | 6,401 |
SMSMALWARE | JIFAKE | 5,993 |
SMSMALWARE | MAZARBOT | 6,065 |
SMSMALWARE | NANDROBOX | 44,517 |
SMSMALWARE | PLANKTON | 39,765 |
SMSMALWARE | SMSSNIFFER | 33,618 |
SMSMALWARE | ZSONE | 9,644 |
MALWARE | Unlabeled | 2,828 |
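These counts can be reproduced from the combined CSV with a simple group-by; the 'Label' and 'Family' column names below are assumptions about how the type and species are stored.

```python
import pandas as pd

df = pd.read_csv('all_flows.csv')          # hypothetical combined CSV

# Instances per traffic type (Benign, Adware, Scareware, ...)
print(df['Label'].value_counts())

# Instances per species within each malware type
print(df.groupby(['Label', 'Family']).size())
```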
- Compare optimizer and learning rate performance
- Binary Classification Problem: Is a sample malicious or benign traffic?
- fastai
- Keras-TensorFlow
- Keras-Theano
- Multi-Classification Problem #1: Can we differentiate between benign, adware, scareware, etc... traffic? (a minimal multi-class model sketch follows this list)
- fastai
- Keras-TensorFlow
- Keras-Theano
- Multi-Classification Problem #2: Can we differentiate between the different species of each type of malicious traffic? (Ex. Gooligan vs ... vs Shuanet for Adware)
- Adware
- fastai
- Keras-TensorFlow
- Keras-Theano
- Ransomware
- fastai
- Keras-TensorFlow
- Keras-Theano
- Scareware
- fastai
- Keras-TensorFlow
- Keras-Theano
- SMSmalware
- fastai
- Keras-TensorFlow
- Keras-Theano
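The classification problems above differ mainly in the output layer and loss function. A minimal Keras sketch of the multi-class variant (the five classes correspond to benign plus the four malware types; the layer widths are otherwise illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Multi-class head: one softmax unit per class (benign, adware, scareware,
# ransomware, SMSmalware), trained with categorical cross-entropy. The binary
# problem instead uses a single sigmoid unit with binary_crossentropy.
model = Sequential([
    Dense(32, activation='relu', input_shape=(9,)),
    Dense(16, activation='relu'),
    Dense(5, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer-encoded labels assumed
              metrics=['accuracy'])
```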
- Adware
- Performance results using various deep learning frameworks are compared:
  - fastai (https://www.fast.ai/), which uses PyTorch (https://pytorch.org/) as the backend
  - Keras (https://keras.io/), which uses TensorFlow (https://www.tensorflow.org/) and Theano (https://github.com/Theano/Theano) as backends
- Classification of adware types:
Framework | Accuracy (%) |
---|---|
Fastai-Pytorch | 42.72 |
Keras-Tensorflow | * |
Keras-Theano | * |
Framework | Accuracy (%) |
---|---|
Fastai-Pytorch | * |
Keras-Tensorflow | * |
Keras-Theano | * |
Framework | Accuracy (%) |
---|---|
Fastai-Pytorch | * |
Keras-Tensorflow | * |
Keras-Theano | * |
Framework | Accuracy (%) |
---|---|
Fastai-Pytorch | * |
Keras-Tensorflow | * |
Keras-Theano | * |
- Arash Habibi Lashkari, Andi Fitriah A. Kadir, Laya Taheri, and Ali A. Ghorbani, "Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification", in Proceedings of the 52nd IEEE International Carnahan Conference on Security Technology (ICCST), Montreal, Quebec, Canada, 2018.
- Z. Yuan, Y. Lu and Y. Xue, "Droiddetector: android malware characterization and detection using deep learning," in Tsinghua Science and Technology, vol. 21, no. 1, pp. 114-123, Feb. 2016, doi: 10.1109/TST.2016.7399288.
- X. Su, D. Zhang, W. Li and K. Zhao, "A Deep Learning Approach to Android Malware Feature Learning and Detection," 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, 2016, pp. 244-251, doi: 10.1109/TrustCom.2016.0070.
- Abdelmonim Naway and Yuancheng LI, "A Review on The Use of Deep Learning in Android Malware Detection", 2018
- Karbab, Elmouatez & Debbabi, Mourad & Derhab, Abdelouahid & Mouheb, Djedjiga. (2017). Android Malware Detection using Deep Learning on API Method Sequences.