tormalwarefp

Description: This repository contains code and datasets for the ACM CCS 2022 paper:

Title: Exposing the Rat in the Tunnel: Using Traffic Analysis for Tor-based Malware Detection

Authors: Priyanka Dodia, Mashael AlSabah, Omar Alrawi, Tao Wang

Our proposed solution is a machine-learning-based prototype designed to identify stealthy Tor-based malware C&C connections using traffic analysis on encrypted Tor traffic. The models further infer the type of malware from the Tor traffic by fingerprinting malicious behavior at the connection and host levels.

Note: The conference presentation slides are available as a PDF (FinalCCS_Slides2022.pdf).

Files & Control Flow:

Main file: classify_topk.py

USAGE (Python 3.7): python classify_topk.py --options [options_file] --topk [topk] --train/--zeroday

[options_file]: Options file defining the parameter inputs for classification

[topk]: Number of most active Tor connections to use (k=1 or k=3)

--train: Train models for binary/multi-label classification

--zeroday: Test trained models on the provided zeroday data
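The usage line above can be pictured with a small `argparse` sketch. This is not the actual parser in classify_topk.py; the function name `build_parser`, the required flags, and the mutual exclusion of `--train` and `--zeroday` are assumptions made for illustration only.

```python
import argparse


def build_parser():
    """Hypothetical re-creation of the documented command line (an assumption,
    not the repository's real argument handling)."""
    parser = argparse.ArgumentParser(prog="classify_topk.py")
    parser.add_argument("--options", required=True,
                        help="options file, e.g. options-D5")
    # The README documents k=1 and k=3 only, so the sketch rejects other values.
    parser.add_argument("--topk", type=int, choices=[1, 3], required=True,
                        help="number of most active Tor connections to use")
    # Assumed: exactly one of the two modes is selected per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true",
                      help="train models")
    mode.add_argument("--zeroday", action="store_true",
                      help="test trained models on zeroday data")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(
    ["--options", "options-D5", "--topk", "3", "--train"]
)
print(args.options, args.topk, args.train, args.zeroday)
```

Restricting `--topk` through `choices` is one way to surface an unsupported value early; the real script may validate it differently or not at all.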

Datasets provided:

- train_D5 (provided as train_D5.zip): Data used for training/validation/testing ML models
- zerodaytest.zip: Zero day data for testing the trained models on unseen malware Tor traffic

Note: The data consists of cell files, each representing a connection from a PCAP (i.e., Tor traffic obtained from malware/benign binary executions in the Falcon Sandbox). Connection-level features use Tor cell direction, time, and order information. Host-level features use information from all Tor connections in a PCAP and are appended to the end of each cell file.
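As a rough illustration of connection-level features, the sketch below parses a cell trace and derives a few direction, time, and order statistics. The line layout (`<timestamp> <direction>`, with `+1` for outgoing and `-1` for incoming cells) is an assumption, as are the function names and the specific features; the repository's real format and feature set live in cellparser.py and features_topk.py and may differ. The appended host-level features are not modeled here.

```python
from typing import Dict, List, Tuple

# Assumed cell representation: (timestamp in seconds, direction),
# where direction is +1 for outgoing and -1 for incoming.
Cell = Tuple[float, int]


def parse_cells(text: str) -> List[Cell]:
    """Parse lines shaped like '<timestamp> <direction>' (assumed layout)."""
    cells = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines or lines with a different shape
        cells.append((float(parts[0]), int(parts[1])))
    return cells


def connection_features(cells: List[Cell]) -> Dict[str, object]:
    """Compute a few illustrative direction/time/order features."""
    outgoing_positions = [i for i, (_, d) in enumerate(cells) if d > 0]
    return {
        "total": len(cells),
        "outgoing": len(outgoing_positions),
        "incoming": sum(1 for _, d in cells if d < 0),
        # Time information: span between the first and last cell.
        "duration": cells[-1][0] - cells[0][0] if cells else 0.0,
        # Order information: where the first outgoing cells sit in the trace.
        "first_outgoing_positions": outgoing_positions[:3],
    }


sample = "0.00 1\n0.12 -1\n0.30 -1\n0.45 1\n"
print(connection_features(parse_cells(sample)))
```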

Option files provided:

- options-D5
- options-D5_host
- options-zeroday_binary
- options-zeroday_multilabel

1. Binary Classification: Classify Tor-based malware and benign connections

Scenarios:

Note(!): The 'MULTICLASS' option must be set to 0 in the options file.

Train models with CONNECTION-LEVEL features only [Hayes et al. 2016], derived from the top-3 most active Tor connections:
```
python classify_topk.py --options options-D5 --topk 3 --train
```
Train models with CONNECTION+HOST-LEVEL features [Dodia et al. 2022], using the top-3 most active Tor connections for the connection-level features:
```
python classify_topk.py --options options-D5_host --topk 3 --train
```

2. Multi-label Classification: Infer malware class type

Note(!): The 'MULTICLASS' option must be set to 1 in the options file.

Same commands as used in binary classification.
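To picture what the 'MULTICLASS' switch changes, here is a minimal sketch of the two labeling schemes. It is an assumption about the general idea rather than the repository's loaders (loaders_binary.py and loaders_multilabel.py); the function `encode_labels` and the placeholder class names `family_a` and `family_b` are hypothetical.

```python
from typing import List


def encode_labels(labels: List[str], multiclass: bool,
                  benign_label: str = "benign") -> List[int]:
    """Map string labels to integers (hypothetical helper).

    multiclass=False: benign -> 0, every malware class -> 1.
    multiclass=True:  each distinct class gets its own index.
    """
    if not multiclass:
        return [0 if label == benign_label else 1 for label in labels]
    # Sort class names so the mapping is deterministic across runs.
    index = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [index[label] for label in labels]


# Placeholder labels, not taken from the dataset.
labels = ["benign", "family_a", "family_b", "family_a"]
print(encode_labels(labels, multiclass=False))
print(encode_labels(labels, multiclass=True))
```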

3. Zeroday Scenario: Test models using traffic from new, unseen binaries (EternalRocks malware)

Identify zeroday malware connections using pre-trained binary classifier model

```
python classify_topk.py --options options-zeroday_binary --topk 3 --zeroday
```

Identify the type of malware (class labels) using pre-trained multi-label classifier models
```
python classify_topk.py --options options-zeroday_multilabel --topk 3 --zeroday
```

Note:

- All experiments can be run with topk=1 or topk=3; the best results are achieved when the top-3 most active Tor connections are used for training and testing.
- Host features can be enabled or disabled by setting HOSTFTS to True/False, or by uncommenting/commenting it out, in the options file.
- Models trained with HOSTFTS must also be tested with the HOSTFTS option enabled (i.e., in the zeroday options files).
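The notes above can be sketched in a few lines. The example assumes that "activity" means the number of cells in a connection and that options are available as a plain dictionary; both are assumptions, and `top_k_connections` and `check_host_features` are hypothetical helpers, not functions from this repository.

```python
from typing import Dict, List


def top_k_connections(cell_counts: Dict[str, int], k: int) -> List[str]:
    """Return the k connection names with the most cells (assumed activity measure)."""
    ranked = sorted(cell_counts, key=lambda name: cell_counts[name], reverse=True)
    return ranked[:k]


def check_host_features(train_options: dict, test_options: dict) -> None:
    """Raise if a model trained with HOSTFTS would be tested without it.

    A missing key is treated like a commented-out option, i.e. disabled.
    """
    trained_with_host = bool(train_options.get("HOSTFTS", False))
    tested_with_host = bool(test_options.get("HOSTFTS", False))
    if trained_with_host and not tested_with_host:
        raise ValueError("model was trained with HOSTFTS; enable it for testing too")


# Hypothetical per-connection cell counts for one PCAP.
counts = {"conn_a": 120, "conn_b": 8, "conn_c": 45, "conn_d": 3}
print(top_k_connections(counts, 3))
check_host_features({"HOSTFTS": True}, {"HOSTFTS": True})  # passes silently
```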

