-
Notifications
You must be signed in to change notification settings - Fork 0
/
introduction.tex
42 lines (38 loc) · 3.55 KB
/
introduction.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
\section{Introduction}
\label{section:introduction}
Modern softwares are as complex as ever. As they rapidly grow in quantity, flaws in such systems are becoming more prevalent.
Since a single defect can compromise the entire system, detecting these flaws is an important task.
While most of these system flaws are just minor bugs, some can be controlled by malicious attackers and permit unauthorized actions.
Thus, it is important to find such critical flaws, which are called \textit{vulnerabilities}.
In linux kernel, hundreds of such vulnerabilities are found every year \cite{cvelinux}.
Due to the increasing number of such vulnerabilities followed by the increase of system flaws, we need an automated way of detecting software vulnerabilities.
Various methods of detecting vulnerabilities have been introduced such as static and dynamic analysis, and other methods such as metric-based detection.
Static analysis includes symbolic execution which provides exact attack vector with high accuracy.
However, execution time of symbolic execution dramatically increases when the target software is complex, making it impractical to be applied to modern softwares.
Dynamic analysis such as fuzzing can find vulnerabilities in reasonable time.
Nevertheless, some softwares are difficult to be analysed dynamically, and dynamic analysis cannot be fully automated since they require manual specifications.
In order to resolve such limitations of existing methods, I propose vulnerability detection model combining comprehensive representation and machine learning.
I use code property graph, a graph data structure presented to model vulnerability in software code, as input representation to my learning model.
This intermediate representation shows control flow or data dependency more explicitly than raw text of source code.
I also introduce various machine learning models to process such representation.
I present the following contributions in this work:
\begin{itemize}
\item
I suggest how the software code should be transformed to train vulnerability detection model.
I use code property graph as such representation, and show that code property graph provides necessary information to detect vulnerabilities in software with high granularity and accuracy.
\item
I investigated various neural network models to process the graph data.
I implemented the model which extends convolutional neural network to process arbitrary graphs.
\item
I generated dataset labeled with the type of vulnerability from open source repositories.
Compared to existing datasets used in classical software defect prediction, my dataset is labeled in function-level,
thus enabling learning model to detect vulnerability with higher granularity.
\end{itemize}
The rest of the paper is organized as follows:
Section~\ref{section:background} describes background knowledge to understand vulnerability detection and neural network.
Section~\ref{section:problem} describes my problem domain and requirements to my objectives.
Section~\ref{section:previous} introduces previous works on vulnerability detection and their limitations I focus on.
Section~\ref{section:approach} suggests my approach on designing new detection model including code property graph and neural network for graph data.
Section~\ref{section:dataset} introduces my methods to generate an appropriate software code dataset for my learning model.
Section~\ref{section:implementation} describes detailed implementation of my model followed by related works in Section~\ref{section:related}.
Finally, I conclude my work with some future works in Section~\ref{section:conclusion}.