Edit abstract and motivation.
stevana committed Oct 8, 2024
1 parent 9d44314 commit fb84af1
Showing 3 changed files with 63 additions and 70 deletions.
README-unprocessed.md: 59 changes (27 additions and 32 deletions)
@@ -7,18 +7,13 @@ Almost ten years ago, back in 2015, Dan Luu wrote a
 testing wasn't a thing.
 
 In this post I'll survey the coverage-guided landscape, looking at what was
-there before Dan's post and what has happened since.
+there before Dan's post and what has happened since. I'll also show how to add
+basic coverage-guidance to the first version of the
+original property-based testing tool, QuickCheck, in about 35 lines of code.
 
-The short version is: today imperative languages seem to be in the forefront of
-combining coverage-guidance and property-based testing.
-
-In an effort to try to help functional programming languages catch up, I'll
-show how coverage-guidence can be added to the first version of the original
-property-based testing tool, QuickCheck, in about 35 lines of code.
-
-The technique is programming language agnostic and doesn't rely on any
-language-specific instrumentation of the software under test (unlike previous
-implementations of this idea).
+Unlike many previous implementations of this idea, the technique used to
+implement coverage-guidance is programming language agnostic and doesn't rely
+on any language-specific instrumentation of the software under test.
 
 ## Motivation
 
@@ -42,23 +37,26 @@ func sut(input []byte) {
 }
 ```
 
-If we were to try to test this function with property-based testing, where we
-restrict the input to be of exactly length 4, then it would still take
-$\mathcal{O}(2^8 \cdot 2^8 \cdot 2^8 \cdot 2^8) = \mathcal{O}((2^8)^4) =
-\mathcal{O}(2^{32}) \approx 4B$ tries to trigger the bug! A more realistic test
-wouldn't fix the length of the input, which would make the probability of
-triggering the bug even lower.
+What are the odds that a property-based testing tool (without
+coverage-guidance) would be able to find the error?
+
+To make the calculation easier, let's say that we always generate arrays of
+length $4$. A byte consists of eight bits, so it has $2^8$ possible values.
+That means that the probability of a random array triggering the bug is
+$\frac{1}{2^8} \cdot \frac{1}{2^8} \cdot \frac{1}{2^8} \cdot \frac{1}{2^8} =
+\left(\frac{1}{2^8}\right)^4 = \frac{1}{2^{32}}$, which is approximately $1$ in
+$4$ billion. In a realistic test suite, we wouldn't restrict the length of the
+array to be $4$, and hence the probability would be even lower.
 
 With coverage-guidance we keep track of inputs that resulted in increased
-coverage. So, for example, if we generate the array `[]byte{'A'}` we get
+coverage. So, for example, if we generate the array `[]byte{'b'}`, we get
 further into the nested ifs, and so we take note of that and start generating
-longer arrays that start with 'A' and see if we get even further, etc.
-
-By building on previous succeses in getting more coverage, we can effectively
-reduce the problem to only need $\mathcal{O}(2^8 + 2^8 + 2^8 + 2^8) =
-\mathcal{O}(2^8 \cdot 4) = \mathcal{O}(2^{10}) = 1024$ tries.
+longer arrays that start with `'b'` and see if we get even further, etc. By
+building on previous successes in getting more coverage, each byte can be
+found separately with probability $\frac{1}{2^8}$ per try, so we need roughly
+$2^8 + 2^8 + 2^8 + 2^8 = 4 \cdot 2^8 = 2^{10} = 1024$ tries instead of $2^{32}$.
 
-In other words coverage-guidence turns an exponential problem into a polynomial
+In other words, coverage-guidance turns an exponential problem into a polynomial
 problem!
 
 ## Background and prior work
@@ -703,19 +701,16 @@ will seek to maximise coverage, without ever backtracking. This means that it
 can easily get stuck in local maxima. Consider the example:
 
 ```
+if input[0] == 'o'
+  if input[1] == 'k'
+    return
 if input[0] == 'b'
   if input[1] == 'a'
     if input[2] == 'd'
-      skip
-if input[0] == 'w'
-  if input[1] == 'o'
-    if input[2] == 'r'
-      if input[3] == 's'
-        if input[4] == 'e'
-          error
+      error
 ```
 
-If we generate an input that starts with 'b' (rather than 'w'), then we'll get
+If we generate an input that starts with 'o' (rather than 'b'), then we'll get
 stuck never finding the error.
 
 Real coverage-guided tools, like AFL, will not get stuck like that. While I
README.md: 70 changes (32 additions and 38 deletions)
@@ -8,19 +8,14 @@ Almost ten years ago, back in 2015, Dan Luu wrote a
 property-based testing wasn't a thing.
 
 In this post I'll survey the coverage-guided landscape, looking at what
-was there before Dan's post and what has happened since.
+was there before Dan's post and what has happened since. I'll also show
+how to add basic coverage-guidance to the first version of the original
+property-based testing tool, QuickCheck, in about 35 lines of code.
 
-The short version is: today imperative languages seem to be in the
-forefront of combining coverage-guidance and property-based testing.
-
-In an effort to try to help functional programming languages catch up,
-I'll show how coverage-guidence can be added to the first version of the
-original property-based testing tool, QuickCheck, in about 35 lines of
-code.
-
-The technique is programming language agnostic and doesn't rely on any
-language-specific instrumentation of the software under test (unlike
-previous implementations of this idea).
+Unlike many previous implementations of this idea, the technique used to
+implement coverage-guidance is programming language agnostic and doesn't
+rely on any language-specific instrumentation of the software under
+test.
 
 ## Motivation
 
@@ -42,26 +37,28 @@ input byte array starts with the bytes `"bad!"`:
 }
 }
 
-If we were to try to test this function with property-based testing,
-where we restrict the input to be of exactly length 4, then it would
-still take
-$\mathcal{O}(2^8 \cdot 2^8 \cdot 2^8 \cdot 2^8) = \mathcal{O}((2^8)^4) =
-\mathcal{O}(2^{32}) \approx 4B$ tries to trigger the bug! A more
-realistic test wouldn't fix the length of the input, which would make
-the probability of triggering the bug even lower.
+What are the odds that a property-based testing tool (without
+coverage-guidance) would be able to find the error?
+
+To make the calculation easier, let's say that we always generate arrays
+of length $4$. A byte consists of eight bits, so it has $2^8$ possible
+values. That means that the probability of a random array triggering the
+bug is $\frac{1}{2^8} \cdot \frac{1}{2^8} \cdot
+\frac{1}{2^8} \cdot \frac{1}{2^8} = \left(\frac{1}{2^8}\right)^4 = \frac{1}{2^{32}}$,
+which is approximately $1$ in $4$ billion. In a realistic test suite, we
+wouldn't restrict the length of the array to be $4$, and hence the
+probability would be even lower.
 
 With coverage-guidance we keep track of inputs that resulted in
 increased coverage. So, for example, if we generate the array
-`[]byte{'A'}` we get further into the nested ifs, and so we take note of
-that and start generating longer arrays that start with 'A' and see if
-we get even further, etc.
-
-By building on previous succeses in getting more coverage, we can
-effectively reduce the problem to only need
-$\mathcal{O}(2^8 + 2^8 + 2^8 + 2^8) =
-\mathcal{O}(2^8 \cdot 4) = \mathcal{O}(2^{10}) = 1024$ tries.
-
-In other words coverage-guidence turns an exponential problem into a
+`[]byte{'b'}`, we get further into the nested ifs, and so we take note of
+that and start generating longer arrays that start with `'b'` and see if
+we get even further, etc. By building on previous successes in getting
+more coverage, each byte can be found separately with probability
+$\frac{1}{2^8}$ per try, so we need roughly
+$2^8 + 2^8 + 2^8 + 2^8 = 4 \cdot 2^8 = 2^{10} = 1024$ tries instead of $2^{32}$.
+
+In other words, coverage-guidance turns an exponential problem into a
 polynomial problem!
 
 ## Background and prior work
@@ -920,18 +917,15 @@ greedy and will seek to maximise coverage, without ever backtracking.
 This means that it can easily get stuck in local maxima. Consider the
 example:
 
+    if input[0] == 'o'
+      if input[1] == 'k'
+        return
     if input[0] == 'b'
       if input[1] == 'a'
         if input[2] == 'd'
-          skip
-    if input[0] == 'w'
-      if input[1] == 'o'
-        if input[2] == 'r'
-          if input[3] == 's'
-            if input[4] == 'e'
-              error
-
-If we generate an input that starts with 'b' (rather than 'w'), then
+          error
+
+If we generate an input that starts with 'o' (rather than 'b'), then
 we'll get stuck never finding the error.
 
 Real coverage-guided tools, like AFL, will not get stuck like that.
SEE_ALSO.md: 4 changes (4 additions and 0 deletions)
@@ -7,3 +7,7 @@
 
 * [How Antithesis finds bugs (with help from the Super Mario
 Bros)](https://antithesis.com/blog/sdtalk/)
+
+* [QuickFuzz testing for fun and
+profit](https://ri.conicet.gov.ar/bitstream/handle/11336/50343/CONICET_Digital_Nro.8f82685b-598a-4e24-aaa9-7330786054a5_A.pdf)
+(2017)
