
Commit

Add first draft of conclusion.
stevana committed Oct 3, 2024
1 parent 1876280 commit 9d44314
Showing 5 changed files with 113 additions and 82 deletions.
68 changes: 34 additions & 34 deletions README-unprocessed.md
@@ -683,48 +683,48 @@ The full source code is available

## Conclusion and further work

XXX:

* Exponential -> polynomial

* Makes more sense for stateful systems than pure functions? Or at least
properties that expect a sequence of inputs?

* Don't rerun all commands for every newly generated command
+ only reset the system when shrinking

* Problem of strategy (pick something as basis for progress): coverage, logs,
value of memory, helps bootstrap the process. Generalise to support more?
We've seen how to add coverage-guidance to the first version of the first
property-based testing tool, QuickCheck, in about 35 lines of code.

* Local maxima?
Coverage-guidance effectively reduced an exponential problem to a polynomial
one, by building on previous test runs' successes in increasing the coverage.

* Problem of tactics: picking a good input distribution for the testing problem
at hand. Make previous input influence the next input? Dependent events, e.g.
if one packet gets lost, there's a higher chance that the next packet will be
lost as well.
The solution does change the QuickCheck API slightly by requiring a property on
a list of `a`, rather than merely `a`, so it's not suitable for all properties.

* Save `(Coverage, Mutation, Frequency, Coverage)` stats?
I think this limitation isn't so important, because going further I'd like to
apply coverage-guidance to testing stateful systems. When testing stateful
systems, which I've written about
[here](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html),
one always generates a list of commands anyway, so the limitation doesn't matter.

* More realistic example, e.g.: leader election, transaction rollback,
failover?
* Annoying to sprinkle sometimes assertions everywhere?
- Can it be combined with logging or tracing?
A more serious limitation with the current approach is that it's too greedy and
will seek to maximise coverage, without ever backtracking. This means that it
can easily get stuck in local maxima. Consider the example:

* Use size parameter to implement AFL heuristic for choosing integers? Or just
use `frequency`?
```
if input[0] == 'b'
  if input[1] == 'a'
    if input[2] == 'd'
      skip
if input[0] == 'w'
  if input[1] == 'o'
    if input[2] == 'r'
      if input[3] == 's'
        if input[4] == 'e'
          error
```

* Type-generic mutation?
* sometimes_each?
* https://en.wikipedia.org/wiki/L%C3%A9vy_flight (optimises search)
If we generate an input that starts with 'b' (rather than 'w'), then we'll get
stuck and never find the error.
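
To make the failure mode concrete, here is a minimal Haskell sketch. It is not the code from the post: `coverage`, `reachesError` and `greedy` are hypothetical names, coverage is simplified to a count of branches passed, and random generation is replaced by trying 'a' to 'z' in order so that the run is deterministic.

```haskell
import Data.List (isPrefixOf)

-- | Length of the longest common prefix of two strings.
commonPrefix :: String -> String -> Int
commonPrefix xs ys = length (takeWhile id (zipWith (==) xs ys))

-- | Stand-in for coverage (an assumption of this sketch): how many nested
-- branches of the example the input gets through, on whichever path it is on.
coverage :: String -> Int
coverage input = max (commonPrefix "bad" input) (commonPrefix "worse" input)

-- | The error is reached only when all five checks on the 'w' path pass.
reachesError :: String -> Bool
reachesError input = "worse" `isPrefixOf` input

-- | Greedy search without backtracking: extend the input by the first
-- candidate character that increases coverage, and stop when none does.
-- Already chosen characters are never revisited.
greedy :: String -> String
greedy input =
  case [ next
       | c <- ['a' .. 'z']
       , let next = input ++ [c]
       , coverage next > coverage input
       ] of
    next : _ -> greedy next
    []       -> input
```

Under these assumptions `greedy ""` commits to 'b' before it ever tries 'w' and stops at `"bad"`, whereas `greedy "w"` walks all the way to `"worse"`. A search that could backtrack, or that kept several candidate inputs around, would not be trapped by the first choice.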

## See also
Real coverage-guided tools, like AFL, will not get stuck like that. While I
have a variant of the code that can cope with this, I chose to present the
above greedy version because it's simpler.

* https://carstein.github.io/fuzzing/2020/04/18/writing-simple-fuzzer-1.html
* https://carstein.github.io/fuzzing/2020/04/25/writing-simple-fuzzer-2.html
* https://carstein.github.io/fuzzing/2020/05/02/writing-simple-fuzzer-3.html
* https://carstein.github.io/fuzzing/2020/05/21/writing-simple-fuzzer-4.html
* [How Antithesis finds bugs (with help from the Super Mario
Bros)](https://antithesis.com/blog/sdtalk/)
I might write another post with a more AFL-like solution at some later point,
but I'd also like to encourage others to port these ideas to their favorite
language and experiment!


[^1]: This example is due to Dmitry Vyukov, the main author of
91 changes: 44 additions & 47 deletions README.md
@@ -896,54 +896,51 @@ The full source code is available

## Conclusion and further work

XXX:

- Exponential -\> polynomial

- Makes more sense for stateful systems than pure functions? Or at least
properties that expect a sequence of inputs?

- Don't rerun all commands for every newly generated command

- only reset the system when shrinking

- Problem of strategy (pick something as basis for progress): coverage,
logs, value of memory, helps bootstrap the process. Generalise to
support more?

- Local maxima?

- Problem of tactics: picking a good input distribution for the testing
problem at hand. Make previous input influence the next input?
Dependent events, e.g. if one packet gets lost, there's a higher
chance that the next packet will be lost as well.

- Save `(Coverage, Mutation, Frequency, Coverage)` stats?

- More realistic example, e.g.: leader election, transaction rollback,
failover?

- Annoying to sprinkle sometimes assertions everywhere?

- Can it be combined with logging or tracing?

- Use size parameter to implement AFL heuristic for choosing integers?
Or just use `frequency`?

- Type-generic mutation?

- sometimes_each?

- <https://en.wikipedia.org/wiki/L%C3%A9vy_flight> (optimises search)

## See also
We've seen how to add coverage-guidance to the first version of the
first property-based testing tool, QuickCheck, in about 35 lines of
code.

- <https://carstein.github.io/fuzzing/2020/04/18/writing-simple-fuzzer-1.html>
- <https://carstein.github.io/fuzzing/2020/04/25/writing-simple-fuzzer-2.html>
- <https://carstein.github.io/fuzzing/2020/05/02/writing-simple-fuzzer-3.html>
- <https://carstein.github.io/fuzzing/2020/05/21/writing-simple-fuzzer-4.html>
- [How Antithesis finds bugs (with help from the Super Mario
Bros)](https://antithesis.com/blog/sdtalk/)
Coverage-guidance effectively reduced an exponential problem to a
polynomial one, by building on previous test runs' successes in
increasing the coverage.

The solution does change the QuickCheck API slightly by requiring a
property on a list of `a`, rather than merely `a`, so it's not suitable
for all properties.

I think this limitation isn't so important, because going further I'd
like to apply coverage-guidance to testing stateful systems. When
testing stateful systems, which I've written about
[here](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html),
one always generates a list of commands anyway, so the limitation
doesn't matter.

A more serious limitation with the current approach is that it's too
greedy and will seek to maximise coverage, without ever backtracking.
This means that it can easily get stuck in local maxima. Consider the
example:

    if input[0] == 'b'
      if input[1] == 'a'
        if input[2] == 'd'
          skip
    if input[0] == 'w'
      if input[1] == 'o'
        if input[2] == 'r'
          if input[3] == 's'
            if input[4] == 'e'
              error

If we generate an input that starts with 'b' (rather than 'w'), then
we'll get stuck and never find the error.

Real coverage-guided tools, like AFL, will not get stuck like that.
While I have a variant of the code that can cope with this, I chose to
present the above greedy version because it's simpler.

I might write another post with a more AFL-like solution at some later
point, but I'd also like to encourage others to port these ideas to their
favorite language and experiment!

[^1]: This example is due to Dmitry Vyukov, the main author of
[go-fuzz](https://github.com/dvyukov/go-fuzz), but it's basically an
9 changes: 9 additions & 0 deletions SEE_ALSO.md
@@ -0,0 +1,9 @@
# See also

* https://carstein.github.io/fuzzing/2020/04/18/writing-simple-fuzzer-1.html
* https://carstein.github.io/fuzzing/2020/04/25/writing-simple-fuzzer-2.html
* https://carstein.github.io/fuzzing/2020/05/02/writing-simple-fuzzer-3.html
* https://carstein.github.io/fuzzing/2020/05/21/writing-simple-fuzzer-4.html

* [How Antithesis finds bugs (with help from the Super Mario
Bros)](https://antithesis.com/blog/sdtalk/)
26 changes: 26 additions & 0 deletions TODO.md
@@ -0,0 +1,26 @@
# Todo

* Don't rerun all commands for every newly generated command
+ only reset the system when shrinking

* Problem of strategy (pick something as basis for progress): coverage, logs,
value of shared memory, helps bootstrap the process. Generalise to support more?

* Problem of tactics: picking a good input distribution for the testing problem
at hand. Make previous input influence the next input? Dependent events, e.g.
if one packet gets lost, there's a higher chance that the next packet will be
lost as well.

* More realistic example, e.g.: leader election, transaction rollback,
failover?
* Annoying to sprinkle sometimes assertions everywhere?
- Can it be combined with logging or tracing?

* Use size parameter to implement AFL heuristic for choosing integers? Or just
use `frequency`?

* Type-generic mutation?
* sometimes_each?
* https://en.wikipedia.org/wiki/L%C3%A9vy_flight (optimises search)


1 change: 0 additions & 1 deletion src/Mutator.hs
@@ -14,7 +14,6 @@ type Mutate a = StdGen -> a -> a

mutateChar :: Mutate Char
mutateChar prng ch = generate 0 prng genChar

-- [CI annotation, GitHub Actions / GHC 9.10.1 on ubuntu-22.04] Warning on line 16 in src/Mutator.hs: Defined but not used: ‘ch’
-- if ch == 'A' then 'Z' else pred ch

mutateInt16 :: Mutate Int16
mutateInt16 prng _i = fst (random prng) -- XXX: this doesn't actually mutate...
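
One way to address the XXX above is to flip a single bit, so that the result always differs from the input while staying close to it, similar in spirit to the bit flips AFL performs. This is a minimal sketch, not the repository's code: `flipBit` is a hypothetical name, and it takes the bit index as a plain `Int` where a `Mutate Int16` would presumably draw that index from the `StdGen`.

```haskell
import Data.Bits (complementBit, finiteBitSize)
import Data.Int (Int16)

-- | Flip exactly one bit of the input. The index is reduced modulo the bit
-- width (16 for Int16), so any Int, including a negative one, is accepted.
-- Hypothetical helper: the index is assumed to come from a PRNG elsewhere.
flipBit :: Int -> Int16 -> Int16
flipBit n i = complementBit i (n `mod` finiteBitSize i)
```

Because exactly one bit changes, `flipBit n i` is never equal to `i`, which is the property the current `mutateInt16` lacks: it ignores its input, so it can return the same value it was given.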
