More on history and prior work.
stevana committed Sep 24, 2024
1 parent 75495e2 commit ec5430c
Showing 3 changed files with 138 additions and 36 deletions.
README-unprocessed.md: 82 changes (66 additions and 16 deletions)
@@ -16,23 +16,42 @@ language-specific instrumentation of the software under test.

Fuzzing has an interesting origin. It was a class
[project](http://pages.cs.wisc.edu/~bart/fuzz/CS736-Projects-f1988.pdf) in an
advanced OS course taught by Barton Miller at the University of Wisconsin in
1988.
advanced operating systems course taught by Barton Miller at the University of
Wisconsin in 1988.

The project was inspired by the observation that back then, if you logged into
your workstation via a dial-up modem from home and it rained, then frequently
random characters would appear in the terminal. The line noise wasn't the
surprising thing, but rather that the extra characters would sometimes crash
the program that they tried to invoke.

Among these programs were basic utilities such as vi, mail, cc, make, sed, awk,
sort, etc., and it was reasonable to expect that these would give an error
message rather than crash and core dump if fed with some extra characters
caused by the rain.

So the project set out to recreate what the rain did, but more effectively, by
generating random noise (a stream of bytes), feeding it to different utilities
and seeing whether they crashed.

A couple of years later, Miller et al. published [*An empirical study of the
reliability of UNIX utilities*](https://dl.acm.org/doi/10.1145/96267.96279)
(1990).

Miller's fuzzer simply generated random bytes, fed them to command-line tools
and checked whether they crashed.
Inserting random characters was effective in finding corner cases where the
programmers forgot to properly validate the input from the user.
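
Miller's fuzzer ran real UNIX utilities as separate processes. The sketch below keeps only the core loop and makes two simplifying assumptions: a pure Haskell function (the hypothetical `utility`) stands in for the program under test, and a thrown exception stands in for a crash and core dump. `splitmix` is a hand-rolled SplitMix64 step so that the block needs nothing beyond `base`; none of these names come from the original study.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Bits (shiftR, xor)
import Data.Char (chr)
import Data.Word (Word64, Word8)

-- SplitMix64 step: returns a pseudo-random word and the next state.
-- (Dependency-free stand-in for a real random number generator.)
splitmix :: Word64 -> (Word64, Word64)
splitmix s = (z3, s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- Hypothetical "utility" under test: sums the average word length of each
-- line. It forgets that a line may contain no words at all, so an empty or
-- blank line triggers a division by zero.
utility :: String -> Int
utility = sum . map avgWordLength . lines
  where
    avgWordLength line = sum (map length ws) `div` length ws
      where ws = words line

-- A random input of 0..100 bytes, each byte drawn from the full 0..255 range.
randomInput :: Word64 -> (String, Word64)
randomInput s0 = (map (chr . fromIntegral) bytes, s2)
  where
    (r, s1) = splitmix s0
    (bytes, s2) = draw (fromIntegral (r `mod` 101)) s1

    draw :: Int -> Word64 -> ([Word8], Word64)
    draw 0 s = ([], s)
    draw n s =
      let (x, s')   = splitmix s
          (xs, s'') = draw (n - 1) s'
      in (fromIntegral (x `shiftR` 56) : xs, s'')

-- Feed up to `trials` random inputs to the utility and return the first one
-- that makes it throw (our analogue of a crash), if any.
fuzz :: Int -> Word64 -> IO (Maybe String)
fuzz trials = go trials
  where
    go 0 _ = pure Nothing
    go k s = do
      let (input, s') = randomInput s
      result <- try (evaluate (utility input)) :: IO (Either SomeException Int)
      case result of
        Left _  -> pure (Just input)
        Right _ -> go (k - 1) s'
```

Loaded into GHCi, `fuzz 10000 42` should return `Just` some input containing an empty or blank line: about one random input in 256 already starts with a newline, so this shallow bug is exactly the kind that random bytes find quickly.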

* AFL (2013), https://lcamtuf.coredump.cx/afl/historical_notes.txt

* [libfuzzer](https://llvm.org/docs/LibFuzzer.html) and its successor
[FuzzTest](https://github.com/google/fuzztest) ("It is a first-of-its-kind
tool that bridges the gap between fuzzing and property-based testing")
However, it wouldn't trigger bugs hiding deeper under the surface.

* [honggfuzz](https://github.com/google/honggfuzz)
This changed around 2007 when people [started
thinking](https://lcamtuf.coredump.cx/afl/historical_notes.txt) about how
fuzzing can be combined with [evolutionary
algorithms](https://en.wikipedia.org/wiki/Evolutionary_algorithm).

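The evolutionary idea can be illustrated with a toy model. This is not how AFL is implemented: the hypothetical `target` reports which of its nested branches ran (a stand-in for AFL's instrumentation), `mutate` appends or overwrites one random byte, and `fuzzCG` keeps a mutant in its corpus only if it reaches a branch that no earlier input reached. The `"FUZZ"` bug, all names and the `splitmix` generator are assumptions made for the example.

```haskell
import Data.Bits (shiftR, xor)
import Data.Char (chr)
import Data.Word (Word64)

-- SplitMix64 step (dependency-free stand-in for a real RNG).
splitmix :: Word64 -> (Word64, Word64)
splitmix s = (z3, s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- A pseudo-random Int in [0, n), assuming n >= 1.
randomBelow :: Int -> Word64 -> (Int, Word64)
randomBelow n s = (fromIntegral (x `mod` fromIntegral n), s')
  where (x, s') = splitmix s

-- Hypothetical instrumented program: the bug sits behind four nested checks,
-- one per character of "FUZZ". Branch i counts as covered when the first i
-- characters match, mimicking branch coverage.
target :: String -> ([Int], Bool)
target input = ([0 .. matched], matched == 4)
  where
    matched = length (takeWhile id (zipWith (==) input "FUZZ"))

-- Mutation: append a random byte, or overwrite one at a random position.
mutate :: String -> Word64 -> (String, Word64)
mutate xs s0
  | null xs || even r = (xs ++ [c], s3)
  | otherwise         = (take p xs ++ [c] ++ drop (p + 1) xs, s3)
  where
    (r, s1) = splitmix s0
    (b, s2) = splitmix s1
    c       = chr (fromIntegral (b `shiftR` 56))
    (p, s3) = randomBelow (max 1 (length xs)) s2

-- Evolutionary loop: pick a corpus entry, mutate it, run it, and keep the
-- mutant only if it covers a new branch. Returns the crashing input and the
-- number of executions it took, or Nothing if the budget runs out.
fuzzCG :: Int -> Word64 -> Maybe (String, Int)
fuzzCG budget seed = go 0 seed [""] (fst (target ""))
  where
    go n s corpus seen
      | n >= budget = Nothing
      | crashed     = Just (input, n + 1)
      | null new    = go (n + 1) s2 corpus seen
      | otherwise   = go (n + 1) s2 (corpus ++ [input]) (seen ++ new)
      where
        (i, s1)        = randomBelow (length corpus) s
        (input, s2)    = mutate (corpus !! i) s1
        (cov, crashed) = target input
        new            = filter (`notElem` seen) cov
```

A blind fuzzer has to guess all four bytes at once, about one chance in 2^32 per input. Here each saved input is one correct byte further along, so `fuzzCG 100000 7` should reach the bug after a few thousand executions in expectation.
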
the idea
being that

* AFL (2013),

Coverage-guided fuzzers, such as [American Fuzzy
Lop](https://lcamtuf.coredump.cx/afl/) (AFL), have been very successful in
@@ -41,22 +60,53 @@ bytes as input. That means any kind of programs that takes user strings,
command line arguments or files as inputs, parsers, but also arrays of ints,
etc.


* https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html

* AFL is the tool that Dan Luu explicitly mentions, so let's stop here and go
back to his point before looking at what else has happened since.

* PBT

* Go-fuzz?
* I've written about the
[history](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html#the-history-of-property-based-testing)
of property-based testing and explained how it
[works](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html#pure-property-based-testing-recap)
already, so I won't take up space by repeating myself here.

* Hypothesis
- Has a notion of coverage: https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.event
- But coverage-guided testing was [removed](https://github.com/HypothesisWorks/hypothesis/pull/1564/commits/dcbea9148be3446392bc3af8892d49f3cc74fbe3)

* [Crowbar](https://github.com/stedolan/crowbar)
* The idea of combining coverage-guidance and PBT

* Now let's have a look at what has happened since Dan wrote his post.

* First off, at some point he added an update to his post where he explicitly mentions:

+ Go-fuzz?

+ Hypothesis
- Has a notion of coverage: https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.event
- But coverage-guided testing was [removed](https://github.com/HypothesisWorks/hypothesis/pull/1564/commits/dcbea9148be3446392bc3af8892d49f3cc74fbe3)

* When you search for coverage-guided property-based testing

* [FuzzChick](https://dl.acm.org/doi/10.1145/3360607)? Not released, lives in
an [unmaintained
branch](https://github.com/QuickChick/QuickChick/compare/master...FuzzChick)
that [doesn't compile](https://github.com/QuickChick/QuickChick/issues/277)?
- coverage info is the [same as in AFL](https://youtu.be/RR6c_fiMfJQ?t=2226)

* [libfuzzer](https://llvm.org/docs/LibFuzzer.html) and its successor
[FuzzTest](https://github.com/google/fuzztest) ("It is a first-of-its-kind
tool that bridges the gap between fuzzing and property-based testing") (2022?)

* [honggfuzz](https://github.com/google/honggfuzz)

* [Crowbar](https://github.com/stedolan/crowbar)

* [MUTAGEN: Reliable Coverage-Guided, Property-Based Testing using Exhaustive
Mutations](https://www.mista.me/assets/pdf/icst23-preprint.pdf) (2023)


* Shae "shapr" Erisson's post [*Run property tests until coverage stops
increasing*](https://shapr.github.io/posts/2023-07-30-goldilocks-property-tests.html)
(2023) and [trynocular](https://github.com/shapr/trynocular) library.
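
Erisson's post is the reference for this idea; what follows is only a minimal sketch of the stopping rule its title describes and does not reflect trynocular's API. It assumes a hypothetical instrumented function `bucket` that returns the id of the branch it took, draws inputs uniformly from [-1000, 1000], and stops once `patience` consecutive passing tests add no new branch.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- SplitMix64 step (dependency-free stand-in for a real RNG).
splitmix :: Word64 -> (Word64, Word64)
splitmix s = (z3, s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- Hypothetical instrumented function under test: returns a result together
-- with the id of the branch it took.
bucket :: Int -> (String, Int)
bucket x
  | x < 0     = ("negative", 0)
  | x == 0    = ("zero", 1)
  | x < 100   = ("small", 2)
  | otherwise = ("large", 3)

-- The property under test: every input gets a non-empty label.
prop :: Int -> Bool
prop x = not (null (fst (bucket x)))

-- Run random tests until `patience` consecutive tests add no new branch.
-- Returns whether all tests passed, how many ran, and the branches seen.
runUntilPlateau :: Int -> Word64 -> (Bool, Int, [Int])
runUntilPlateau patience = go 0 0 []
  where
    go :: Int -> Int -> [Int] -> Word64 -> (Bool, Int, [Int])
    go runs stale seen s
      | stale >= patience  = (True, runs, seen)
      | not (prop x)       = (False, runs + 1, seen)
      | branch `elem` seen = go (runs + 1) (stale + 1) seen s'
      | otherwise          = go (runs + 1) 0 (seen ++ [branch]) s'
      where
        (w, s') = splitmix s
        x       = fromIntegral (w `mod` 2001) - 1000  -- x in [-1000, 1000]
        branch  = snd (bucket x)
```

The rule trades completeness for a cheap stopping criterion: the `zero` branch is hit with probability 1/2001 per test, so a run with a small `patience` will often plateau before covering it. A plateau only means coverage stopped growing, not that it is complete.
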
@@ -205,6 +255,7 @@ The full source code is available
use `frequency`?

* Type-generic mutation?
* sometimes_each?

## See also

@@ -221,7 +272,6 @@ The full source code is available
* [AFL "whitepaper"](https://lcamtuf.coredump.cx/afl/technical_details.txt)
* [AFL mutation
heuristics](https://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html)
* https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html



README.md: 85 changes (67 additions and 18 deletions)
@@ -17,24 +17,41 @@ language-specific instrumentation of the software under test.

Fuzzing has an interesting origin. It was a class
[project](http://pages.cs.wisc.edu/~bart/fuzz/CS736-Projects-f1988.pdf)
in an advanced OS course taught by Barton Miller at the University of
Wisconsin in 1988.
in an advanced operating systems course taught by Barton Miller at the
University of Wisconsin in 1988.

The project was inspired by the observation that back then, if you
logged into your workstation via a dial-up modem from home and it
rained, then frequently random characters would appear in the terminal.
The line noise wasn't the surprising thing, but rather that the extra
characters would sometimes crash the program that they tried to invoke.

Among these programs were basic utilities such as vi, mail, cc, make,
sed, awk, sort, etc., and it was reasonable to expect that these would
give an error message rather than crash and core dump if fed with some
extra characters caused by the rain.

So the project set out to recreate what the rain did, but more
effectively, by generating random noise (a stream of bytes), feeding it
to different utilities and seeing whether they crashed.

A couple of years later, Miller et al. published [*An empirical study of
the reliability of UNIX
utilities*](https://dl.acm.org/doi/10.1145/96267.96279) (1990).

Miller's fuzzer simply generated random bytes, fed them to command-line
tools and checked whether they crashed.
Inserting random characters was effective in finding corner cases where
the programmers forgot to properly validate the input from the user.

- AFL (2013), <https://lcamtuf.coredump.cx/afl/historical_notes.txt>
However, it wouldn't trigger bugs hiding deeper under the surface.

- [libfuzzer](https://llvm.org/docs/LibFuzzer.html) and its successor
[FuzzTest](https://github.com/google/fuzztest) ("It is a
first-of-its-kind tool that bridges the gap between fuzzing and
property-based testing")
This changed around 2007 when people [started
thinking](https://lcamtuf.coredump.cx/afl/historical_notes.txt) about
how fuzzing can be combined with [evolutionary
algorithms](https://en.wikipedia.org/wiki/Evolutionary_algorithm).

- [honggfuzz](https://github.com/google/honggfuzz)
the idea being that

- AFL (2013),

Coverage-guided fuzzers, such as [American Fuzzy
Lop](https://lcamtuf.coredump.cx/afl/) (AFL), have been very successful
@@ -43,18 +60,36 @@ that take bytes as input. That means any kind of programs that takes
user strings, command line arguments or files as inputs, parsers, but
also arrays of ints, etc.

- <https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html>

- AFL is the tool that Dan Luu explicitly mentions, so let's stop here
and go back to his point before looking at what else has happened since.

- PBT

- Go-fuzz?
- I've written about the
[history](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html#the-history-of-property-based-testing)
of property-based testing and explained how it
[works](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html#pure-property-based-testing-recap)
already, so I won't take up space by repeating myself here.

- Hypothesis
- The idea of combining coverage-guidance and PBT

- Has a notion of coverage:
<https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.event>
- But coverage-guided testing was
[removed](https://github.com/HypothesisWorks/hypothesis/pull/1564/commits/dcbea9148be3446392bc3af8892d49f3cc74fbe3)
- Now let's have a look at what has happened since Dan wrote his post.

- [Crowbar](https://github.com/stedolan/crowbar)
- First off, at some point he added an update to his post where he
explicitly mentions:

- Go-fuzz?

- Hypothesis

- Has a notion of coverage:
<https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.event>
- But coverage-guided testing was
[removed](https://github.com/HypothesisWorks/hypothesis/pull/1564/commits/dcbea9148be3446392bc3af8892d49f3cc74fbe3)

- When you search for coverage-guided property-based testing

- [FuzzChick](https://dl.acm.org/doi/10.1145/3360607)? Not released,
lives in an [unmaintained
@@ -65,6 +100,19 @@ also arrays of ints, etc.
- coverage info is the [same as in
AFL](https://youtu.be/RR6c_fiMfJQ?t=2226)

- [libfuzzer](https://llvm.org/docs/LibFuzzer.html) and its successor
[FuzzTest](https://github.com/google/fuzztest) ("It is a
first-of-its-kind tool that bridges the gap between fuzzing and
property-based testing") (2022?)

- [honggfuzz](https://github.com/google/honggfuzz)

- [Crowbar](https://github.com/stedolan/crowbar)

- [MUTAGEN: Reliable Coverage-Guided, Property-Based Testing using
Exhaustive
Mutations](https://www.mista.me/assets/pdf/icst23-preprint.pdf) (2023)

- Shae "shapr" Erisson's post [*Run property tests until coverage stops
increasing*](https://shapr.github.io/posts/2023-07-30-goldilocks-property-tests.html) (2023)
and [trynocular](https://github.com/shapr/trynocular) library.
@@ -316,6 +364,8 @@ The full source code is available

- Type-generic mutation?

- sometimes_each?

## See also

- <https://aflplus.plus/docs/power_schedules/>
@@ -332,7 +382,6 @@ The full source code is available
"whitepaper"](https://lcamtuf.coredump.cx/afl/technical_details.txt)
- [AFL mutation
heuristics](https://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html)
- <https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html>

[^1]: Here's Dan's example in full:

src/QuickCheckV1.hs: 7 changes (5 additions and 2 deletions)
@@ -60,8 +60,9 @@ infix 1 `classify`
--------------------------------------------------------------------
-- Generator

newtype Gen a
  = Gen (Int -> StdGen -> a)
-- start snippet Gen
newtype Gen a = Gen (Int -> StdGen -> a)
-- end snippet

sized :: (Int -> Gen a) -> Gen a
sized fgen = Gen (\n r -> let Gen m = fgen n in m n r)
@@ -80,10 +81,12 @@ variant v (Gen m) = Gen (\n r -> m n (rands r !! (v+1)))
  where
    rands r0 = r1 : rands r2 where (r1, r2) = split r0

-- start snippet Gen
generate :: Int -> StdGen -> Gen a -> a
generate n rnd (Gen m) = m size rnd'
  where
    (size, rnd') = randomR (0, n) rnd
-- end snippet

instance Functor Gen where
  fmap f m = m >>= return . f
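
The `Gen` and `generate` snippets above fit together as follows. To keep the sketch runnable without the `random` package, it swaps `StdGen` and `randomR` for a hand-rolled `Seed` and `rangeR` (assumptions, not the code in `src/QuickCheckV1.hs`). `sized` and `generate` follow the definitions shown in the diff, while `choose` and `smallNat` are simplified illustrations.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- Stand-in for StdGen: a SplitMix64 state (assumption, not System.Random).
newtype Seed = Seed Word64

next :: Seed -> (Word64, Seed)
next (Seed s) = (z3, Seed s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- Analogue of randomR for Int: a value in the inclusive range (lo, hi),
-- assuming lo <= hi.
rangeR :: (Int, Int) -> Seed -> (Int, Seed)
rangeR (lo, hi) g = (lo + fromIntegral (x `mod` span'), g')
  where
    (x, g') = next g
    span'   = fromIntegral (hi - lo + 1)

-- Same shape as in the diff: a generator is a function of a size bound
-- and a seed.
newtype Gen a = Gen (Int -> Seed -> a)

sized :: (Int -> Gen a) -> Gen a
sized fgen = Gen (\n r -> let Gen m = fgen n in m n r)

-- Simplified choose: ignores the size and draws from the given range.
choose :: (Int, Int) -> Gen Int
choose bounds = Gen (\_ r -> fst (rangeR bounds r))

-- As in the diff: the actual size is itself drawn at random from (0, n).
generate :: Int -> Seed -> Gen a -> a
generate n rnd (Gen m) = m size rnd'
  where
    (size, rnd') = rangeR (0, n) rnd

-- Example generator: an Int that never exceeds the current size.
smallNat :: Gen Int
smallNat = sized (\n -> choose (0, n))
```

One detail worth noticing is that `generate n` does not use `n` as the size directly. It first draws a size between 0 and `n`, so values from `generate 10 seed smallNat` fall in [0, 10] but are skewed towards small numbers.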

