---
tags: non-technical, philosophy
date: 2016-05-14 09:00
title: What is Property Based Testing?
author: drmaciver
---

I get asked this a lot, and I write property based testing tools for a living, so you'd think
I'd have a good answer to it, but historically I haven't. This is my attempt to fix that.

Historically the definition of property based testing has been "The thing that
[QuickCheck](../quickcheck-in-every-language/) does". As
a working definition this has served pretty well, but the problem is that it leaves us unable
to distinguish the essential features of property-based testing from the
[accidental features that appeared in the implementations that we're used to](../referential-transparency/).

As the author of a property based testing system which diverges quite a lot from QuickCheck,
this troubles me more than it would most people, so I thought I'd set out some of my thoughts on
what property based testing is and isn't.

This isn't intended to be definitive, and it will evolve over time as my thoughts do, but
it should provide a useful grounding for further discussion.

<!--more-->

There are essentially two ways we can draw a boundary for something like this: We can go
narrow or we can go wide. i.e. we can restrict our definitions to things that look exactly
like QuickCheck, or things that are in the same general family of behaviour. My inclination
is always to go wide, but I'm going to try to rein that in for the purpose of this piece.

But I'm still going to start by planting a flag. The following are *not* essential features
of property based testing:

1. [Referential transparency](../referential-transparency/)
2. Types
3. Randomization
4. The use of any particular tool or library

As evidence I present the following:

1. Almost every property based testing library, including but not limited to Hypothesis and
   QuickCheck (both Erlang and Haskell).
2. The many successful property based testing systems for dynamic languages. e.g. Erlang
   QuickCheck, test.check, Hypothesis.
3. [SmallCheck](https://hackage.haskell.org/package/smallcheck). I have mixed feelings about
   its effectiveness, but it's unambiguously property-based testing.
4. It's very easy to hand-roll your own testing protocols for property-based testing of a
   particular result. For example, I've [previously done this for testing a code formatter](
   http://www.drmaciver.com/2015/03/27-bugs-in-24-hours/): Run it over a corpus (more on
   whether running over a corpus "really" counts in a second) of Python files, check whether
   the resulting formatted code satisfies PEP8. It's classic property-based testing with an
   oracle.
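
The hand-rolled style in point 4 needs nothing more than a generated corpus and a property
to check against it. Here's a minimal sketch of that pattern - the run-length encoder is a
toy stand-in for illustration, not the actual formatter test from the linked post:

```python
import random

def rle_encode(s):
    """Toy system under test: run-length encode a string into (char, count) pairs."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append((s[i], j - i))
        i = j
    return out

def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(ch * n for ch, n in pairs)

# Hand-rolled property-based test: run over a generated corpus and check
# the round-trip property decode(encode(x)) == x for every input.
random.seed(0)
corpus = ["".join(random.choice("ab") for _ in range(random.randrange(20)))
          for _ in range(1000)]
for s in corpus:
    assert rle_decode(rle_encode(s)) == s
```

No library is involved: the "protocol" is just a loop, a source of inputs, and an assertion.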

So that provides us with a useful starting point of things that are definitely property based
testing. But you're never going to find a good definition by looking at only positive examples,
so let's look at some cases where it's more arguable.

First off, let's revisit that parenthetical question: Does just testing against a large corpus count?

I'm going to go with "probably". I think if we're counting SmallCheck we need to count testing
against a large corpus: If you take the first 20k outputs that SmallCheck would generate
and just replay the test using the first N of those each time, you're doing exactly the
same sort of testing. Similarly if you draw 20k outputs using Hypothesis and then just randomly
sample from them each time.

I think drawing from a small, fixed corpus probably *doesn't* count. If you could feasibly
write a property based test as 10 example based tests inline in your source code, it's
probably really just example based testing. This boundary is a bit, um, fuzzy though.

On which note, what about fuzzing?

I have previously argued that fuzzing is just a form of property-based testing - you're testing
the property "it doesn't crash". I think I've reversed my opinion on this. In particular, I think
[the style of testing I advocate for getting started with Hypothesis](../getting-started-with-hypothesis/), *probably* doesn't
count as property based testing.

I'm unsure about this boundary. The main reason I'm drawing it here is that they do feel like
they have a different character - property based testing requires you to reason about how your
program should behave, while fuzzing can just be applied to arbitrary programs with minimal
understanding of their behaviour - and also that fuzzing feels somehow more fundamental.

But you can certainly do property based testing using fuzzing tools, in the same way that you
can do it with hand-rolled property based testing systems - I could have taken my Python formatting
test above, added [python-afl](http://jwilk.net/software/python-afl) to the mix, and
that would still be property based testing.

Conversely, you can do fuzzing with property-based testing tools: If fuzzing is not property
based testing then not all tests using Hypothesis, QuickCheck, etc. are property based tests.
I'm actually OK with that. There's a long tradition of testing tools being used outside their
original domain - e.g. most test frameworks originally designed as unit testing tools end up
being used for the whole gamut of testing.

So with that in mind, let's provide a definition of fuzzing that I'd like to use:

> Fuzzing is feeding a piece of code (function, program, etc.) data from a large corpus, possibly
> dynamically generated, possibly dependent on the results of execution on previous data, in
> order to see whether it fails.

The definitions of "data" and "whether it fails" will vary from fuzzer to fuzzer - some fuzzers
will generate only binary data, some fuzzers will generate more structured data. Some fuzzers
will look for a process crash, some might just look for a function returning false.

(Often definitions of fuzzing focus on "malformed" data. I think this is misguided and fails
to cover a lot of tools that people would obviously consider fuzzers. e.g. [CSmith](https://embed.cs.utah.edu/csmith/)
is certainly a type of fuzzer but deliberately generates only well-formed C programs.)

And given that definition, I think I can now provide a definition of property-based testing:

> Property based testing is the construction of tests such that, when these tests are fuzzed,
> failures in the test reveal problems with the system under test that could not have been
> revealed by direct fuzzing of that system.
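
To make the "could not have been revealed by direct fuzzing" clause concrete, here's a toy
sketch (the buggy search and the trial counts are invented for illustration): the function
below never crashes, so direct fuzzing reveals nothing, but wrapping the same generated
inputs in a test with an oracle turns that fuzzing into property-based testing that finds
the bug.

```python
import random

def buggy_find(xs, target):
    """System under test: returns the index of target in xs, with an
    off-by-one bug that silently misses the last element."""
    for i in range(len(xs) - 1):  # bug: should be range(len(xs))
        if xs[i] == target:
            return i
    return -1

random.seed(0)
crashes, wrong_answers = 0, 0
for _ in range(1000):
    xs = [random.randrange(10) for _ in range(random.randrange(1, 10))]
    target = random.choice(xs)
    # Direct fuzzing: just check it doesn't crash. It never does.
    try:
        result = buggy_find(xs, target)
    except Exception:
        crashes += 1
        continue
    # Property-based test: the same inputs, now checked against an
    # oracle (list.index). This reveals failures that crashing cannot.
    if result != xs.index(target):
        wrong_answers += 1

assert crashes == 0       # direct fuzzing finds nothing
assert wrong_answers > 0  # the property-based test finds the bug
```

The extra failure mode here - "disagrees with the oracle" - is exactly one of the properties
in the sense used above.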

(If you feel strongly that fuzzing *should* count as property-based testing you can just drop
the 'that could not have been etc.' part. I'm on the fence about it myself.)

These extra modes of failure then constitute the properties that we're testing.

I think this works pretty well at capturing what we're doing with property based testing. It's
not perfect, but I'm pretty happy with it. One of the things I particularly like is that it makes
it clear that property-based testing is what *you* do, not what the computer does. The part
that the computer does is "just fuzzing".

Under this point of view, a property-based testing library is really two parts:

1. A fuzzer.
2. A library of tools for making it easy to construct property-based tests using that fuzzer.

Hypothesis is very explicitly designed along these lines - the core of it is a structured
fuzzing library called Conjecture - which suggests that I may have a bit of bias here, but
I still feel that it captures the behaviour of most other property based testing systems
quite well, and provides quite a nice middle ground between the wider definition that I
wanted and the more tightly focused definition that QuickCheck orients people around.