---
tags: technical, details, python, alternatives
date: 2017-07-16 10:00
title: Moving Beyond Types
author: drmaciver
---

If you look at the original property-based testing library, the Haskell version of QuickCheck,
tests are very closely tied to types: The way you typically specify a property is by inferring
the data that needs to be generated from the types the test function expects for its arguments.

This is a bad idea.

<!--more-->

The first part of why is something
[I've talked about already](../integrated-shrinking/):
you don't want to tie the shrinking of the data to its type, because doing so makes testing
significantly more brittle.

But you could solve that and still have data generation be tied to type, and it would still
be a bad idea.

The reason for this is very simple: Often you want to generate something much more specific
than any value of a given type.

If you look at the [Hypothesis strategies module](https://hypothesis.readthedocs.io/en/latest/data.html)
and what you can generate, many of these look like types. But if you look closer, nearly
every single one of them has options to configure them to be more specific.
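
For example, here are a couple of strategies dialled down to something more specific than
their underlying types (the particular parameters shown are just a small sample of what's
available):

```python
from hypothesis import strategies as st

# Not just "an integer", but integers in a specific range.
small_ints = st.integers(min_value=0, max_value=100)

# Not just "some text", but short lowercase identifiers.
identifiers = st.text(alphabet="abcdefghijklmnopqrstuvwxyz", min_size=1, max_size=10)
```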

Consider something like the following:

```python
from statistics import mean

from hypothesis import given, strategies as st


@given(st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1))
def test_mean_is_in_bounds(ls):
    assert min(ls) <= mean(ls) <= max(ls)
```

In this test we've restricted the domain we're testing on so we can focus on a property
that doesn't hold in general: the mean of an empty list is an
error, so that needs a special-case test of its own, and calculations involving `NaN` or infinity
are special cases that don't always have a well-defined result, and even when they do they
don't necessarily satisfy this property.

This happens a lot: Frequently there are properties that only hold in some restricted
domain, and so you want more specific tests for that domain to complement your other
tests for the larger range of data.

When this happens you need tools to generate something more specific, and those requirements
don't map naturally to types.

Consider writing this code based on types instead:

```python
from statistics import mean

from hypothesis import given


@given(ls=...)
def test_mean_is_in_bounds(ls: list[float]):
    assert min(ls) <= mean(ls) <= max(ls)
```

(this code doesn't work at the time of this writing, but it will soon -
[the pull request to implement it](https://github.com/HypothesisWorks/hypothesis-python/pull/643)
is in fairly late-stage review).

But this doesn't do the right thing: We've dropped the conditions from the
previous test that our floats are all finite and our lists are all non-empty. So now
we have to add a precondition to make the test valid:

```python
import math
from statistics import mean

from hypothesis import assume, given


@given(ls=...)
def test_mean_is_in_bounds(ls: list[float]):
    assume(len(ls) > 0)
    assume(all(math.isfinite(x) for x in ls))
    assert min(ls) <= mean(ls) <= max(ls)
```

But this is now substantially longer and less readable than the original approach!

In Haskell, traditionally we would fix this with a newtype declaration that wraps the type:
we could find or define a newtype `NonEmptyList` and a newtype `FiniteFloat` and then say that
we actually wanted a `NonEmptyList FiniteFloat` there.

In Python we could probably do more or less the same thing, either by creating new wrapper
types or, if we wanted to save a few lines, by subclassing `list` and `float` (which you
shouldn't do: subclassing builtins in Python leads to really weird behaviour), but it's much
more noisy.
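
For illustration, here's roughly what that wrapper-type workaround might look like with
Hypothesis, using `st.register_type_strategy` to teach Hypothesis about the new type (the
`FiniteFloat` wrapper is made up for this sketch):

```python
from statistics import mean

from hypothesis import given, strategies as st


# A made-up wrapper type standing in for a Haskell newtype. (It subclasses
# float purely to keep the sketch short - exactly the kind of thing the
# paragraph above warns against.)
class FiniteFloat(float):
    pass


# Teach Hypothesis how to generate the wrapper type...
st.register_type_strategy(
    FiniteFloat,
    st.floats(allow_nan=False, allow_infinity=False).map(FiniteFloat),
)


# ...so that type-driven inference can pick it up.
@given(ls=st.lists(st.from_type(FiniteFloat), min_size=1))
def test_mean_is_in_bounds(ls):
    assert min(ls) <= mean(ls) <= max(ls)
```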

But why should we bother? Especially if we're only using these in one test, we're not actually
interested in these types at all, and it just adds a whole bunch of syntactic noise when you
could just pass the data generators directly. Defining new types for the data you want to generate
is purely a workaround for a limitation of the API.

If you were working in a dependently typed language where you could already naturally express
this in the type system it *might* be OK (I don't have any direct experience of working in
type systems that strong), but I'm sceptical of being able to make it work well - you're unlikely
to be able to automatically derive data generators in the general case, because the needs of
data generation "go in the opposite direction" from types (a type is effectively a predicate which
consumes a value, whereas a data generator is a function that produces a value, so in order to produce
a generator for a type automatically you basically need to invert the predicate). I suspect most
approaches here will leave you with a bunch of sharp edges, but I would be interested to see
experiments in this direction.

That being said, I don't think it's that useful to do. Where stronger types are naturally
present in the program, taking advantage of them to get better testing is certainly worth
doing, but creating a specific type to represent the exact range of valid inputs to a
specific test isn't, because for tests the impact of a run-time error that you could have caught at
compile time is substantially reduced: all it causes is a spurious test failure. Where your
tests are long-running or non-deterministic, as in property-based testing, this *does* matter,
but it still doesn't cause a problem in production.

But this is creating hard problems where none need exist, and besides, it's not the situation
with most of the current generation of languages and the property-based testing libraries for them.
Here, using explicit data generators is a clear improvement, and just as well-typed
(in a statically typed language each data generator has a return type, after all).

You *can* use data generators directly in Haskell QuickCheck too, with an explicit
[forAll](https://hackage.haskell.org/package/QuickCheck-2.10.0.1/docs/Test-QuickCheck-Property.html#v:forAll),
but it's almost as awkward as the newtype approach, particularly if you want more than one
data generator (it's even more awkward if you want shrinking - you have to use `forAllShrink` and
explicitly pass a shrink function).

This is more or less intrinsic to the type-based approach. If you want tinkering with the
data generation to be non-awkward, starting from data generators needs to become the default.

And experience suggests that when you make customising the data generation easy, people do
it. It's nice to be able to be more specific in your testing, but if you make it too much
effort people either don't do it, or do it using less effective methods like adding
preconditions to tests (`assume` in Hypothesis, or `==>` in QuickCheck), either of which reduces
the quality of your testing and causes more bugs to slip through as a result.

Fortunately, it already *is* the default in most of the newer implementations of
property-based testing. The only holdouts are ones that directly copied Haskell QuickCheck.

Originally this was making a virtue of a necessity - most of the implementations
which started off the trend of data-generator-first tests are either for dynamic languages
(e.g. Erlang, Clojure, Python) or languages with very weak type systems (e.g. C), where
type-first is more or less impossible, but it's proven to be a much more usable design.

And it's perfectly compatible with static typing too. [Hedgehog](https://hackage.haskell.org/package/hedgehog)
is a relatively recent property-based testing library for Haskell that takes this approach,
and it works just as well in Haskell as it does in any other language.

It's also perfectly compatible with being able to derive a data generator from a type
for the cases where you really want to. We saw a hint at that with the upcoming
Hypothesis implementation above. You could easily do the same by having something
like the following in Haskell (mimicking QuickCheck's `Arbitrary` type class):

```haskell
class Arbitrary a where
  arbitrary :: Gen a
```

You can then simply use `arbitrary` like you would any other data generator. As far as I know
Hedgehog doesn't do this anywhere (but you can use QuickCheck's Arbitrary with
the hedgehog-quickcheck package), but in principle there's nothing stopping it.
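
Modern Hypothesis has a direct counterpart in `st.from_type`, which turns a type into a
strategy much as `arbitrary` turns a type into a `Gen`. A minimal sketch, assuming a recent
Hypothesis version:

```python
from hypothesis import given, strategies as st


# from_type(T) plays the role of `arbitrary :: Gen T`: it gives you a
# strategy for the type, usable like any other data generator.
@given(st.from_type(list[float]))
def test_reversing_twice_is_identity(ls):
    assert list(reversed(list(reversed(ls)))) == ls
```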

Having this also makes it much easier to define new data generators. I'm unlikely to use the
support for `@given` much, but I'm much more excited that it will also
work with `builds`, which will allow for a fairly seamless transition between
inferring the default strategy for a type and writing a custom generator. You
will, for example, be able to do `builds(MyType)` and have every constructor
argument automatically filled in (if it's suitably annotated), but you can
also do e.g. `builds(MyType, some_field=some_generator)` to override a particular
default while leaving the others alone.
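
A rough sketch of how that reads, with a made-up `Project` dataclass standing in for
`MyType` (the field names and strategies here are purely illustrative):

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


# A made-up example type; any class with annotated constructor
# arguments works the same way.
@dataclass
class Project:
    name: str
    stars: int


# Every annotated argument gets a strategy inferred from its type...
any_project = st.builds(Project)

# ...or you can override a particular field and leave the rest alone.
popular_project = st.builds(Project, stars=st.integers(min_value=1000))


@given(popular_project)
def test_popular_projects_have_many_stars(project):
    assert project.stars >= 1000
```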

(This API is somewhere where the dynamic nature of Python helps a fair bit, but you
could almost certainly do something equivalent in Haskell with a bit more noise
or a bit more Template Haskell.)

So this approach doesn't have to be data-generator-only, even if it's data-generator-first,
but if you're going to pick one, the flexibility of the data-generator-based test specification
is hard to beat, regardless of how good your type system is.