---
tags: technical, details, python, alternatives
date: 2016-12-05 10:00
title: Integrated vs type based shrinking
author: drmaciver
---

One of the big differences between Hypothesis and Haskell QuickCheck is
how shrinking is handled.

Specifically, the way shrinking is handled in Haskell QuickCheck is bad
and the way it works in Hypothesis (and also in test.check and EQC) is
good. If you're implementing a property based testing system, you should
use the good way. If you're using a property based testing system and it
doesn't use the good way, you need to know about this failure mode.

Unfortunately many (and possibly most) implementations of property based
testing are based on Haskell's QuickCheck and so make the same mistake.

<!--more-->

The big difference is whether shrinking is integrated into generation.

In Haskell's QuickCheck, shrinking is defined based on *types*: any
value of a given type shrinks the same way, regardless of how it was
generated. In Hypothesis, test.check, etc., shrinking is instead part
of generation, and the generator controls how the values it produces
shrink (the mechanism differs between Hypothesis and test.check, and
probably differs again in EQC, but the user-visible result is largely
the same).

This is not a trivial distinction. Integrating shrinking into generation
has two large benefits:

* Shrinking composes nicely, and you can shrink anything you can generate
  regardless of whether there is a defined shrinker for the type produced.
* You can guarantee that shrinking satisfies the same invariants as generation.

The first is mostly important from a convenience point of view: although
there are some things it lets you do that you can't do in the type based
approach, they're mostly of secondary importance. It largely just saves
you the effort of having to write your own shrinkers.

But the second is *really* important, because the lack of it makes your
test failures potentially extremely confusing.

To see this, let's consider the following example in Hypothesis:

```python
from hypothesis import given
from hypothesis.strategies import integers

even_numbers = integers().map(lambda x: x * 2)


@given(even_numbers)
def test_even_numbers_are_even(n):
    assert n % 2 == 0
```

This test always passes: We generate an even number by multiplying
an integer we've generated by two. No problem.

Now suppose we made the test fail:


```python
from hypothesis import given
from hypothesis.strategies import integers

even_numbers = integers().map(lambda x: x * 2)


@given(even_numbers)
def test_even_numbers_are_even(n):
    assert n % 2 == 0
    assert n <= 4
```

This test will of course fail: Any value of n which is at least 5 will
fail the test.

But which assertion will fail, and what value will cause it to fail?

In Hypothesis it's the second one, and it will fail with n=6: The numbers
passed in will still always be even integers, and the smallest even
integer which fails the test is 6.
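To make the mechanics concrete, here is a toy model of integrated shrinking (this is *not* Hypothesis's actual internals, which work quite differently; the names `int_strategy`, `map_strategy` and `shrink` are invented for this sketch). The key idea is that a strategy pairs a value with its shrink candidates, and mapping a function over a strategy transforms both, so every shrunk value still satisfies the generator's invariants:

```python
# A toy model of integrated shrinking, NOT how Hypothesis is implemented.
# A "strategy" here is a pair: (value, thunk returning shrink candidates).

def int_strategy(value):
    """An integer together with its shrink candidates (smaller integers)."""
    return value, lambda: [int_strategy(v) for v in range(value)]

def map_strategy(strategy, f):
    """Mapping applies f to the value AND to every shrink candidate,
    so shrunk values still come from the same generator."""
    value, children = strategy
    return f(value), lambda: [map_strategy(c, f) for c in children()]

def shrink(strategy, fails):
    """Greedily walk down to the smallest value that still fails."""
    value, children = strategy
    for child in children():
        if fails(child[0]):
            return shrink(child, fails)
    return value

# Suppose we generated 10 and mapped it to the even number 20.
even = map_strategy(int_strategy(10), lambda x: x * 2)
print(shrink(even, lambda n: n > 4))  # -> 6
```

Because shrinking only ever moves through values the mapped strategy could have produced, the minimal failing example is 6, never an odd number.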

But suppose Hypothesis implemented type based shrinking (very early
versions of it *did* implement type based shrinking, but this stopped
being the case somewhere well before 1.0 when the API looked very
different).

In that case Hypothesis would just know that the values failing the
test were integers, so it would say "How about 1? 1 is a perfectly
valid integer. Let's try 1".

It would pass in n=1, the first assertion would trigger, and the test
would promptly fail. A successful shrink!

But it's completely not what we wanted. We were just trying to test on
even integers and the shrinking messed this up.
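For contrast, here is a type-based shrinker in the same toy style (again an invented sketch, not any real library's implementation): it knows only that the failing value is an integer, so it happily proposes candidates the generator could never have produced.

```python
def type_based_shrink(value, fails):
    """Toy type-based shrinker: tries smaller integers of the same
    type, ignoring how the original value was generated."""
    for candidate in range(value):
        if fails(candidate):
            return type_based_shrink(candidate, fails)
    return value

# The failing test: n must be even AND at most 4.
def fails_test(n):
    return n % 2 != 0 or n > 4

print(type_based_shrink(20, fails_test))  # -> 1
```

The shrinker reduces the even failure 20 to the odd value 1, which fails the *first* assertion rather than the one we were trying to test.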

This is in some sense the classic
[shrinkers are fuzzers](http://blog.regehr.org/archives/1284) problem
where an error is reduced to a different error, but it's a particularly
annoying version of that because an error we care about is being reduced
to an error we don't care about.

So we have to duplicate
the constraint logic in our test to make this work:


```python
from hypothesis import assume, given
from hypothesis.strategies import integers

even_numbers = integers().map(lambda x: x * 2)


@given(even_numbers)
def test_even_numbers_are_even(n):
    assume(n % 2 == 0)
    assert n % 2 == 0
    assert n <= 4
```

(Having both the assume and the first assert there is of course
redundant)

In this example the problem was relatively obvious and so easy to
work around, but as your invariants get more implicit and subtle
it becomes really problematic: In Hypothesis it's easy and
convenient to
[generate quite complex data](../generating-the-right-data/),
and trying to recreate the invariants that are automatically
satisfied with that in your tests and/or your custom shrinkers would
quickly become a nightmare.

I don't think it's an accident that the main systems to get this right are
in dynamic languages. It's certainly not *essential* - [the original proposal that
led to the implementation for test.check was for
Haskell](https://mail.haskell.org/pipermail/libraries/2013-November/021674.html),
and [Jack](https://github.com/ambiata/disorder.hs/tree/master/disorder-jack) is
an alternative property based system for Haskell that does this - but you
feel the pain much more quickly in dynamic languages because the typical
workaround for this problem in Haskell is to define a newtype, which lets you
turn off the default shrinking for your types and possibly define your own.

But that's a workaround for a problem that shouldn't be there in the first place,
and using it will still result in your having to encode the invariants into
your shrinkers, which is more work and more brittle than just having it work
automatically.

So although (as far as I know) none of the currently popular property based
testing systems for statically typed languages implement this behaviour
correctly, they absolutely can and they absolutely should. It will improve
users' lives significantly.

This of course goes doubly for dynamic languages, where even working around
this problem is hard.

So, in conclusion, just say no to type based shrinking. It's a bad idea,
and failures in your property based tests will become significantly harder
to understand in its presence.