1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348
|
---
tags: python, technical, intro
date: 2016-04-19 07:00
title: Rule Based Stateful Testing
author: drmaciver
---
Hypothesis's standard testing mechanisms are very good for testing things that can be
considered direct functions of data. But supposed you have some complex stateful
system or object that you want to test. How can you do that?
In this article we'll see how to use Hypothesis's *rule based state machines* to define
tests that generate not just simple data, but entire programs using some stateful
object. These will give the same level of boost to testing the behaviour of the
object as you get to testing the data it accepts.
<!--more-->
The model of a stateful system we'll be using is a [priority queue](https://en.wikipedia.org/wiki/Priority_queue)
implemented as a binary heap.
We have the following operations:
* newheap() - returns a new heap
* heappush(heap, value) - place a new value into the heap
* heappop(heap) - remove and return the smallest value currently on the heap. Error if heap is empty.
* heapempty(heap) - return True if the heap has no elements, else False.
We'll use the following implementation of these:
```python
def heapnew():
return []
def heapempty(heap):
return not heap
def heappush(heap, value):
heap.append(value)
index = len(heap) - 1
while index > 0:
parent = (index - 1) // 2
if heap[parent] > heap[index]:
heap[parent], heap[index] = heap[index], heap[parent]
index = parent
else:
break
def heappop(heap):
return heap.pop(0)
```
(Note that this implementation is *wrong*. heappop as implemented will return the smallest element
if the heap currently satisfies the heap property, but it will not rebalance the heap afterwards
so it may leave the heap in an invalid state)
We could test this readily enough using @given with something like the following:
```python
from hypothesis import given
from hypothesis.strategies import integers, lists
@given(lists(integers()))
def test_pop_in_sorted_order(ls):
h = heapnew()
for l in ls:
heappush(h, l)
r = []
while not heapempty(h):
r.append(heappop(h))
assert r == sorted(ls)
```
And this indeed finds the bug:
```
> assert r == sorted(ls)
E assert [0, 1, 0] == [0, 0, 1]
E At index 1 diff: 1 != 0
E Use -v to get the full diff
binheap.py:74: AssertionError
----- Hypothesis -----
Falsifying example: test_pop_in_sorted_order(ls=[0, 1, 0])
```
So we replace heappop with a correct implementation which rebalances the heap:
```python
def heappop(heap):
if len(heap) == 0:
raise ValueError("Empty heap")
if len(heap) == 1:
return heap.pop()
result = heap[0]
heap[0] = heap.pop()
index = 0
while index * 2 + 1 < len(heap):
children = [index * 2 + 1, index * 2 + 2]
children = [i for i in children if i < len(heap)]
assert children
children.sort(key=lambda x: heap[x])
for c in children:
if heap[index] > heap[c]:
heap[index], heap[c] = heap[c], heap[index]
index = c
break
else:
break
return result
```
But how do we know this is enough? Might some combination of mixing pushes and pops break the
invariants of the heap in a way that this simple pattern of pushing everything then popping
everything cannot witness?
This is where the rule based state machines come in. Instead of just letting Hypothesis give
us data which we feed into a fixed structure of test, we let Hypothesis choose which operations
to perform on our data structure:
```python
from hypothesis.stateful import RuleBasedStateMachine, precondition, rule
class HeapMachine(RuleBasedStateMachine):
def __init__(self):
super().__init__()
self.heap = []
@rule(value=integers())
def push(self, value):
heappush(self.heap, value)
@rule()
@precondition(lambda self: self.heap)
def pop(self):
correct = min(self.heap)
result = heappop(self.heap)
assert correct == result
```
@rule is a slightly restricted version of @given that only works for methods on a RuleBasedStateMachine.
However it has one *major* difference from @given, which is that multiple rules can be chained together:
A test using this state machine doesn't just run each rule in isolation, it instantiates an instance of
the machine and then runs multiple rules in succession.
The @precondition decorator constrains when a rule is allowed to fire: We are not allowed to pop from
an empty heap, so the pop rule may only fire when there is data to be popped.
We can run this by getting a standard unit test TestCase object out of it to be picked up by unittest
or py.test as normal:
```python
TestHeaps = HeapMachine.TestCase
```
With our original broken heappop we find the same bug as before:
```
E AssertionError: assert 0 == 1
binheap.py:90: AssertionError
----- Captured stdout call -----
Step #1: push(value=1)
Step #2: push(value=0)
Step #3: push(value=0)
Step #4: pop()
Step #5: pop()
```
With the fixed implementation the test passes.
As it currently stands, this is already very useful. It's particularly good for testing single standalone
objects or services like storage systems.
But one limitation of it as we have written it is that it only concerns ourselves with a single heap. What
if we wanted to combine two heaps? For example, suppose we wanted a heap merging operation that takes two
heaps and returns a new heap containing the values in either of the original two.
As before, we'll start with a broken implementation:
```python
def heapmerge(x, y):
x, y = sorted((x, y))
return x + y
```
We can't just write a strategy for heaps, because each heap would be a fresh object and thus it would not
preserve the stateful aspect.
What we instead do is use the other big feature of Hypothesis's rule bases state machines: Bundles.
Bundles allow rules to return as well as accept values. A bundle is a strategy which generates anything
a rule has previously provided to it. Using them is as follows:
```python
class HeapMachine(RuleBasedStateMachine):
Heaps = Bundle("heaps")
@rule(target=Heaps)
def newheap(self):
return []
@rule(heap=Heaps, value=integers())
def push(self, heap, value):
heappush(heap, value)
@rule(heap=Heaps.filter(bool))
def pop(self, heap):
correct = min(heap)
result = heappop(heap)
assert correct == result
@rule(target=Heaps, heap1=Heaps, heap2=Heaps)
def merge(self, heap1, heap2):
return heapmerge(heap1, heap2)
```
So now instead of a single heap we manage a collection of heaps. All of our previous operations become
constrained by an instance of a heap.
Note the use of filter: A bundle is a strategy you can use like any other. In this case the filter replaces
our use of a precondition because we now only care about whether this *specific* heap is empty.
This is sufficient to find the fact that our implementation is wrong:
```
@rule(heap=Heaps.filter(bool))
def pop(self, heap):
correct = min(heap)
result = heappop(heap)
> assert correct == result
E AssertionError: assert 0 == 1
binheap.py:105: AssertionError
----- Captured stdout call -----
Step #1: v1 = newheap()
Step #2: push(heap=v1, value=0)
Step #3: push(heap=v1, value=1)
Step #4: push(heap=v1, value=1)
Step #5: v2 = merge(heap2=v1, heap1=v1)
Step #6: pop(heap=v2)
Step #7: pop(heap=v2)
```
We create a small heap, merge it with itself, and rapidly discover that it has become unbalanced.
We can fix this by fixing our heapmerge to be correct:
```python
def heapmerge(x, y):
result = list(x)
for v in y:
heappush(result, v)
return result
```
But that's boring. Lets introduce a more *interestingly* broken implementation instead:
```python
def heapmerge(x, y):
result = []
i = 0
j = 0
while i < len(x) and j < len(y):
if x[i] <= y[j]:
result.append(x[i])
i += 1
else:
result.append(y[j])
j += 1
result.extend(x[i:])
result.extend(y[j:])
return result
```
This merge operation selectively splices two heaps together as if we were merging two
sorted lists (heaps aren't actually sorted, but the code still works regardless it
just doesn't do anything very meaningful).
This is wrong, but it turns out to work surprisingly well for small heaps and it's
not completely straightforward to find an example showing that it's wrong.
Here's what Hypothesis comes up with:
```
Step #1: v1 = newheap()
Step #2: push(heap=v1, value=0)
Step #3: v2 = merge(heap1=v1, heap2=v1)
Step #4: v3 = merge(heap1=v2, heap2=v2)
Step #5: push(heap=v3, value=-1)
Step #6: v4 = merge(heap1=v1, heap2=v2)
Step #7: pop(heap=v4)
Step #8: push(heap=v3, value=-1)
Step #9: v5 = merge(heap1=v1, heap2=v2)
Step #10: v6 = merge(heap1=v5, heap2=v4)
Step #11: v7 = merge(heap1=v6, heap2=v3)
Step #12: pop(heap=v7)
Step #13: pop(heap=v7)
```
Through a careful set of heap creation and merging, Hypothesis manages to find a series
of merges that produce an unbalanced heap. Every heap prior to v7 is balanced, but v7 looks
like this:
```
>>> v7
[-1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0]
```
Which doesn't satisfy the heap property because of that -1 far down in the list.
I don't know about you, but I would never have come up with that example. There's probably
a simpler one given a different set of operations - e.g. one thing that would probably improve
the quality of this test is to let Hypothesis instantiate a new heap with a list of elements
which it pops onto it.
But the nice thing about rule based stateful testing is that I don't *have* to come up with
the example. Instead Hypothesis is able to guarantee that every combination of operations
on my objects works, and can flush out some remarkably subtle bugs in the process.
Because after all, if it takes this complicated an example to demonstrate that a completely
wrong implementation is wrong, how hard can it sometimes be to demonstrate subtle bugs?
### Real world usage
This feature is currently somewhat under-documented so hasn't seen as widespread adoption as
it could. However, there are at least two interesting real world examples:
1. Hypothesis uses it to test itself. Hypothesis has [tests of its example database](
https://github.com/HypothesisWorks/hypothesis-python/blob/master/tests/cover/test_database_agreement.py)
which work very much like the above, and [a small model of its test API](
https://github.com/HypothesisWorks/hypothesis-python/blob/master/tests/nocover/test_strategy_state.py)
which generates random strategies and runs tests using them.
2. It's being used to [test Mercurial](https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-February/080037.html)
generating random. So far it's found [bug 5112](https://bz.mercurial-scm.org/show_bug.cgi?id=5112) and
[bug 5113](https://bz.mercurial-scm.org/show_bug.cgi?id=5113). The usage pattern on Mercurial is one
such that the stateful testing probably needs more resources, more rules and more work on deployment
before it's going to find much more than that though.
|