Integrations Reference
======================
Reference for Hypothesis features with a defined interface, but no code API.
.. _ghostwriter:
Ghostwriter
-----------
.. automodule:: hypothesis.extra.ghostwriter
    :members:

A note for test-generation researchers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ghostwritten tests are intended as a *starting point for human authorship*,
to demonstrate best practice, help novices past blank-page paralysis, and save time
for experts. They *may* be ready-to-run, or include placeholders and ``# TODO:``
comments to fill in strategies for unknown types. In either case, improving tests
for their own code gives users a well-scoped and immediately rewarding context in
which to explore property-based testing.
By contrast, most test-generation tools aim to produce ready-to-run test suites...
and implicitly assume that the current behavior is the desired behavior.
However, the code might contain bugs, and we want our tests to fail if it does!
Worse, these tools require the code under test to be finished and executable,
making it impossible to generate tests as part of the development process.
`Fraser 2013`_ found that evolving a high-coverage test suite (e.g. Randoop_, EvoSuite_, Pynguin_)
"leads to clear improvements in commonly applied quality metrics such as code coverage
[but] no measurable improvement in the number of bugs actually found by developers"
and that "generating a set of test cases, even high coverage test cases,
does not necessarily improve our ability to test software".
Invariant detection (famously Daikon_; in PBT see e.g. `Alonso 2022`_,
QuickSpec_, Speculate_) relies on code execution. Program slicing (e.g. FUDGE_,
FuzzGen_, WINNIE_) requires downstream consumers of the code to test.
Ghostwriter inspects the function name, argument names and types, and docstrings.
It can be used on buggy or incomplete code, runs in a few seconds, and produces
a single semantically-meaningful test per function or group of functions.
Rather than detecting regressions, these tests check semantic properties such as
`encode/decode or save/load round-trips <https://zhd.dev/ghostwriter/?q=gzip.compress>`__,
`commutative, associative, and distributive operations
<https://zhd.dev/ghostwriter/?q=operator.mul>`__,
`equivalence between methods <https://zhd.dev/ghostwriter/?q=operator.add+numpy.add>`__,
`array shapes <https://zhd.dev/ghostwriter/?q=numpy.matmul>`__,
and idempotence. Where no property is detected, we simply check for
'no error on valid input' and allow the user to supply their own invariants.
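
To make the round-trip property concrete, here is a hand-written sketch of the kind of test described above. It is not actual Ghostwriter output: the test name, the ``st.binary()`` strategy, and the settings are assumptions chosen for illustration.

```python
import gzip

from hypothesis import given, settings, strategies as st


# Hand-written illustration, not Ghostwriter output: the test name, the
# strategy, and the settings are assumptions made for this sketch.
@settings(max_examples=50, database=None)
@given(data=st.binary())
def test_roundtrip_compress_decompress(data):
    # Round-trip property: decompressing compressed bytes returns the input.
    assert gzip.decompress(gzip.compress(data)) == data
```

Note that the test states a property that should hold for every input, so it can fail if the code is buggy, rather than pinning down whatever the code currently returns.
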
Evaluations such as the SBFT24_ competition_ measure performance on a task which
the Ghostwriter is not intended to perform. I'd love to see qualitative user
studies, such as `PBT in Practice`_ for test generation, which could check
whether the Ghostwriter is onto something or tilting at windmills.
If you're interested in similar questions, `drop me an email`_!
.. _Daikon: https://plse.cs.washington.edu/daikon/pubs/
.. _Alonso 2022: https://doi.org/10.1145/3540250.3559080
.. _QuickSpec: http://www.cse.chalmers.se/~nicsma/papers/quickspec2.pdf
.. _Speculate: https://matela.com.br/speculate.pdf
.. _FUDGE: https://research.google/pubs/pub48314/
.. _FuzzGen: https://www.usenix.org/conference/usenixsecurity20/presentation/ispoglou
.. _WINNIE: https://www.ndss-symposium.org/wp-content/uploads/2021-334-paper.pdf
.. _Fraser 2013: https://doi.org/10.1145/2483760.2483774
.. _Randoop: https://homes.cs.washington.edu/~mernst/pubs/feedback-testgen-icse2007.pdf
.. _EvoSuite: https://www.evosuite.org/wp-content/papercite-data/pdf/esecfse11.pdf
.. _Pynguin: https://arxiv.org/abs/2007.14049
.. _SBFT24: https://arxiv.org/abs/2401.15189
.. _competition: https://github.com/ThunderKey/python-tool-competition-2024
.. _PBT in Practice: https://harrisongoldste.in/papers/icse24-pbt-in-practice.pdf
.. _drop me an email: mailto:zac@zhd.dev?subject=Hypothesis%20Ghostwriter%20research
.. _observability:
Observability
-------------

.. note::

    The `Tyche <https://github.com/tyche-pbt/tyche-extension>`__ VSCode extension provides an in-editor UI for observability results generated by Hypothesis. If you want to *view* observability results, rather than programmatically consume or display them, we recommend using Tyche.

.. warning::

    This feature is experimental, and could have breaking changes or even be removed
    without notice. Try it out, let us know what you think, but don't rely on it
    just yet!

Motivation
~~~~~~~~~~
Understanding what your code is doing - for example, why your test failed - is often
a frustrating exercise in adding some more instrumentation or logging (or ``print()`` calls)
and running it again. The idea of :wikipedia:`observability <Observability_(software)>`
is to let you answer questions you didn't think of in advance. In slogan form,
*Debugging should be a data analysis problem.*
By default, Hypothesis only reports the minimal failing example... but sometimes you might
want to know something about *all* the examples. Printing them to the terminal by increasing
|Verbosity| might be nice, but isn't always enough.
This feature gives you an analysis-ready dataframe with one row per test case
and columns ranging from arguments to code coverage to pass/fail status.
This is deliberately a much lighter-weight and task-specific system than e.g.
`OpenTelemetry <https://opentelemetry.io/>`__. It's also less detailed than time-travel
debuggers such as `rr <https://rr-project.org/>`__ or `pytrace <https://pytrace.com/>`__,
because there's no good way to compare multiple traces from these tools and their
Python support is relatively immature.
Configuration
~~~~~~~~~~~~~
If you set the ``HYPOTHESIS_EXPERIMENTAL_OBSERVABILITY`` environment variable,
Hypothesis will log various observations to jsonlines files in the
``.hypothesis/observed/`` directory. You can load and explore these with e.g.
:func:`pd.read_json(".hypothesis/observed/*_testcases.jsonl", lines=True) <pandas.read_json>`,
or by using the :pypi:`sqlite-utils` and :pypi:`datasette` libraries::

    sqlite-utils insert testcases.db testcases .hypothesis/observed/*_testcases.jsonl --nl --flatten
    datasette serve testcases.db

If you are experiencing a significant slow-down, you can try setting
``HYPOTHESIS_EXPERIMENTAL_OBSERVABILITY_NOCOVER`` instead; this will disable coverage information
collection. This should not be necessary on Python 3.12 or later, where coverage collection is very fast.
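
If you would rather not depend on extra libraries, the observation files can also be read with the standard library alone. The following is a minimal sketch, assuming the default ``.hypothesis/observed/`` directory and the ``*_testcases.jsonl`` naming mentioned above. The helper names ``load_test_cases`` and ``count_by_status`` are hypothetical, and the ``"type"`` and ``"status"`` keys are assumptions which you should check against the schema for your Hypothesis version.

```python
import json
from collections import Counter
from pathlib import Path


def load_test_cases(observed_dir=".hypothesis/observed"):
    """Return one dict per non-blank line of every ``*_testcases.jsonl`` file."""
    rows = []
    for path in sorted(Path(observed_dir).glob("*_testcases.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    # json.loads accepts NaN and Infinity tokens by default.
                    rows.append(json.loads(line))
    return rows


def count_by_status(rows):
    # "type" and "status" are assumed key names; verify them in the schema.
    return Counter(
        row.get("status") for row in rows if row.get("type") == "test_case"
    )
```

Calling ``count_by_status(load_test_cases())`` after a test run would then give a quick tally of outcomes, under the assumptions stated above.
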
Collecting more information
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to record more information about your test cases than the arguments and
outcome - for example, was ``x`` a binary tree? what was the difference between the
expected and the actual value? how many queries did it take to find a solution? -
Hypothesis makes this easy.
:func:`~hypothesis.event` accepts a string label, and optionally a string or int or
float observation associated with it. All events are collected and summarized in
:ref:`statistics`, as well as included on a per-test-case basis in our observations.
:func:`~hypothesis.target` is a special case of numeric-valued events: as well as
recording them in observations, Hypothesis will try to maximize the targeted value.
Knowing that, you can use this to guide the search for failing inputs.
Data Format
~~~~~~~~~~~
We dump observations in `json lines format <https://jsonlines.org/>`__, with each line
describing either a test case or an information message. The tables below are derived
from :download:`this machine-readable JSON schema <schema_observations.json>`, to
provide both readable and verifiable specifications.
Note that we use :func:`python:json.dumps` and can therefore emit non-standard JSON
which includes infinities and NaN. This is valid in `JSON5 <https://json5.org/>`__,
and supported by `some JSON parsers <https://evanhahn.com/pythons-nonstandard-json-encoding/>`__
including Gson in Java, ``JSON.parse()`` in Ruby, and of course in Python.
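
The round trip through Python's ``json`` module can be sketched as follows. The keys ``"ratio"`` and ``"limit"`` and the helper ``is_strict_json`` are invented for this example; the ``parse_constant`` hook is one way for a consumer to detect lines that a strict parser would reject.

```python
import json
import math

# Python's encoder emits the non-standard tokens NaN and Infinity by default.
line = json.dumps({"ratio": float("nan"), "limit": float("inf")})

# Python's decoder accepts those tokens again by default.
record = json.loads(line)
assert math.isnan(record["ratio"]) and math.isinf(record["limit"])


def reject_constant(token):
    # json.loads calls this hook for the tokens NaN, Infinity and -Infinity.
    raise ValueError(f"non-standard JSON constant: {token}")


def is_strict_json(text):
    """Return True if ``text`` parses without any non-standard constants."""
    try:
        json.loads(text, parse_constant=reject_constant)
    except ValueError:
        return False
    return True
```
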
Information message
^^^^^^^^^^^^^^^^^^^

.. jsonschema:: ./schema_observations.json#/oneOf/1
    :hide_key: /additionalProperties, /type

Test case
^^^^^^^^^

.. jsonschema:: ./schema_observations.json#/oneOf/0
    :hide_key: /additionalProperties, /type

.. _observability-hypothesis-metadata:
Hypothesis metadata
+++++++++++++++++++
While the observability format is agnostic to the property-based testing library which generated it, Hypothesis includes specific values in the ``metadata`` key for test cases. You may rely on these being present if and only if the observation was generated by Hypothesis.

.. jsonschema:: ./schema_metadata.json
    :hide_key: /additionalProperties, /type

Choices metadata
++++++++++++++++
These additional metadata elements are included in ``metadata`` (as e.g. ``metadata["choice_nodes"]`` or ``metadata["choice_spans"]``), if and only if |OBSERVABILITY_CHOICES| is set.

.. jsonschema:: ./schema_metadata_choices.json
    :hide_key: /additionalProperties, /type

.. _pytest-plugin:
The Hypothesis pytest plugin
----------------------------
Hypothesis includes a tiny plugin to improve integration with :pypi:`pytest`, which is activated by default (but does not affect other test runners). It provides extra information and convenient access to config options.
- ``pytest --hypothesis-show-statistics`` can be used to :ref:`display test and data generation statistics <statistics>`.
- ``pytest --hypothesis-profile=<profile name>`` can be used to load a settings profile (as in |settings.load_profile|).
- ``pytest --hypothesis-verbosity=<level name>`` can be used to override the current |Verbosity| setting.
- ``pytest --hypothesis-seed=<an int>`` can be used to reproduce a failure with a particular seed (as in |@seed|).
- ``pytest --hypothesis-explain`` can be used to temporarily enable |Phase.explain|.
Finally, all tests that are defined with Hypothesis automatically have ``@pytest.mark.hypothesis`` applied to them. See :ref:`here for information on working with markers <pytest:mark examples>`.
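
The ``--hypothesis-profile`` option only selects a profile which has already been registered. A minimal sketch of a ``conftest.py`` which registers two profiles might look like this; the profile names ``ci`` and ``debug`` and their values are illustrative assumptions, not defaults shipped with Hypothesis.

```python
# conftest.py (hypothetical contents; profile names and values are illustrative)
from hypothesis import Verbosity, settings

# A thorough profile, e.g. for continuous integration runs.
settings.register_profile("ci", max_examples=1000)

# A quick, chatty profile for local debugging.
settings.register_profile("debug", max_examples=10, verbosity=Verbosity.verbose)
```

With this in place, ``pytest --hypothesis-profile=ci`` would run with the first profile, and ``pytest -m hypothesis`` would use the automatically applied marker to select only Hypothesis tests.
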

.. note::

    Pytest will load the plugin automatically if Hypothesis is installed. You don't need to do anything at all to use it.

    If this causes problems, you can avoid loading the plugin with the ``-p no:hypothesispytest`` option.

.. _statistics:
Test statistics
~~~~~~~~~~~~~~~

.. note::

    While test statistics are only available under pytest, you can use the :ref:`observability <observability>` interface to view similar information about your tests.

You can see a number of statistics about executed tests by passing the command line argument ``--hypothesis-show-statistics``. This will include some general statistics about the test.

For example, if you ran the following with ``--hypothesis-show-statistics``:

.. code-block:: python

    from hypothesis import given, strategies as st


    @given(st.integers())
    def test_integers(i):
        pass

You would see:

.. code-block:: none

    - during generate phase (0.06 seconds):
        - Typical runtimes: < 1ms, ~ 47% in data generation
        - 100 passing examples, 0 failing examples, 0 invalid examples
    - Stopped because settings.max_examples=100

The final "Stopped because" line tells you why Hypothesis stopped generating new examples. This is typically because we hit |max_examples|, but occasionally because we exhausted the search space or because shrinking was taking a very long time. This can be useful for understanding the behaviour of your tests.
In some cases (such as filtered and recursive strategies) you will see events mentioned which describe some aspect of the data generation:

.. code-block:: python

    from hypothesis import given, strategies as st


    @given(st.integers().filter(lambda x: x % 2 == 0))
    def test_even_integers(i):
        pass

You would see something like:

.. code-block:: none

    test_even_integers:

      - during generate phase (0.08 seconds):
          - Typical runtimes: < 1ms, ~ 57% in data generation
          - 100 passing examples, 0 failing examples, 12 invalid examples
          - Events:
            * 51.79%, Retried draw from integers().filter(lambda x: x % 2 == 0) to satisfy filter
            * 10.71%, Aborted test because unable to satisfy integers().filter(lambda x: x % 2 == 0)
      - Stopped because settings.max_examples=100

.. _hypothesis-cli:
hypothesis[cli]
----------------

.. note::

    This feature requires the ``hypothesis[cli]`` :doc:`extra </extras>`, via ``pip install hypothesis[cli]``.

.. automodule:: hypothesis.extra.cli
.. _codemods:
hypothesis[codemods]
--------------------

.. note::

    This feature requires the ``hypothesis[codemods]`` :doc:`extra </extras>`, via ``pip install hypothesis[codemods]``.

.. automodule:: hypothesis.extra.codemods
.. _hypothesis-dpcontracts:
hypothesis[dpcontracts]
-----------------------

.. note::

    This feature requires the ``hypothesis[dpcontracts]`` :doc:`extra </extras>`, via ``pip install hypothesis[dpcontracts]``.

.. tip::

    For new projects, we recommend using either :pypi:`deal` or :pypi:`icontract`
    and :pypi:`icontract-hypothesis` over :pypi:`dpcontracts`.
    They're generally more powerful tools for design-by-contract programming,
    and have substantially nicer Hypothesis integration too!

.. automodule:: hypothesis.extra.dpcontracts
    :members: