.. currentmodule:: pandas
.. ipython:: python
   :suppress:

   import pandas as pd
   import numpy as np
.. _boolean:
**************************
Nullable Boolean data type
**************************
.. note::

   BooleanArray is currently experimental. Its API or implementation may
   change without warning.
.. versionadded:: 1.0.0
.. _boolean.indexing:
Indexing with NA values
-----------------------
pandas allows indexing with ``NA`` values in a boolean array, which are treated as ``False``.
.. versionchanged:: 1.0.2
.. ipython:: python
   :okexcept:

   s = pd.Series([1, 2, 3])
   mask = pd.array([True, False, pd.NA], dtype="boolean")
   s[mask]
If you would prefer to keep the ``NA`` values, you can manually fill them with ``fillna(True)``.
.. ipython:: python

   s[mask.fillna(True)]
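In practice, a mask containing ``NA`` often comes from comparing nullable data rather than from a hand-written array. The following is a minimal sketch of that situation; the ``values`` Series and the variable names are made up for illustration, and the behaviour shown assumes pandas 1.0.2 or later.

```python
import pandas as pd

# Hypothetical data: a nullable integer Series with one missing value.
values = pd.Series([1, 2, None], dtype="Int64")

# Comparing nullable data carries the missing value into the mask,
# so the mask has "boolean" dtype and holds NA in the last position.
mask = values > 1

# Default behaviour: NA in the mask is treated as False, so the row is dropped.
selected_default = values[mask]

# Filling the mask with True keeps the row whose comparison was undetermined.
selected_keep = values[mask.fillna(True)]

print(mask.dtype)
print(selected_default.index.tolist())
print(selected_keep.index.tolist())
```

Filling with ``True`` or ``False`` makes the choice explicit at the call site, which can be easier to audit than relying on the default.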
.. _boolean.kleene:
Kleene logical operations
-------------------------
:class:`arrays.BooleanArray` implements `Kleene Logic`_ (sometimes called three-valued logic) for
logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or).
This table demonstrates the results for every combination. These operations are symmetrical,
so flipping the left- and right-hand side makes no difference in the result.
================= =========
Expression        Result
================= =========
``True & True``   ``True``
``True & False``  ``False``
``True & NA``     ``NA``
``False & False`` ``False``
``False & NA``    ``False``
``NA & NA``       ``NA``
``True | True``   ``True``
``True | False``  ``True``
``True | NA``     ``True``
``False | False`` ``False``
``False | NA``    ``NA``
``NA | NA``       ``NA``
``True ^ True``   ``False``
``True ^ False``  ``True``
``True ^ NA``     ``NA``
``False ^ False`` ``False``
``False ^ NA``    ``NA``
``NA ^ NA``       ``NA``
================= =========
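The table can also be reproduced element-wise on nullable arrays. The sketch below is illustrative only (the ``left``, ``right`` and ``options`` names are not part of pandas); it builds every ordered pair of ``True``, ``False`` and ``NA`` and checks that swapping the operands gives the same result. It assumes a pandas version in which extension arrays provide ``equals``.

```python
import operator

import pandas as pd

# Every ordered pair of True / False / NA, stored as two aligned arrays.
options = [True, False, pd.NA]
left = pd.array([a for a in options for _ in options], dtype="boolean")
right = pd.array([b for _ in options for b in options], dtype="boolean")

for symbol, op in [("&", operator.and_), ("|", operator.or_), ("^", operator.xor)]:
    result = op(left, right)
    swapped = op(right, left)
    # equals() treats NA in matching positions as equal.
    symmetric = result.equals(swapped)
    print(symbol, list(result), "symmetric:", symmetric)
```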
When an ``NA`` is present in an operation, the output value is ``NA`` only if
the result cannot be determined solely based on the other input. For example,
``True | NA`` is ``True``, because both ``True | True`` and ``True | False``
are ``True``. In that case, we don't actually need to consider the value
of the ``NA``.
On the other hand, ``True & NA`` is ``NA``. The result depends on whether
the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``,
but ``True & False`` is ``False``, so we can't determine the output.
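The rule described above can be written as a small pure-Python function: replace each ``NA`` by both possible values and return ``NA`` unless every outcome agrees. This is only a sketch of the reasoning, not how pandas implements it; the ``kleene`` helper is hypothetical, and it treats two ``NA`` operands as independent unknowns.

```python
import operator

import pandas as pd


def kleene(op, left, right):
    """Return op(left, right) under the 'try both values for NA' rule."""
    lefts = [True, False] if left is pd.NA else [left]
    rights = [True, False] if right is pd.NA else [right]
    outcomes = {op(a, b) for a in lefts for b in rights}
    # A single possible outcome means the NA did not matter.
    return outcomes.pop() if len(outcomes) == 1 else pd.NA


# Check the rule against the scalar behaviour of pd.NA for all combinations.
for op in (operator.and_, operator.or_, operator.xor):
    for a in (True, False, pd.NA):
        for b in (True, False, pd.NA):
            assert kleene(op, a, b) is op(a, b), (op.__name__, a, b)

print(kleene(operator.or_, True, pd.NA))
print(kleene(operator.and_, True, pd.NA))
```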
This differs from how ``np.nan`` behaves in logical operations. pandas treats
``np.nan`` as *always false in the output*.
In ``or``
.. ipython:: python

   pd.Series([True, False, np.nan], dtype="object") | True
   pd.Series([True, False, np.nan], dtype="boolean") | True
In ``and``
.. ipython:: python

   pd.Series([True, False, np.nan], dtype="object") & True
   pd.Series([True, False, np.nan], dtype="boolean") & True
.. _Kleene Logic: https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics