.. currentmodule:: pandas
{{ header }}
.. _integer_na:
**************************
Nullable integer data type
**************************
.. note::
IntegerArray is currently experimental. Its API or implementation may
change without warning.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as the missing value rather
than :attr:`numpy.nan`.
In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
missing data. Because ``NaN`` is a float, this forces an array of integers with
any missing values to become floating point. In some cases, this may not matter
much. But if your integer column is, say, an identifier, casting to float can
be problematic. Some integers cannot even be represented as floating point
numbers.
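As a rough illustration of that last point, the sketch below compares the default float fallback with the nullable ``Int64`` dtype for an integer just above 2**53, the limit beyond which ``float64`` can no longer represent every integer exactly. The variable names are illustrative only.

```python
import pandas as pd

# 2**53 + 1 is the smallest positive integer that float64 cannot store exactly.
big = 2**53 + 1

# With a missing value and no explicit dtype, the data falls back to float64.
as_float = pd.Series([big, None])

# With the nullable integer dtype, the value is kept as a 64-bit integer.
as_nullable = pd.Series([big, None], dtype="Int64")

print(as_float.dtype)                   # float64
print(int(as_float.iloc[0]) == big)     # False: the value was rounded
print(int(as_nullable.iloc[0]) == big)  # True: the value is preserved
```

For an identifier column, that silent rounding would map two distinct IDs to the same value.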
Construction
------------
pandas can represent integer data with possibly missing values using
:class:`arrays.IntegerArray`. This is an :ref:`extension type <extending.extension-types>`
implemented within pandas.
.. ipython:: python
arr = pd.array([1, 2, None], dtype=pd.Int64Dtype())
arr
Or use the string alias ``"Int64"`` (note the capital ``"I"``) to differentiate it from
NumPy's ``'int64'`` dtype:
.. ipython:: python
pd.array([1, 2, np.nan], dtype="Int64")
All NA-like values are replaced with :attr:`pandas.NA`.
.. ipython:: python
pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")
This array can be stored in a :class:`DataFrame` or :class:`Series` like any
NumPy array.
.. ipython:: python
pd.Series(arr)
You can also pass the list-like object to the :class:`Series` constructor
with the dtype.
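A minimal sketch of that pattern, using the ``"Int64"`` string alias introduced above (the variable name ``s`` is illustrative):

```python
import pandas as pd

# Pass the list directly to Series and request the nullable integer dtype.
s = pd.Series([1, 2, None], dtype="Int64")

print(s.dtype)            # Int64
print(s.isna().tolist())  # [False, False, True]
```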
.. warning::
Currently :func:`pandas.array` and :class:`pandas.Series` use different
rules for dtype inference. :func:`pandas.array` will infer a
nullable-integer dtype:
.. ipython:: python
pd.array([1, None])
pd.array([1, 2])
For backwards compatibility, :class:`Series` infers these as either
integer or float dtype:
.. ipython:: python
pd.Series([1, None])
pd.Series([1, 2])
We recommend explicitly providing the dtype to avoid confusion.
.. ipython:: python
pd.array([1, None], dtype="Int64")
pd.Series([1, None], dtype="Int64")
In the future, we may provide an option for :class:`Series` to infer a
nullable-integer dtype.
Operations
----------
Operations involving an integer array behave similarly to those on NumPy arrays.
Missing values are propagated, and the data is coerced to another
dtype if needed.
.. ipython:: python
s = pd.Series([1, 2, None], dtype="Int64")
# arithmetic
s + 1
# comparison
s == 1
# indexing
s.iloc[1:3]
# operate with other dtypes
s + s.iloc[1:3].astype("Int8")
# coerce when needed
s + 0.01
These dtypes can be used as columns of a :class:`DataFrame`.
.. ipython:: python
df = pd.DataFrame({"A": s, "B": [1, 1, 3], "C": list("aab")})
df
df.dtypes
These dtypes can be merged, reshaped, and cast.
.. ipython:: python
pd.concat([df[["A"]], df[["B", "C"]]], axis=1).dtypes
df["A"].astype(float)
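Casting in the other direction needs a little care, because a plain NumPy integer dtype has no way to hold a missing value. The sketch below shows one possible approach, assuming a fill value of ``0`` is acceptable for the data at hand; that choice is an assumption for illustration, not a recommendation.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Casting to float turns the missing entry into NaN.
as_float = s.astype(float)
print(as_float.isna().tolist())  # [False, False, True]

# To reach a NumPy integer dtype, handle the missing entries first.
# Filling with 0 is an arbitrary choice made for this example.
filled = s.fillna(0).astype("int64")
print(filled.dtype)     # int64
print(filled.tolist())  # [1, 2, 0]
```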
Reduction and groupby operations such as ``sum`` work as well.
.. ipython:: python
df.sum()
df.groupby("B").A.sum()
Scalar NA Value
---------------
:class:`arrays.IntegerArray` uses :attr:`pandas.NA` as its scalar
missing value. Indexing a single element that is missing returns
:attr:`pandas.NA`:
.. ipython:: python
a = pd.array([1, None], dtype="Int64")
a[1]
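Because the returned scalar is the ``pandas.NA`` singleton, it can be detected with an identity check or with ``pd.isna``. A small sketch (the name ``missing`` is illustrative):

```python
import pandas as pd

a = pd.array([1, None], dtype="Int64")
missing = a[1]

# The scalar is the pandas.NA singleton, so an identity check works.
print(missing is pd.NA)  # True

# pd.isna is the general-purpose check and also covers NaN and None.
print(pd.isna(missing))  # True
```

Comparing with ``==`` is not a reliable test here, since a comparison involving ``pandas.NA`` itself evaluates to ``pandas.NA`` rather than ``True`` or ``False``.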