.. currentmodule:: altair

.. _user-guide-impute-transform:

Impute
~~~~~~
The impute transform allows you to fill in missing entries in a dataset.
As an example, consider the following data, which includes missing values
that we filter out of the long-form representation (see :ref:`data-long-vs-wide`
for more on this):

.. altair-plot::
    :output: repr

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        't': range(5),
        'x': [2, np.nan, 3, 1, 3],
        'y': [5, 7, 5, np.nan, 4]
    }).melt('t').dropna()
    data

Notice the result: the ``x`` series has no entry at ``t=1``, and the ``y``
series has a missing entry at ``t=3``. If we use Altair to visualize this
data directly, the line skips the missing entries:

.. altair-plot::

    import altair as alt

    raw = alt.Chart(data).mark_line(point=True).encode(
        x='t:Q',
        y='value:Q',
        color='variable:N'
    )
    raw

This is not always desirable because, particularly for a line plot with
no points, the connecting segment can imply the existence of data that is
not there.
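
To make the gaps explicit, the following minimal sketch lists which
``(variable, t)`` combinations have no row. It uses plain Python rather than
Altair, and the ``rows`` list is a hand-copied assumption of what the
``melt('t').dropna()`` call above produces, not output taken from the library:

```python
from itertools import product

# Long-form rows (variable, t, value) assumed to remain after
# melt('t').dropna() on the example data above.
rows = [
    ("x", 0, 2.0), ("x", 2, 3.0), ("x", 3, 1.0), ("x", 4, 3.0),
    ("y", 0, 5.0), ("y", 1, 7.0), ("y", 2, 5.0), ("y", 4, 4.0),
]

present = {(variable, t) for variable, t, _ in rows}
variables = sorted({variable for variable, _, _ in rows})
keys = sorted({t for _, t, _ in rows})

# Every (variable, t) pair without a row is a gap the line is drawn across.
missing = [pair for pair in product(variables, keys) if pair not in present]
print(missing)  # [('x', 1), ('y', 3)]
```

These two pairs are exactly the positions where the plotted line bridges
neighboring points without any underlying data.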
Impute via Encodings
^^^^^^^^^^^^^^^^^^^^
To address this, you can use the ``impute`` method of the encoding channel.
For example, we can impute using a constant value (we'll show the raw chart
lightly in the background for reference):

.. altair-plot::

    background = raw.encode(opacity=alt.value(0.2))

    chart = alt.Chart(data).mark_line(point=True).encode(
        x='t:Q',
        y=alt.Y('value:Q').impute(value=0),
        color='variable:N'
    )
    background + chart

Or we can impute using any supported aggregate:

.. altair-plot::

    chart = alt.Chart(data).mark_line(point=True).encode(
        x='t:Q',
        y=alt.Y('value:Q').impute(method='mean'),
        color='variable:N'
    )
    background + chart

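
The arithmetic behind constant and mean imputation can be sketched in plain
Python. This is an illustration only, not how Altair or Vega-Lite implement
the transform: the ``impute`` helper and its ``fill`` callback are
hypothetical names, the ``rows`` list is a hand-copied assumption of the
example data, and the sketch assumes the mean is taken over each series'
observed values.

```python
from statistics import mean

# Long-form rows (variable, t, value) assumed from the example data.
rows = [
    ("x", 0, 2.0), ("x", 2, 3.0), ("x", 3, 1.0), ("x", 4, 3.0),
    ("y", 0, 5.0), ("y", 1, 7.0), ("y", 2, 5.0), ("y", 4, 4.0),
]

def impute(rows, fill):
    """Return rows with one entry per (variable, t) pair.

    `fill` (hypothetical) maps the values observed in one group to the
    value used for that group's missing keys.
    """
    keys = sorted({t for _, t, _ in rows})
    groups = {}
    for variable, t, value in rows:
        groups.setdefault(variable, {})[t] = value
    result = []
    for variable, observed in groups.items():
        fill_value = fill(list(observed.values()))
        for t in keys:
            result.append((variable, t, observed.get(t, fill_value)))
    return result

constant = impute(rows, lambda values: 0)  # mirrors impute(value=0)
averaged = impute(rows, mean)              # mirrors impute(method='mean')

# Show only the rows that imputation added.
print([r for r in constant if r not in rows])  # [('x', 1, 0), ('y', 3, 0)]
print([r for r in averaged if r not in rows])  # [('x', 1, 2.25), ('y', 3, 5.25)]
```

Under these assumptions, each series is filled independently: the ``x`` gap
receives the mean of the four observed ``x`` values, and the ``y`` gap the
mean of the four observed ``y`` values.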
Impute via Transform
^^^^^^^^^^^^^^^^^^^^
As with the :ref:`user-guide-bin-transform` and :ref:`user-guide-aggregate-transform`,
it is also possible to specify the impute transform outside the encoding, as a
chart-level transform. For example, here are the equivalents of the two charts
above:

.. altair-plot::

    chart = alt.Chart(data).transform_impute(
        impute='value',
        key='t',
        value=0,
        groupby=['variable']
    ).mark_line(point=True).encode(
        x='t:Q',
        y='value:Q',
        color='variable:N'
    )
    background + chart


.. altair-plot::

    chart = alt.Chart(data).transform_impute(
        impute='value',
        key='t',
        method='mean',
        groupby=['variable']
    ).mark_line(point=True).encode(
        x='t:Q',
        y='value:Q',
        color='variable:N'
    )
    background + chart

If you would like to use more localized imputed values, you can specify a
``frame`` parameter, similar to that of the :ref:`user-guide-window-transform`,
which controls which values are used for the imputation. For example, here
we impute missing values using the mean of the neighboring points on either
side:

.. altair-plot::

    chart = alt.Chart(data).transform_impute(
        impute='value',
        key='t',
        method='mean',
        frame=[-1, 1],
        groupby=['variable']
    ).mark_line(point=True).encode(
        x='t:Q',
        y='value:Q',
        color='variable:N'
    )
    background + chart

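
As a rough illustration of what ``frame=[-1, 1]`` means for this data, the
sketch below fills each gap with the mean of the entries one position before
and one position after it. It is plain Python, not the library's
implementation; ``impute_frame_mean`` is a hypothetical helper, and the
sketch assumes the frame counts positions in key order within a single
series.

```python
from statistics import mean

def impute_frame_mean(series, frame):
    """Fill None entries with the mean of the non-missing values in a window.

    `series` holds one group's values in key order, with None marking a gap;
    `frame=[lo, hi]` selects positions i+lo .. i+hi around each gap.
    """
    lo, hi = frame
    filled = []
    for i, value in enumerate(series):
        if value is None:
            window = series[max(0, i + lo): i + hi + 1]
            neighbors = [v for v in window if v is not None]
            value = mean(neighbors) if neighbors else None
        filled.append(value)
    return filled

# The two series from the example data, with gaps at t=1 and t=3.
x = [2.0, None, 3.0, 1.0, 3.0]
y = [5.0, 7.0, 5.0, None, 4.0]

print(impute_frame_mean(x, [-1, 1]))  # [2.0, 2.5, 3.0, 1.0, 3.0]
print(impute_frame_mean(y, [-1, 1]))  # [5.0, 7.0, 5.0, 4.5, 4.0]
```

With this window, the filled value depends only on the immediate neighbors,
so it tracks the local shape of each line rather than the overall level of
the series.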
Transform Options
^^^^^^^^^^^^^^^^^
The :meth:`~Chart.transform_impute` method is built on the :class:`~ImputeTransform`
class, which has the following options:
.. altair-object-table:: altair.ImputeTransform