1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291
|
.. _reshape:
###############################
Reshaping and reorganizing data
###############################
These methods allow you to reorganize
.. ipython:: python
:suppress:
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(123456)
Reordering dimensions
---------------------
To reorder dimensions on a :py:class:`~xarray.DataArray` or across all variables
on a :py:class:`~xarray.Dataset`, use :py:meth:`~xarray.DataArray.transpose`. An
ellipsis (`...`) can be use to represent all other dimensions:
.. ipython:: python
ds = xr.Dataset({"foo": (("x", "y", "z"), [[[42]]]), "bar": (("y", "z"), [[24]])})
ds.transpose("y", "z", "x")
ds.transpose(..., "x") # equivalent
ds.transpose() # reverses all dimensions
Expand and squeeze dimensions
-----------------------------
To expand a :py:class:`~xarray.DataArray` or all
variables on a :py:class:`~xarray.Dataset` along a new dimension,
use :py:meth:`~xarray.DataArray.expand_dims`
.. ipython:: python
expanded = ds.expand_dims("w")
expanded
This method attaches a new dimension with size 1 to all data variables.
To remove such a size-1 dimension from the :py:class:`~xarray.DataArray`
or :py:class:`~xarray.Dataset`,
use :py:meth:`~xarray.DataArray.squeeze`
.. ipython:: python
expanded.squeeze("w")
Converting between datasets and arrays
--------------------------------------
To convert from a Dataset to a DataArray, use :py:meth:`~xarray.Dataset.to_array`:
.. ipython:: python
arr = ds.to_array()
arr
This method broadcasts all data variables in the dataset against each other,
then concatenates them along a new dimension into a new array while preserving
coordinates.
To convert back from a DataArray to a Dataset, use
:py:meth:`~xarray.DataArray.to_dataset`:
.. ipython:: python
arr.to_dataset(dim="variable")
The broadcasting behavior of ``to_array`` means that the resulting array
includes the union of data variable dimensions:
.. ipython:: python
ds2 = xr.Dataset({"a": 0, "b": ("x", [3, 4, 5])})
# the input dataset has 4 elements
ds2
# the resulting array has 6 elements
ds2.to_array()
Otherwise, the result could not be represented as an orthogonal array.
If you use ``to_dataset`` without supplying the ``dim`` argument, the DataArray will be converted into a Dataset of one variable:
.. ipython:: python
arr.to_dataset(name="combined")
.. _reshape.stack:
Stack and unstack
-----------------
As part of xarray's nascent support for :py:class:`pandas.MultiIndex`, we have
implemented :py:meth:`~xarray.DataArray.stack` and
:py:meth:`~xarray.DataArray.unstack` method, for combining or splitting dimensions:
.. ipython:: python
array = xr.DataArray(
np.random.randn(2, 3), coords=[("x", ["a", "b"]), ("y", [0, 1, 2])]
)
stacked = array.stack(z=("x", "y"))
stacked
stacked.unstack("z")
As elsewhere in xarray, an ellipsis (`...`) can be used to represent all unlisted dimensions:
.. ipython:: python
stacked = array.stack(z=[..., "x"])
stacked
These methods are modeled on the :py:class:`pandas.DataFrame` methods of the
same name, although in xarray they always create new dimensions rather than
adding to the existing index or columns.
Like :py:meth:`DataFrame.unstack<pandas.DataFrame.unstack>`, xarray's ``unstack``
always succeeds, even if the multi-index being unstacked does not contain all
possible levels. Missing levels are filled in with ``NaN`` in the resulting object:
.. ipython:: python
stacked2 = stacked[::2]
stacked2
stacked2.unstack("z")
However, xarray's ``stack`` has an important difference from pandas: unlike
pandas, it does not automatically drop missing values. Compare:
.. ipython:: python
array = xr.DataArray([[np.nan, 1], [2, 3]], dims=["x", "y"])
array.stack(z=("x", "y"))
array.to_pandas().stack()
We departed from pandas's behavior here because predictable shapes for new
array dimensions is necessary for :ref:`dask`.
.. _reshape.stacking_different:
Stacking different variables together
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These stacking and unstacking operations are particularly useful for reshaping
xarray objects for use in machine learning packages, such as `scikit-learn
<http://scikit-learn.org/stable/>`_, that usually require two-dimensional numpy
arrays as inputs. For datasets with only one variable, we only need ``stack``
and ``unstack``, but combining multiple variables in a
:py:class:`xarray.Dataset` is more complicated. If the variables in the dataset
have matching numbers of dimensions, we can call
:py:meth:`~xarray.Dataset.to_array` and then stack along the the new coordinate.
But :py:meth:`~xarray.Dataset.to_array` will broadcast the dataarrays together,
which will effectively tile the lower dimensional variable along the missing
dimensions. The method :py:meth:`xarray.Dataset.to_stacked_array` allows
combining variables of differing dimensions without this wasteful copying while
:py:meth:`xarray.DataArray.to_unstacked_dataset` reverses this operation.
Just as with :py:meth:`xarray.Dataset.stack` the stacked coordinate is
represented by a :py:class:`pandas.MultiIndex` object. These methods are used
like this:
.. ipython:: python
data = xr.Dataset(
data_vars={"a": (("x", "y"), [[0, 1, 2], [3, 4, 5]]), "b": ("x", [6, 7])},
coords={"y": ["u", "v", "w"]},
)
data
stacked = data.to_stacked_array("z", sample_dims=["x"])
stacked
unstacked = stacked.to_unstacked_dataset("z")
unstacked
In this example, ``stacked`` is a two dimensional array that we can easily pass to a scikit-learn or another generic
numerical method.
.. note::
Unlike with ``stack``, in ``to_stacked_array``, the user specifies the dimensions they **do not** want stacked.
For a machine learning task, these unstacked dimensions can be interpreted as the dimensions over which samples are
drawn, whereas the stacked coordinates are the features. Naturally, all variables should possess these sampling
dimensions.
.. _reshape.set_index:
Set and reset index
-------------------
Complementary to stack / unstack, xarray's ``.set_index``, ``.reset_index`` and
``.reorder_levels`` allow easy manipulation of ``DataArray`` or ``Dataset``
multi-indexes without modifying the data and its dimensions.
You can create a multi-index from several 1-dimensional variables and/or
coordinates using :py:meth:`~xarray.DataArray.set_index`:
.. ipython:: python
da = xr.DataArray(
np.random.rand(4),
coords={
"band": ("x", ["a", "a", "b", "b"]),
"wavenumber": ("x", np.linspace(200, 400, 4)),
},
dims="x",
)
da
mda = da.set_index(x=["band", "wavenumber"])
mda
These coordinates can now be used for indexing, e.g.,
.. ipython:: python
mda.sel(band="a")
Conversely, you can use :py:meth:`~xarray.DataArray.reset_index`
to extract multi-index levels as coordinates (this is mainly useful
for serialization):
.. ipython:: python
mda.reset_index("x")
:py:meth:`~xarray.DataArray.reorder_levels` allows changing the order
of multi-index levels:
.. ipython:: python
mda.reorder_levels(x=["wavenumber", "band"])
As of xarray v0.9 coordinate labels for each dimension are optional.
You can also use ``.set_index`` / ``.reset_index`` to add / remove
labels for one or several dimensions:
.. ipython:: python
array = xr.DataArray([1, 2, 3], dims="x")
array
array["c"] = ("x", ["a", "b", "c"])
array.set_index(x="c")
array = array.set_index(x="c")
array = array.reset_index("x", drop=True)
.. _reshape.shift_and_roll:
Shift and roll
--------------
To adjust coordinate labels, you can use the :py:meth:`~xarray.Dataset.shift` and
:py:meth:`~xarray.Dataset.roll` methods:
.. ipython:: python
array = xr.DataArray([1, 2, 3, 4], dims="x")
array.shift(x=2)
array.roll(x=2, roll_coords=True)
.. _reshape.sort:
Sort
----
One may sort a DataArray/Dataset via :py:meth:`~xarray.DataArray.sortby` and
:py:meth:`~xarray.DataArray.sortby`. The input can be an individual or list of
1D ``DataArray`` objects:
.. ipython:: python
ds = xr.Dataset(
{"A": (("x", "y"), [[1, 2], [3, 4]]), "B": (("x", "y"), [[5, 6], [7, 8]])},
coords={"x": ["b", "a"], "y": [1, 0]},
)
dax = xr.DataArray([100, 99], [("x", [0, 1])])
day = xr.DataArray([90, 80], [("y", [0, 1])])
ds.sortby([day, dax])
As a shortcut, you can refer to existing coordinates by name:
.. ipython:: python
ds.sortby("x")
ds.sortby(["y", "x"])
ds.sortby(["y", "x"], ascending=False)
|