1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347
|
.. _time-series:
================
Time series data
================
A major use case for xarray is multi-dimensional time-series data.
Accordingly, we've copied many of features that make working with time-series
data in pandas such a joy to xarray. In most cases, we rely on pandas for the
core functionality.
.. ipython:: python
:suppress:
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(123456)
Creating datetime64 data
------------------------
xarray uses the numpy dtypes ``datetime64[ns]`` and ``timedelta64[ns]`` to
represent datetime data, which offer vectorized (if sometimes buggy) operations
with numpy and smooth integration with pandas.
To convert to or create regular arrays of ``datetime64`` data, we recommend
using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:
.. ipython:: python
pd.to_datetime(['2000-01-01', '2000-02-02'])
pd.date_range('2000-01-01', periods=365)
Alternatively, you can supply arrays of Python ``datetime`` objects. These get
converted automatically when used as arguments in xarray objects:
.. ipython:: python
import datetime
xr.Dataset({'time': datetime.datetime(2000, 1, 1)})
When reading or writing netCDF files, xarray automatically decodes datetime and
timedelta arrays using `CF conventions`_ (that is, by using a ``units``
attribute like ``'days since 2000-01-01'``).
.. _CF conventions: http://cfconventions.org
.. note::
When decoding/encoding datetimes for non-standard calendars or for dates
before year 1678 or after year 2262, xarray uses the `cftime`_ library.
It was previously packaged with the ``netcdf4-python`` package under the
name ``netcdftime`` but is now distributed separately. ``cftime`` is an
:ref:`optional dependency<installing>` of xarray.
.. _cftime: https://unidata.github.io/cftime
You can manual decode arrays in this form by passing a dataset to
:py:func:`~xarray.decode_cf`:
.. ipython:: python
attrs = {'units': 'hours since 2000-01-01'}
ds = xr.Dataset({'time': ('time', [0, 1, 2, 3], attrs)})
xr.decode_cf(ds)
One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
fully compatible with the standalone version of ``cftime`` (not the version
packaged with earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more
information.
Datetime indexing
-----------------
xarray borrows powerful indexing machinery from pandas (see :ref:`indexing`).
This allows for several useful and succinct forms of indexing, particularly for
`datetime64` data. For example, we support indexing with strings for single
items and with the `slice` object:
.. ipython:: python
time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)
ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
ds.sel(time='2000-01')
ds.sel(time=slice('2000-06-01', '2000-06-10'))
You can also select a particular time by indexing with a
:py:class:`datetime.time` object:
.. ipython:: python
ds.sel(time=datetime.time(12))
For more details, read the pandas documentation.
Datetime components
-------------------
Similar `to pandas`_, the components of datetime objects contained in a
given ``DataArray`` can be quickly computed using a special ``.dt`` accessor.
.. _to pandas: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors
.. ipython:: python
time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
ds = xr.Dataset({'foo': ('time', np.arange(365 * 4)), 'time': time})
ds.time.dt.hour
ds.time.dt.dayofweek
The ``.dt`` accessor works on both coordinate dimensions as well as
multi-dimensional data.
xarray also supports a notion of "virtual" or "derived" coordinates for
`datetime components`__ implemented by pandas, including "year", "month",
"day", "hour", "minute", "second", "dayofyear", "week", "dayofweek", "weekday"
and "quarter":
__ http://pandas.pydata.org/pandas-docs/stable/api.html#time-date-components
.. ipython:: python
ds['time.month']
ds['time.dayofyear']
For use as a derived coordinate, xarray adds ``'season'`` to the list of
datetime components supported by pandas:
.. ipython:: python
ds['time.season']
ds['time'].dt.season
The set of valid seasons consists of 'DJF', 'MAM', 'JJA' and 'SON', labeled by
the first letters of the corresponding months.
You can use these shortcuts with both Datasets and DataArray coordinates.
In addition, xarray supports rounding operations ``floor``, ``ceil``, and ``round``. These operations require that you supply a `rounding frequency as a string argument.`__
__ http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
.. ipython:: python
ds['time'].dt.floor('D')
.. _resampling:
Resampling and grouped operations
---------------------------------
Datetime components couple particularly well with grouped operations (see
:ref:`groupby`) for analyzing features that repeat over time. Here's how to
calculate the mean by time of day:
.. ipython:: python
:okwarning:
ds.groupby('time.hour').mean()
For upsampling or downsampling temporal resolutions, xarray offers a
:py:meth:`~xarray.Dataset.resample` method building on the core functionality
offered by the pandas method of the same name. Resample uses essentially the
same api as ``resample`` `in pandas`_.
.. _in pandas: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling
For example, we can downsample our dataset from hourly to 6-hourly:
.. ipython:: python
:okwarning:
ds.resample(time='6H')
This will create a specialized ``Resample`` object which saves information
necessary for resampling. All of the reduction methods which work with
``Resample`` objects can also be used for resampling:
.. ipython:: python
:okwarning:
ds.resample(time='6H').mean()
You can also supply an arbitrary reduction function to aggregate over each
resampling group:
.. ipython:: python
ds.resample(time='6H').reduce(np.mean)
For upsampling, xarray provides four methods: ``asfreq``, ``ffill``, ``bfill``,
and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d`` and
supports all of its schemes. All of these resampling operations work on both
Dataset and DataArray objects with an arbitrary number of dimensions.
For more examples of using grouped operations on a time dimension, see
:ref:`toy weather data`.
.. _CFTimeIndex:
Non-standard calendars and dates outside the Timestamp-valid range
------------------------------------------------------------------
Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars commonly used in climate science or dates
using a standard calendar, but outside the `Timestamp-valid range`_
(approximately between years 1678 and 2262).
.. note::
As of xarray version 0.11, by default, :py:class:`cftime.datetime` objects
will be used to represent times (either in indexes, as a
:py:class:`~xarray.CFTimeIndex`, or in data arrays with dtype object) if
any of the following are true:
- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range.
Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.
For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
:py:class:`~xarray.CFTimeIndex` will automatically be used:
.. ipython:: python
from itertools import product
from cftime import DatetimeNoLeap
dates = [DatetimeNoLeap(year, month, 1) for year, month in
product(range(1, 3), range(1, 13))]
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For
instance, we can create the same dates and DataArray we created above using:
.. ipython:: python
dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
- `Partial datetime string indexing`_ using strictly `ISO 8601-format`_ partial
datetime strings:
.. ipython:: python
da.sel(time='0001')
da.sel(time=slice('0001-05', '0002-02'))
- Access of basic datetime components via the ``dt`` accessor (in this case
just "year", "month", "day", "hour", "minute", "second", "microsecond",
"season", "dayofyear", and "dayofweek"):
.. ipython:: python
da.time.dt.year
da.time.dt.month
da.time.dt.season
da.time.dt.dayofyear
da.time.dt.dayofweek
- Group-by operations based on datetime accessor attributes (e.g. by month of
the year):
.. ipython:: python
da.groupby('time.month').sum()
- Interpolation using :py:class:`cftime.datetime` objects:
.. ipython:: python
da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])
- Interpolation using datetime strings:
.. ipython:: python
da.interp(time=['0001-01-15', '0001-02-15'])
- Differentiation:
.. ipython:: python
da.differentiate('time')
- And serialization:
.. ipython:: python
da.to_netcdf('example-no-leap.nc')
xr.open_dataset('example-no-leap.nc')
.. note::
While much of the time series functionality that is possible for standard
dates has been implemented for dates from non-standard calendars, there are
still some remaining important features that have yet to be implemented,
for example:
- Resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex` (:issue:`2191`, :issue:`2458`)
- Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes
(:issue:`2164`).
For some use-cases it may still be useful to convert from
a :py:class:`~xarray.CFTimeIndex` to a :py:class:`pandas.DatetimeIndex`,
despite the difference in calendar types (e.g. to allow the use of some
forms of resample with non-standard calendars). The recommended way of
doing this is to use the built-in
:py:meth:`~xarray.CFTimeIndex.to_datetimeindex` method:
.. ipython:: python
:okwarning:
modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')
da = xr.DataArray(range(24), [('time', modern_times)])
da
datetimeindex = da.indexes['time'].to_datetimeindex()
da['time'] = datetimeindex
da.resample(time='Y').mean('time')
However in this case one should use caution to only perform operations which
do not depend on differences between dates (e.g. differentiation,
interpolation, or upsampling with resample), as these could introduce subtle
and silent errors due to the difference in calendar types between the dates
encoded in your data and the dates stored in memory.
.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
.. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#partial-string-indexing
|