.. _userguide.hierarchical-data:

Hierarchical data
=================

.. jupyter-execute::
    :hide-code:
    :hide-output:

    import numpy as np
    import pandas as pd
    import xarray as xr

    np.random.seed(123456)
    np.set_printoptions(threshold=10)

    %xmode minimal

.. _why:

Why Hierarchical Data?
----------------------

Many real-world datasets are composed of multiple differing components,
and it can often be useful to think of these in terms of a hierarchy of related groups of data.
Examples of data which one might want to organise in a grouped or hierarchical manner include:

- Simulation data at multiple resolutions,
- Observational data about the same system but from multiple different types of sensors,
- Mixed experimental and theoretical data,
- A systematic study recording the same experiment but with different parameters,
- Heterogeneous data, such as demographic and meteorological data,

or even any combination of the above.

Often datasets like this cannot easily fit into a single :py:class:`~xarray.Dataset` object,
or are more usefully thought of as groups of related :py:class:`~xarray.Dataset` objects.
For this purpose we provide the :py:class:`xarray.DataTree` class.

This page explains in detail how to understand and use the different features
of the :py:class:`~xarray.DataTree` class for your own hierarchical data needs.

.. _node relationships:

Node Relationships
------------------

.. _creating a family tree:

Creating a Family Tree
~~~~~~~~~~~~~~~~~~~~~~

The three main ways of creating a :py:class:`~xarray.DataTree` object are described briefly in :ref:`creating a datatree`.
Here we go into more detail about how to create a tree node-by-node, using a famous family tree from the Simpsons cartoon as an example.

Let's start by defining nodes representing the two siblings, Bart and Lisa Simpson:

.. jupyter-execute::

    bart = xr.DataTree(name="Bart")
    lisa = xr.DataTree(name="Lisa")

Each of these node objects knows its own :py:class:`~xarray.DataTree.name`, but they currently have no relationship to one another.
We can connect them by creating another node representing a common parent, Homer Simpson:

.. jupyter-execute::

    homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa})

Here we set the children of Homer in the node's constructor.
We now have a small family tree where we can see how these individual Simpson family members are related to one another:

.. jupyter-execute::

    print(homer)

.. note::
   We use ``print()`` above to show the compact tree hierarchy.
   :py:class:`~xarray.DataTree` objects also have an interactive HTML representation that is enabled by default in editors such as JupyterLab and VSCode.
   The HTML representation is especially helpful for larger trees and exploring new datasets, as it allows you to expand and collapse nodes.
   If you prefer the text representations you can also set ``xr.set_options(display_style="text")``.

..
   Comment:: may remove note and print()s after upstream theme changes https://github.com/pydata/pydata-sphinx-theme/pull/2187

The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~xarray.DataTree.siblings` property:

.. jupyter-execute::

    list(homer["Bart"].siblings)

But oops, we forgot Homer's third child, Maggie! Let's add her by updating Homer's :py:class:`~xarray.DataTree.children` property to include her:

.. jupyter-execute::

    maggie = xr.DataTree(name="Maggie")
    homer.children = {"Bart": bart, "Lisa": lisa, "Maggie": maggie}
    print(homer)

Let's check that Maggie knows who her Dad is:

.. jupyter-execute::

    maggie.parent.name

That's good - updating the properties of our nodes does not break the internal consistency of our tree, as changes of parentage are automatically reflected on both nodes.

.. note::
    These children obviously have another parent, Marge Simpson, but :py:class:`~xarray.DataTree` nodes can only have a maximum of one parent.
    Genealogical `family trees are not even technically trees <https://en.wikipedia.org/wiki/Family_tree#Graph_theory>`_ in the mathematical sense -
    the fact that distant relatives can mate makes them directed acyclic graphs.
    Trees of :py:class:`~xarray.DataTree` objects cannot represent this.

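Homer is currently listed as having no parent (making him the so-called "root node" of this tree), but we can change that by adding him to the :py:class:`~xarray.DataTree.children` of a new node, which also updates his :py:class:`~xarray.DataTree.parent` property: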

.. jupyter-execute::

    abe = xr.DataTree(name="Abe")
    abe.children = {"Homer": homer}

Abe is now the "root" of this tree, which we can see by examining the :py:class:`~xarray.DataTree.root` property of any node in the tree:

.. jupyter-execute::

    maggie.root.name

We can see the whole tree by printing Abe's node or just part of the tree by printing Homer's node:

.. jupyter-execute::

    print(abe)

.. jupyter-execute::

    print(abe["Homer"])

In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson.
We can add Herbert to the family tree without displacing Homer by :py:meth:`~xarray.DataTree.assign`-ing another child to Abe:

.. jupyter-execute::

    herbert = xr.DataTree(name="Herb")
    abe = abe.assign({"Herbert": herbert})
    print(abe)

.. jupyter-execute::

    print(abe["Herbert"].name)
    print(herbert.name)

.. note::
   This example shows a subtlety - the returned tree has Homer's brother listed as ``"Herbert"``,
   but the original node was named "Herb". Not only are names overridden when stored as keys like this,
   but the new node is a copy, so that the original node that was referenced is unchanged (i.e. ``herbert.name == "Herb"`` still).
   In other words, nodes are copied into trees, not inserted into them.
   This is intentional, and mirrors the behaviour when storing named :py:class:`~xarray.DataArray` objects inside datasets.
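
The same rename-on-storage behaviour can be seen with plain :py:class:`~xarray.DataArray` objects.
The following is a minimal sketch (the names ``original`` and ``renamed`` are purely illustrative) showing that the dataset key wins, while the array that was passed in keeps its own name:

.. code:: python

    import xarray as xr

    # A DataArray that carries its own name.
    da = xr.DataArray([1, 2, 3], dims="x", name="original")

    # Store it in a Dataset under a different key.
    ds = xr.Dataset({"renamed": da})

    # The stored variable takes the key as its name ...
    stored_name = ds["renamed"].name
    # ... while the array we passed in is left untouched.
    source_name = da.name
    print(stored_name, source_name)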

Certain manipulations of our tree are forbidden if they would create an inconsistent result.
In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own grandfather.
If we try similar time-travelling hijinks with Homer, we get a :py:class:`~xarray.InvalidTreeError` raised:

.. jupyter-execute::
    :raises:

    abe["Homer"].children = {"Abe": abe}

.. _evolutionary tree:

Ancestry in an Evolutionary Tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's use a different example of a tree to discuss more complex relationships between nodes - the phylogenetic tree, or tree of life.

.. jupyter-execute::

    vertebrates = xr.DataTree.from_dict(
        {
            "/Sharks": None,
            "/Bony Skeleton/Ray-finned Fish": None,
            "/Bony Skeleton/Four Limbs/Amphibians": None,
            "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates": None,
            "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits": None,
            "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": None,
            "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": None,
        },
        name="Vertebrates",
    )

    primates = vertebrates["/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates"]

    dinosaurs = vertebrates[
        "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs"
    ]

We have used the :py:meth:`~xarray.DataTree.from_dict` constructor method as a preferred way to quickly create a whole tree,
and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest.

.. jupyter-execute::

    print(vertebrates)

This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" <https://en.wikipedia.org/wiki/Cladogram>`_,
rather than an evolutionary tree).

Here both the species and the features used to group them are represented by :py:class:`~xarray.DataTree` node objects - there is no distinction in types of node.
We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes".
We can check if a node is a leaf with the :py:class:`~xarray.DataTree.is_leaf` property, and get a list of all leaves with the :py:class:`~xarray.DataTree.leaves` property:

.. jupyter-execute::

    print(primates.is_leaf)
    [node.name for node in vertebrates.leaves]

Pretending that this is a true evolutionary tree for a moment, we can find the features of the evolutionary ancestors (so-called "ancestor" nodes),
the distinguishing feature of the common ancestor of all vertebrate life (the root node),
and even the distinguishing feature of the common ancestor of any two species (the common ancestor of two nodes):

.. jupyter-execute::

    print([node.name for node in reversed(primates.parents)])
    print(primates.root.name)
    print(primates.find_common_ancestor(dinosaurs).name)

We can only find a common ancestor between two nodes that lie in the same tree.
If we try to find the common evolutionary ancestor between primates and an Alien species that has no relationship to Earth's evolutionary tree,
an error will be raised.

.. jupyter-execute::
    :raises:

    alien = xr.DataTree(name="Xenomorph")
    primates.find_common_ancestor(alien)


.. _navigating trees:

Navigating Trees
----------------

There are various ways to access the different nodes in a tree.

Properties
~~~~~~~~~~

We can navigate trees using the :py:class:`~xarray.DataTree.parent` and :py:class:`~xarray.DataTree.children` properties of each node, for example:

.. jupyter-execute::

    lisa.parent.children["Bart"].name

but there are also more convenient ways to access nodes.

Dictionary-like interface
~~~~~~~~~~~~~~~~~~~~~~~~~

Children are stored on each node as a key-value mapping from name to child node.
They can be accessed and altered via the :py:meth:`~xarray.DataTree.__getitem__` and :py:meth:`~xarray.DataTree.__setitem__` syntax.
In general :py:class:`~xarray.DataTree` objects support almost the entire set of dict-like methods,
including :py:meth:`~xarray.DataTree.keys`, :py:meth:`~xarray.DataTree.values`, :py:meth:`~xarray.DataTree.items`,
:py:meth:`~xarray.DataTree.__delitem__` and :py:meth:`~xarray.DataTree.update`.

.. jupyter-execute::

    print(vertebrates["Bony Skeleton"]["Ray-finned Fish"])

Note that the dict-like interface combines access to child :py:class:`~xarray.DataTree` nodes and stored :py:class:`~xarray.DataArray` objects,
so if we have a node that contains both children and data, calling :py:meth:`~xarray.DataTree.keys` will list both names of child nodes and
names of data variables:

.. jupyter-execute::

    dt = xr.DataTree(
        dataset=xr.Dataset({"foo": 0, "bar": 1}),
        children={"a": xr.DataTree(), "b": xr.DataTree()},
    )
    print(dt)
    list(dt.keys())

This also means that the names of variables and of child nodes must be different to one another.
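
As a rough, self-contained sketch of the dict-like interface (all node and variable names here are made up for illustration), the snippet below adds and removes entries, then shows what happens when a child node is given the same name as an existing variable.
It assumes that the name collision is reported as a ``KeyError``; treat that exception type as an assumption rather than a documented guarantee:

.. code:: python

    import xarray as xr

    tree = xr.DataTree(
        dataset=xr.Dataset({"foo": 0}),
        children={"a": xr.DataTree()},
    )

    tree["b"] = xr.DataTree()  # __setitem__ with a DataTree adds a child node
    tree["bar"] = xr.DataArray(1)  # __setitem__ with a DataArray adds a variable
    del tree["a"]  # __delitem__ removes a child node (or a variable)

    # keys() lists data variables and child nodes together.
    keys = sorted(tree.keys())
    print(keys)

    # A child node may not share its name with a variable on the same node.
    try:
        xr.DataTree(
            dataset=xr.Dataset({"foo": 0}),
            children={"foo": xr.DataTree()},
        )
        collision_rejected = False
    except KeyError:  # assumed exception type
        collision_rejected = True
    print(collision_rejected)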

Attribute-like access
~~~~~~~~~~~~~~~~~~~~~

You can also select both variables and child nodes through dot indexing:

.. jupyter-execute::

    print(dt.foo)
    print(dt.a)

.. _filesystem paths:

Filesystem-like Paths
~~~~~~~~~~~~~~~~~~~~~

Hierarchical trees can be thought of as analogous to file systems.
Each node is like a directory, and each directory can contain both more sub-directories and data.

.. note::

    Future development will allow you to make the filesystem analogy concrete by
    using :py:func:`~xarray.DataTree.open_mfdatatree` or
    :py:func:`~xarray.DataTree.save_mfdatatree`.
    (`See related issue in GitHub <https://github.com/xarray-contrib/datatree/issues/55>`_)

Datatree objects support a syntax inspired by unix-like filesystems,
where the "path" to a node is specified by the keys of each intermediate node in sequence,
separated by forward slashes.
This is an extension of the conventional dictionary ``__getitem__`` syntax to allow navigation across multiple levels of the tree.

As with file paths, paths within the tree can either be relative to the current node, e.g.

.. jupyter-execute::

    print(abe["Homer/Bart"].name)
    print(abe["./Homer/Bart"].name)  # alternative syntax

or relative to the root node.
A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a
`"fully qualified name" <https://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf-zarr-data-model-specification#nczarr_fqn>`_,
or as an "absolute path".
The root node is referred to by ``"/"``, so the path from the root node to its grand-child would be ``"/child/grandchild"``, e.g.

.. jupyter-execute::

    # access lisa's sibling by a relative path.
    print(lisa["../Bart"])
    # or from absolute path
    print(lisa["/Homer/Bart"])


Relative paths between nodes also support the ``"../"`` syntax to mean the parent of the current node.
We can use this with ``__setitem__`` to add a missing entry to our evolutionary tree, but add it relative to a more familiar node of interest:

.. jupyter-execute::

    primates["../../Two Fenestrae/Crocodiles"] = xr.DataTree()
    print(vertebrates)

Given two nodes in a tree, we can also find their relative path:

.. jupyter-execute::

    bart.relative_to(lisa)

You can use this filepath feature to build a nested tree from a dictionary of filesystem-like paths and corresponding :py:class:`~xarray.Dataset` objects in a single step.
If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``,
we can construct a complex tree quickly using the alternative constructor :py:meth:`~xarray.DataTree.from_dict()`:

.. jupyter-execute::

    d = {
        "/": xr.Dataset({"foo": "orange"}),
        "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}),
        "/a/b": xr.Dataset({"zed": np.nan}),
        "a/c/d": None,
    }
    dt = xr.DataTree.from_dict(d)
    print(dt)

.. note::

    Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path
    (i.e. the node labelled ``"/a/c"`` in this case.)
    This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.DataTree.from_dict`.

.. _iterating over trees:

Iterating over trees
~~~~~~~~~~~~~~~~~~~~

You can iterate over every node in a tree using the :py:class:`~xarray.DataTree.subtree` property.
This returns an iterable of nodes, which yields them in breadth-first order.

.. jupyter-execute::

    for node in vertebrates.subtree:
        print(node.path)

Similarly, :py:class:`~xarray.DataTree.subtree_with_keys` returns an iterable of
relative paths and corresponding nodes.
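
As a small sketch with a made-up three-node tree, note that the keys are relative to the node you iterate from, which is why the starting node itself is expected to be reported as ``"."``:

.. code:: python

    import xarray as xr

    tree = xr.DataTree.from_dict({"/a/b": None, "/c": None})

    # Each item is a (relative path, node) pair.
    keys = [path for path, node in tree.subtree_with_keys]
    print(sorted(keys))

    # Iterating from a child node gives paths relative to that child.
    keys_from_a = [path for path, node in tree["a"].subtree_with_keys]
    print(sorted(keys_from_a))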

A very useful pattern is to iterate over :py:class:`~xarray.DataTree.subtree_with_keys`
to manipulate nodes however you wish, then rebuild a new tree using
:py:meth:`xarray.DataTree.from_dict()`.
For example, we could keep only the nodes containing data by looping over all nodes,
checking if they contain any data using :py:class:`~xarray.DataTree.has_data`,
then rebuilding a new tree using only the paths of those nodes:

.. jupyter-execute::

    non_empty_nodes = {
        path: node.dataset for path, node in dt.subtree_with_keys if node.has_data
    }
    print(xr.DataTree.from_dict(non_empty_nodes))

You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``.

(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.name)``.)
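
A compact sketch of that round trip, using a hypothetical tree whose root is named ``"root"``, might look like this:

.. code:: python

    import xarray as xr

    tree = xr.DataTree.from_dict(
        {
            "/": xr.Dataset({"foo": 1}),
            "/a": xr.Dataset({"bar": 2}),
            "/a/empty": None,
        },
        name="root",
    )

    # Keep only the nodes that hold data variables.
    kept = {
        path: node.dataset for path, node in tree.subtree_with_keys if node.has_data
    }

    # Pass ``name=`` explicitly so the rebuilt root keeps the original name.
    rebuilt = xr.DataTree.from_dict(kept, name=tree.name)
    print(rebuilt.name, list(rebuilt["a"].children))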

.. _manipulating trees:

Manipulating Trees
------------------

Subsetting Tree Nodes
~~~~~~~~~~~~~~~~~~~~~

We can subset our tree to select only nodes of interest in various ways.

As on a real filesystem, matching nodes by common patterns in their paths is often useful.
We can use :py:meth:`xarray.DataTree.match` for this:

.. jupyter-execute::

    dt = xr.DataTree.from_dict(
        {
            "/a/A": None,
            "/a/B": None,
            "/b/A": None,
            "/b/B": None,
        }
    )
    result = dt.match("*/B")
    print(result)

We can also subset trees by the contents of the nodes.
:py:meth:`xarray.DataTree.filter` retains only the nodes of a tree that meet a certain condition.
For example, we could recreate the Simpsons' family tree with the ages of each individual, then filter for only the adults.
First, let's recreate the tree but with an ``age`` data variable in every node:

.. jupyter-execute::

    simpsons = xr.DataTree.from_dict(
        {
            "/": xr.Dataset({"age": 83}),
            "/Herbert": xr.Dataset({"age": 40}),
            "/Homer": xr.Dataset({"age": 39}),
            "/Homer/Bart": xr.Dataset({"age": 10}),
            "/Homer/Lisa": xr.Dataset({"age": 8}),
            "/Homer/Maggie": xr.Dataset({"age": 1}),
        },
        name="Abe",
    )
    print(simpsons)

Now let's filter out the minors:

.. jupyter-execute::

    print(simpsons.filter(lambda node: node["age"] > 18))

The result is a new tree, containing only the nodes matching the condition.

(Yes, under the hood :py:meth:`~xarray.DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !)

If you want to filter out empty nodes you can use :py:meth:`~xarray.DataTree.prune`.

.. _Tree Contents:

Tree Contents
-------------

Hollow Trees
~~~~~~~~~~~~

A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes.
This matters because certain tree manipulation operations only make sense for hollow trees.

You can check if a tree is a hollow tree by using the :py:class:`~xarray.DataTree.is_hollow` property.
We can see that the Simpsons' family tree is not hollow because the data variable ``"age"`` is present at some nodes which
have children (i.e. Abe and Homer).

.. jupyter-execute::

    simpsons.is_hollow
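
For contrast, here is a minimal sketch of a tree that should count as hollow, because only its leaf nodes carry data (the node and variable names are placeholders):

.. code:: python

    import xarray as xr

    hollow = xr.DataTree.from_dict(
        {
            "/group/leaf1": xr.Dataset({"x": 1}),
            "/group/leaf2": xr.Dataset({"x": 2}),
        }
    )

    # The root and "/group" are empty; data lives only at the leaves.
    before = hollow.is_hollow
    print(before)

    # Storing a variable on a non-leaf node makes the tree non-hollow.
    hollow["group"]["y"] = xr.DataArray(0)
    after = hollow.is_hollow
    print(after)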

.. _tree computation:

Computation
-----------

:py:class:`~xarray.DataTree` objects are also useful for performing computations, not just for organizing data.

Operations and Methods on Trees
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To show how applying operations across a whole tree at once can be useful,
let's first create an example scientific dataset.

.. jupyter-execute::

    def time_stamps(n_samples, T):
        """Create an array of evenly-spaced time stamps"""
        return xr.DataArray(
            data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"]
        )


    def signal_generator(t, f, A, phase):
        """Generate an example electrical-like waveform"""
        return A * np.sin(f * t.data + phase)


    time_stamps1 = time_stamps(n_samples=15, T=1.5)
    time_stamps2 = time_stamps(n_samples=10, T=1.0)

    voltages = xr.DataTree.from_dict(
        {
            "/oscilloscope1": xr.Dataset(
                {
                    "potential": (
                        "time",
                        signal_generator(time_stamps1, f=2, A=1.2, phase=0.5),
                    ),
                    "current": (
                        "time",
                        signal_generator(time_stamps1, f=2, A=1.2, phase=1),
                    ),
                },
                coords={"time": time_stamps1},
            ),
            "/oscilloscope2": xr.Dataset(
                {
                    "potential": (
                        "time",
                        signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2),
                    ),
                    "current": (
                        "time",
                        signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
                    ),
                },
                coords={"time": time_stamps2},
            ),
        }
    )
    print(voltages)

Most xarray computation methods also exist as methods on datatree objects,
so you can for example take the mean value of these two timeseries at once:

.. jupyter-execute::

    print(voltages.mean(dim="time"))

This works by mapping the standard :py:meth:`xarray.Dataset.mean()` method over the dataset stored in each node of the
tree one-by-one.
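
To illustrate the idea, the sketch below compares the built-in method with an explicit per-node mapping on a small made-up tree.
The helper ``mean_over_time`` is hypothetical, and the guard on the ``time`` dimension is there only so the function is also safe to call on the empty root node:

.. code:: python

    import xarray as xr

    tree = xr.DataTree.from_dict(
        {
            "/a": xr.Dataset({"v": ("time", [1.0, 2.0, 3.0])}),
            "/b": xr.Dataset({"v": ("time", [10.0, 20.0])}),
        }
    )

    # Built-in method, applied to the whole tree at once.
    via_method = tree.mean(dim="time")


    def mean_over_time(ds):
        # Only reduce datasets that actually have a "time" dimension.
        return ds.mean(dim="time") if "time" in ds.dims else ds


    # The same computation written as an explicit mapping over each node.
    via_map = tree.map_over_datasets(mean_over_time)

    means = {name: float(via_method[name]["v"]) for name in ("a", "b")}
    same = all(float(via_map[name]["v"]) == means[name] for name in means)
    print(means, same)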

The arguments passed to the method are used for every node, so the values of the arguments you pass might be valid for one node and invalid for another:

.. jupyter-execute::
    :raises:

    voltages.isel(time=12)

Notice that the error raised helpfully indicates which node of the tree the operation failed on.

Arithmetic Methods on Trees
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Arithmetic methods are also implemented, so you can e.g. add a scalar to every dataset in the tree at once.
For example, we can advance the timeline of the Simpsons by a decade just by adding ``10`` to the tree:

.. jupyter-execute::

    print(simpsons + 10)

See that the same change (fast-forwarding by adding 10 years to the age of each character) has been applied to every node.

Mapping Custom Functions Over Trees
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can map custom computation over each node in a tree using :py:meth:`xarray.DataTree.map_over_datasets`.
You can map any function, so long as it takes :py:class:`xarray.Dataset` objects as one (or more) of the input arguments,
and returns one (or more) xarray datasets.

.. note::

    Functions passed to :py:func:`~xarray.DataTree.map_over_datasets` cannot alter nodes in-place.
    Instead they must return new :py:class:`xarray.Dataset` objects.

For example, we can define a function to calculate the Root Mean Square of a timeseries:

.. jupyter-execute::

    def rms(signal):
        return np.sqrt(np.mean(signal**2))

Then calculate the RMS value of these signals:

.. jupyter-execute::

    print(voltages.map_over_datasets(rms))
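
A mapped function may also return more than one dataset, in which case one tree per returned dataset is produced.
The following is a minimal sketch with a hypothetical ``min_and_max`` helper and a small made-up tree:

.. code:: python

    import xarray as xr

    tree = xr.DataTree.from_dict(
        {
            "/": xr.Dataset({"x": ("t", [1, 2, 3])}),
            "/child": xr.Dataset({"x": ("t", [4, 5, 6])}),
        }
    )


    def min_and_max(ds):
        # Returning a tuple of datasets yields a tuple of trees.
        return ds.min(), ds.max()


    minima, maxima = tree.map_over_datasets(min_and_max)
    print(int(minima["child"]["x"]), int(maxima["child"]["x"]))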

We can also use :py:func:`~xarray.map_over_datasets` to apply a function over
the data in multiple trees, by passing the trees as positional arguments.

.. _multiple trees:

Operating on Multiple Trees
---------------------------

The examples so far have involved mapping functions or methods over the nodes of a single tree,
but we can generalize this to mapping functions over multiple trees at once.

Iterating Over Multiple Trees
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To iterate over the corresponding nodes in multiple trees, use
:py:func:`~xarray.group_subtrees` instead of
:py:class:`~xarray.DataTree.subtree_with_keys`. This combines well with
:py:meth:`xarray.DataTree.from_dict()` to build a new tree:

.. jupyter-execute::

    dt1 = xr.DataTree.from_dict({"a": xr.Dataset({"x": 1}), "b": xr.Dataset({"x": 2})})
    dt2 = xr.DataTree.from_dict(
        {"a": xr.Dataset({"x": 10}), "b": xr.Dataset({"x": 20})}
    )
    result = {}
    for path, (node1, node2) in xr.group_subtrees(dt1, dt2):
        result[path] = node1.dataset + node2.dataset
    dt3 = xr.DataTree.from_dict(result)
    print(dt3)

Alternatively, you can apply a function directly to paired datasets at every node
using :py:func:`xarray.map_over_datasets`:

.. jupyter-execute::

    dt3 = xr.map_over_datasets(lambda x, y: x + y, dt1, dt2)
    print(dt3)

Comparing Trees for Isomorphism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For it to make sense to map a single non-unary function over the nodes of multiple trees at once,
each tree needs to have the same structure. Specifically, two trees can only be considered similar,
or "isomorphic", if the full paths to all of their descendant nodes are the same.

Applying :py:func:`~xarray.group_subtrees` to trees with different structures
raises :py:class:`~xarray.TreeIsomorphismError`:

.. jupyter-execute::
    :raises:

    tree = xr.DataTree.from_dict({"a": None, "a/b": None, "a/c": None})
    simple_tree = xr.DataTree.from_dict({"a": None})
    for _ in xr.group_subtrees(tree, simple_tree):
        ...

We can also explicitly check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method:

.. jupyter-execute::

    tree.isomorphic(simple_tree)

Corresponding tree nodes do not need to have the same data in order to be considered isomorphic:

.. jupyter-execute::

    tree_with_data = xr.DataTree.from_dict({"a": xr.Dataset({"foo": 1})})
    simple_tree.isomorphic(tree_with_data)

They also do not need to define child nodes in the same order:

.. jupyter-execute::

    reordered_tree = xr.DataTree.from_dict({"a": None, "a/c": None, "a/b": None})
    tree.isomorphic(reordered_tree)

Arithmetic Between Multiple Trees
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Arithmetic operations like multiplication are binary operations, so as long as we have two isomorphic trees,
we can do arithmetic between them.

.. jupyter-execute::

    currents = xr.DataTree.from_dict(
        {
            "/oscilloscope1": xr.Dataset(
                {
                    "current": (
                        "time",
                        signal_generator(time_stamps1, f=2, A=1.2, phase=1),
                    ),
                },
                coords={"time": time_stamps1},
            ),
            "/oscilloscope2": xr.Dataset(
                {
                    "current": (
                        "time",
                        signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
                    ),
                },
                coords={"time": time_stamps2},
            ),
        }
    )
    print(currents)

.. jupyter-execute::

    currents.isomorphic(voltages)

We could use this feature to quickly calculate the electrical power in our signal, P=IV.

.. jupyter-execute::

    power = currents * voltages
    print(power)

.. _hierarchical-data.alignment-and-coordinate-inheritance:

Alignment and Coordinate Inheritance
------------------------------------

.. _data-alignment:

Data Alignment
~~~~~~~~~~~~~~

The data in different datatree nodes are not totally independent. In particular, dimensions (and indexes) in child nodes must be exactly aligned with those in their parent nodes.
Exact alignment means that shared dimensions must be the same length, and indexes along those dimensions must be equal.

.. note::
    If you were a previous user of the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package, this is different from what you're used to!
    In that package the data model was that the data stored in each node was actually completely unrelated. The data model is now slightly stricter.
    This allows us to provide features like :ref:`coordinate-inheritance`.

To demonstrate, let's first generate some example datasets which are not aligned with one another:

.. jupyter-execute::

    # (drop the attributes just to make the printed representation shorter)
    ds = xr.tutorial.open_dataset("air_temperature").drop_attrs()

    ds_daily = ds.resample(time="D").mean("time")
    ds_weekly = ds.resample(time="W").mean("time")
    ds_monthly = ds.resample(time="ME").mean("time")

These datasets have different lengths along the ``time`` dimension, and are therefore not aligned along that dimension.

.. jupyter-execute::

    print(ds_daily.sizes)
    print(ds_weekly.sizes)
    print(ds_monthly.sizes)

We cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object, because they do not exactly align:

.. jupyter-execute::
    :raises:

    xr.align(ds_daily, ds_weekly, ds_monthly, join="exact")

But we :ref:`previously said <why>` that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`?
If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error:

.. jupyter-execute::
    :raises:

    xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly})

This is because DataTree checks that data in child nodes align exactly with their parents.

.. note::
    This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions <https://www.unidata.ucar.edu/software/netcdf/workshops/2007/groups-types/Introduction.html>`_, as in netCDF-4 files dimensions are `visible to all child groups <https://docs.unidata.ucar.edu/netcdf-c/current/groups.html>`_.

This alignment check is performed up through the tree, all the way to the root, and is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds:

.. code:: python

    xr.align(child.dataset, *(parent.dataset for parent in child.parents), join="exact")
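
To make the rule concrete without downloading the tutorial data, here is a toy sketch using two small made-up datasets whose ``x`` indexes disagree.
It assumes that both the failed alignment and the rejected tree surface as a ``ValueError`` (or a subclass of it):

.. code:: python

    import xarray as xr

    parent = xr.Dataset(coords={"x": [0, 1]})
    child = xr.Dataset({"v": ("x", [1.0, 2.0, 3.0])}, coords={"x": [0, 1, 2]})

    # The two datasets disagree about the "x" index, so exact alignment fails.
    try:
        xr.align(child, parent, join="exact")
        aligned = True
    except ValueError:
        aligned = False

    # The same mismatch is rejected when they are nested as parent and child ...
    try:
        xr.DataTree.from_dict({"/": parent, "/child": child})
        nested_ok = True
    except ValueError:
        nested_ok = False

    # ... but is accepted when they are stored as siblings.
    siblings = xr.DataTree.from_dict({"/coarse": parent, "/fine": child})
    print(aligned, nested_ok, sorted(siblings.children))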

To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendants of one another, e.g. organize them as siblings.

.. jupyter-execute::

    dt = xr.DataTree.from_dict(
        {"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly}
    )
    print(dt)

Now we have a valid :py:class:`~xarray.DataTree` structure which contains all the data at each different time frequency, stored in a separate group.

This is a useful way to organise our data because we can still operate on all the groups at once.
For example we can extract all three timeseries at a specific lat-lon location:

.. jupyter-execute::

    dt_sel = dt.sel(lat=75, lon=300)
    print(dt_sel)

or compute the standard deviation of each timeseries to find out how it varies with sampling frequency:

.. jupyter-execute::

    dt_std = dt.std(dim="time")
    print(dt_std)

.. _coordinate-inheritance:

Coordinate Inheritance
~~~~~~~~~~~~~~~~~~~~~~

Notice that in the trees we constructed above there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups.

.. jupyter-execute::

    dt

We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups.

.. note::
    This is also a new feature relative to the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package.

Let's instead place only the time-dependent variables in the child groups, and put the non-time-dependent ``lat`` and ``lon`` variables in the parent (root) group:

.. jupyter-execute::

    dt = xr.DataTree.from_dict(
        {
            "/": ds.drop_dims("time"),
            "daily": ds_daily.drop_vars(["lat", "lon"]),
            "weekly": ds_weekly.drop_vars(["lat", "lon"]),
            "monthly": ds_monthly.drop_vars(["lat", "lon"]),
        }
    )
    dt

This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates.
Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations.

We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups:

.. jupyter-execute::

    dt.daily.coords

.. jupyter-execute::

    dt["daily/lat"]

As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group.

If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such:

.. jupyter-execute::

    dt["/daily"]

This helps to differentiate which variables are defined on the datatree node that you are currently looking at, and which were defined somewhere above it.
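
The distinction can also be checked programmatically.
The sketch below uses a tiny made-up tree rather than the tutorial data, and relies on ``to_dataset(inherit=False)`` to look only at what is stored on the node itself:

.. code:: python

    import xarray as xr

    tree = xr.DataTree.from_dict(
        {
            "/": xr.Dataset(coords={"x": [10, 20, 30]}),
            "/child": xr.Dataset({"v": ("x", [1, 2, 3])}),
        }
    )

    child = tree["child"]

    # The inherited coordinate is visible from the child node ...
    inherited_values = tree["child/x"].values.tolist()
    visible = "x" in child.coords

    # ... but it is not stored on the child itself.
    stored_locally = "x" in child.to_dataset(inherit=False).coords
    print(inherited_values, visible, stored_locally)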

We can also still perform all the same operations on the whole tree:

.. jupyter-execute::

    dt.sel(lat=[75], lon=[300])

.. jupyter-execute::

    dt.std(dim="time")