File: sp_tree.md

package info (click to toggle)
mlpack 4.7.0-2
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 32,064 kB
  • sloc: cpp: 233,202; python: 1,940; sh: 1,201; lisp: 414; makefile: 85
file content (748 lines) | stat: -rw-r--r-- 30,605 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
# `SPTree`

The `SPTree` class implements the standard hybrid spill tree, a binary space
partitioning tree that allows overlapping volumes between nodes.  This type of
tree can be more effective than trees like the [`KDTree`](kdtree.md) for
[approximate nearest neighbor search](../../methods/knn.md) and related tasks.

`SPTree` supports three template parameters for configurable behavior, and
implements all the functionality required by the [TreeType
API](../../../developer/trees.md#the-treetype-api), plus some additional
functionality specific to spill trees.  `SPTree` is built on the more generic
[`SpillTree`](spill_tree.md) class, so if fully custom behavior is desired, that

 * [Template parameters](#template-parameters)
 * [Constructors](#constructors)
 * [Basic tree properties](#basic-tree-properties)
 * [Bounding distances with the tree](#bounding-distances-with-the-tree)
 * [Tree traversals](#tree-traversals)
 * [Example usage](#example-usage)

## See also

 * [`SpillTree`](spill_tree.md)
 * [`MeanSPTree`](mean_sp_tree.md)
 * [`NonOrtSPTree`](non_ort_sp_tree.md)
 * [`NonOrtMeanSPTree`](non_ort_mean_sp_tree.md)
 * [`BinarySpaceTree`](binary_space_tree.md)
 * [mlpack trees](../trees.md)
 * [`KNN`](../../methods/knn.md)
 * [mlpack geometric algorithms](../../modeling.md#geometric-algorithms)
 * [An Investigation of Practical Approximate Nearest Neighbor Algorithms (pdf)](https://proceedings.neurips.cc/paper/2004/file/1102a326d5f7c9e04fc3c89d0ede88c9-Paper.pdf)
 * [Tree-Independent Dual-Tree Algorithms (pdf)](https://www.ratml.org/pub/pdf/2013tree.pdf)

## Template parameters

In accordance with the
[TreeType API](../../../developer/trees.md#template-parameters-required-by-the-treetype-policy)
(see also
[this more detailed section](../../../developer/trees.md#template-parameters)),
the `SPTree` class takes three template parameters:

```
SPTree<DistanceType, StatisticType, MatType>
```

 * `DistanceType`: the [distance metric](../distances.md) to use for distance
   computations.  Because the `SPTree` internally uses
   [`HRectBound`](binary_space_tree.md#hrectbound), this is required to be
   [`EuclideanDistance`](../distances.md#lmetric).  See
   [`NonOrtSPTree`](non_ort_sp_tree.md) for a version of the spill tree where
   arbitrary distance metrics are allowed.

 * `StatisticType`: this holds auxiliary information in each tree node.  By
   default, [`EmptyStatistic`](binary_space_tree.md#emptystatistic) is used,
   which holds no information.
   - See the [`StatisticType`](binary_space_tree.md#statistictype) section in
     the `BinarySpaceTree` documentation for more details.

 * `MatType`: the type of matrix used to represent points.  Must be a type
   matching the [Armadillo API](../../matrices.md).  By default, `arma::mat` is
   used, but other types such as `arma::fmat` or similar will work just fine.

The `SPTree` class itself is a convenience typedef of the generic
[`SpillTree`](spill_tree.md) class, using the
[`AxisOrthogonalHyperplane`](spill_tree.md#axisorthogonalhyperplane) class as
the splitting hyperplane type, and the
[`MidpointSpaceSplit`](spill_tree.md#midpointspacesplit) class as the splitting
strategy.

If no template parameters are explicitly specified, then defaults are used:

```
SPTree<> = SPTree<EuclideanDistance, EmptyStatistic, arma::mat>
```

## Constructors

`SPTree`s are constructed by iteratively finding splitting hyperplanes, and
points within a margin of the hyperplane are assigned to *both* child nodes.
Unlike the constructors of
[`BinarySpaceTree`](binary_space_tree.md#constructors), the dataset is not
permuted during construction.

---

 * `node = SPTree(data, tau=0.0, maxLeafSize=20, rho=0.7)`
   - Construct an `SPTree` on the given `data`, using the specified
     hyperparameters to control tree construction behavior.
   - By default, a reference to `data` is stored.  If `data` goes out of scope
     after tree construction, memory errors will occur!  To avoid this, either
     pass the dataset or a copy with `std::move()` (e.g. `std::move(data)`);
     when doing this, `data` will be set to an empty matrix.

---

 * `node = SPTree<DistanceType, StatisticType, MatType>(data, tau=0.0, maxLeafSize=20, rho=0.7)`
   - Construct an `SPTree` on the given `data`, using custom template
     parameters, and using the specified hyperparameters to control tree
     construction behavior.
   - By default, a reference to `data` is stored.  If `data` goes out of scope
     after tree construction, memory errors will occur!  To avoid this, either
     pass the dataset or a copy with `std::move()` (e.g. `std::move(data)`);
     when doing this, `data` will be set to an empty matrix.

---

 * `node = SPTree()`
   - Construct an empty `SPTree` with no children, no points, and default
     template parameters.

---

***Notes:***

 - The name `node` is used here for `SPTree` objects instead of `tree`, because
   each `SPTree` object is a single node in the tree.  The constructor returns
   the node that is the root of the tree.

 - Inserting individual points or removing individual points from an `SPTree` is
   not supported, because this generally results in a tree with very suboptimal
   hyperplane splits.  It is better to simply build a new `SPTree` on the
   modified dataset.  For trees that support individual insertion and deletions,
   see the [`RectangleTree`](rectangle_tree.md) class and all its variants (e.g.
   [`RTree`](r_tree.md), [`RStarTree`](r_star_tree.md), etc.).

 - See also the
   [developer documentation on tree constructors](../../../developer/trees.md#constructors-and-destructors).

---

### Constructor parameters:

| **name** | **type** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `data` | [`MatType`](../../matrices.md) | [Column-major](../../matrices.md#representing-data-in-mlpack) matrix to build the tree on. | _(N/A)_ |
| `tau` | `double` | Width of spill margin: points within `tau` of the splitting hyperplane of a node will be contained in both left and right children. | `0.0` |
| `maxLeafSize` | `size_t` | Maximum number of points to store in each leaf. | `20` |
| `rho` | `double` | Balance threshold. When splitting, if either overlapping node would contain a fraction of more than `rho` of the points, a non-overlapping split is performed. Must be in the range `[0.0, 1.0)`. | `0.7` |

***Caveats***:

 * `tau` must be manually tuned for the properties of each dataset; the default,
   `0.0`, will never allow overlap between nodes (and thus the created tree will
   essentially be a non-overlapping [`BinarySpaceTree`](binary_space_tree.md)).

 * If `tau` is set too large, nodes will overlap too much and search quality
   will be degraded.

 * `rho` implicitly controls the depth of the tree by forcing very overlapping
   children to be non-overlapping.  As `rho` gets closer to `1`, more overlap is
   allowed, which in turn makes the tree deeper.  If `rho` is set to `0.5` or
   less, then all splits will be non-overlapping (and the tree will essentially
   be a [`BinarySpaceTree`](binary_space_tree.md)).

## Basic tree properties

Once an `SPTree` object is constructed, various properties of the tree can be
accessed or inspected.  Many of these functions are required by the
[TreeType API](../../../developer/trees.md#the-treetype-api).

### Navigating the tree

 * `node.NumChildren()` returns the number of children in `node`.  This is
   either `2` if `node` has children, or `0` if `node` is a leaf.

 * `node.IsLeaf()` returns a `bool` indicating whether or not `node` is a leaf.

 * `node.Child(i)` returns an `SPTree&` that is the `i`th child.
   - `i` must be `0` or `1`.
   - This function should only be called if `node.NumChildren()` is not `0`
     (e.g. if `node` is not a leaf).  Note that this returns a valid
     `SPTree&` that can itself be used just like the root node of the
     tree!
   - `node.Left()` and `node.Right()` are convenience functions specific to
     `SPTree` that will return `SPTree*` (pointers) to the left and right
     children, respectively, or `NULL` if `node` has no children.

 * `node.Parent()` will return an `SPTree*` that points to the parent of
   `node`, or `NULL` if `node` is the root of the `SPTree`.

---

### Accessing members of a tree

 * `node.Overlap()` will return a `bool` that is `true` if `node`'s children are
   overlapping, and `false` otherwise.

 * `node.Hyperplane()` will return an
   [`AxisOrthogonalHyperplane`](spill_tree.md#axisorthogonalhyperplane) object
   that represents the axis-aligned splitting hyperplane of `node`.
   - All points in `node.Left()` are to the left of `node.Hyperplane()` if
     `node.Overlap()` is `false`; otherwise, all points in `node.Left()` are to
     the left of `node.Hyperplane() + tau`.
   - All points in `node.Right()` are to the right of `node.Hyperplane()` if
     `node.Overlap()` is `false`; otherwise, all points in `node.Right()` are to
     the right of `node.Hyperplane() - tau`.

 * `node.Bound()` will return a
   [`const HRectBound&`](binary_space_tree.md#hrectbound) representing the
   bounding box associated with `node`.
   - If a [custom `DistanceType` and/or `MatType`](#template-parameters) are
     specified, then a `const HRectBound<DistanceType, ElemType>&` is returned.
     * `ElemType` is the element type of the specified `MatType` (e.g. `double`
       for `arma::mat`, `float` for `arma::fmat`, etc.).

 * `node.Stat()` will return a `StatisticType&` holding the statistics of the
   node that were computed during tree construction.

 * `node.Distance()` will return a `EuclideanDistance&`.  Because
   `EuclideanDistance` has no instantiated members, this is unlikely to be
    useful, but is required to satisfy the
    [`TreeType` API](../../../developer/trees.md#the-treetype-api).

See also the
[developer documentation](../../../developer/trees.md#basic-tree-functionality)
for basic tree functionality in mlpack.

---

### Accessing data held in a tree

 * `node.Dataset()` will return a `const MatType&` that is the dataset the
   tree was built on.

 * `node.NumPoints()` returns a `size_t` indicating the number of points held
   directly in `node`.
   - If `node` is not a leaf, this will return `0`, as `SPTree` only holds
     points directly in its leaves.
   - If `node` is a leaf, then the number of points will be less than or equal
     to the `maxLeafSize` that was specified when the tree was constructed.

 * `node.Point(i)` returns a `size_t` indicating the index of the `i`'th point
   in `node.Dataset()`.
   - `i` must be in the range `[0, node.NumPoints() - 1]` (inclusive).
   - `node` must be a leaf (as non-leaves do not hold any points).
   - The `i`'th point in `node` can then be accessed as
     `node.Dataset().col(node.Point(i))`.
   - Accessing the actual `i`'th point itself can be done with, e.g.,
     `node.Dataset().col(node.Point(i))`.

 * `node.NumDescendants()` returns a `size_t` indicating the number of points
   held in all descendant leaves of `node`.
   - If `node` is the root of the tree, then `node.NumDescendants()` will be
     equal to `node.Dataset().n_cols`.

 * `node.Descendant(i)` returns a `size_t` indicating the index of the `i`'th
   descendant point in `node.Dataset()`.
   - `i` must be in the range `[0, node.NumDescendants() - 1]` (inclusive).
   - `node` does not need to be a leaf.
   - The `i`'th descendant point in `node` can then be accessed as
     `node.Dataset().col(node.Descendant(i))`.
   - Accessing the actual `i`'th descendant itself can be done with, e.g.,
     `node.Dataset().col(node.Descendant(i))`.

---

### Accessing computed bound quantities of a tree

The following quantities are cached for each node in an `SPTree`, and so
accessing them does not require any computation.  In the documentation below,
`ElemType` is the element type of the given `MatType`; e.g., if `MatType` is
`arma::mat`, then `ElemType` is `double`.

 * `node.FurthestPointDistance()` returns an `ElemType` representing the
   distance between the center of the bound of `node` and the furthest point
   held by `node`.
   - If `node` is not a leaf, this returns 0 (because `node` does not hold any
     points).

 * `node.FurthestDescendantDistance()` returns an `ElemType` representing the
   distance between the center of the bound of `node` and the furthest
   descendant point held by `node`.

 * `node.MinimumBoundDistance()` returns an `ElemType` representing the minimum
   possible distance from the center of the node to any edge of its bound.

 * `node.ParentDistance()` returns an `ElemType` representing the distance
   between the center of the bound of `node` and the center of the bound of its
   parent.
   - If `node` is the root of the tree, `0` is returned.

***Note:*** for more details on each bound quantity, see the [developer
documentation](../../../developer/trees.md#complex-tree-functionality-and-bounds)
on bound quantities for trees.

---

### Other functionality

 * `node.Center(center)` computes the center of the bound of `node` and stores
   it in `center`.
   - `center` should be of type `arma::Col<ElemType>&`, where `ElemType` is the
     element type of the specified `MatType`.
   - `center` will be set to have size equivalent to the dimensionality of the
     dataset held by `node`.
   - This is equivalent to calling `node.Bound().Center(center)`.

 * An `SPTree` can be serialized with
   [`Save()` and `Load()`](../../load_save.md#mlpack-models-and-objects).

## Bounding distances with the tree

The primary use of trees in mlpack is bounding distances to points or other tree
nodes.  The following functions can be used for these tasks.

 * `node.GetNearestChild(point)`
 * `node.GetFurthestChild(point)`
   - Return a `size_t` indicating the index of the child (`0` for left, `1` for
     right) that is closest to (or furthest from) `point`, with respect
     to the `MinDistance()` (or `MaxDistance()`) function.
   - If there is a tie, `0` (the left child) is returned.
   - If `node` is a leaf, `0` is returned.
   - `point` should be a column vector type of the same type as `MatType`.
     (e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)

 * `node.GetNearestChild(other)`
 * `node.GetFurthestChild(other)`
   - Return a `size_t` indicating the index of the child (`0` for left, `1` for
     right) that is closest to (or furthest from) the `SPTree` node `other`,
     with respect to the `MinDistance()` (or `MaxDistance()`) function.
   - If there is a tie, `0` (the left child) is returned.
   - If `node` is a leaf, `0` is returned.

---

 * `node.MinDistance(point)`
 * `node.MinDistance(other)`
   - Return a `double` indicating the minimum possible distance between `node`
     and `point`, or the `SPTree` node `other`.
   - This is equivalent to the minimum possible distance between any point
     contained in the bounding hyperrectangle of `node` and `point`, or between
     any point contained in the bounding hyperrectangle of `node` and any point
     contained in the bounding hyperrectangle of `other`.
   - `point` should be a column vector type of the same type as `MatType`.
     (e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)

 * `node.MaxDistance(point)`
 * `node.MaxDistance(other)`
   - Return a `double` indicating the maximum possible distance between `node`
     and `point`, or the `SPTree` node `other`.
   - This is equivalent to the maximum possible distance between any point
     contained in the bounding hyperrectangle of `node` and `point`, or between
     any point contained in the bounding hyperrectangle of `node` and any point
     contained in the bounding hyperrectangle of `other`.
   - `point` should be a column vector type of the same type as `MatType`.
     (e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)

 * `node.RangeDistance(point)`
 * `node.RangeDistance(other)`
   - Return a [`RangeType<ElemType>`](../math.md#range) whose lower bound is
     `node.MinDistance(point)` or `node.MinDistance(other)`, and whose upper
      bound is `node.MaxDistance(point)` or `node.MaxDistance(other)`.
   - `ElemType` is the element type of `MatType`.
   - `point` should be a column vector type of the same type as `MatType`.
     (e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)

## Tree traversals

Like every mlpack tree, the `SPTree` class provides a [single-tree and
dual-tree traversal](../../../developer/trees.md#traversals) that can be paired
with a [`RuleType` class](../../../developer/trees.md#rules) to implement a
single-tree or dual-tree algorithm.

 * `SPTree::SingleTreeTraverser`
   - Implements a depth-first single-tree traverser.

 * `SPTree::DualTreeTraverser`
   - Implements a dual-depth-first dual-tree traverser.

However, spill trees are primarily useful because the overlapping nodes allow
*defeatist* search to be effective.  Defeatist search is non-backtracking: the
tree is traversed to one leaf only.  For example, finding the approximate
nearest neighbor of a point `p` with defeatist search is done by recursing in
the tree, choosing the child with smallest minimum distance to `p`, and when a
leaf is encountered, choosing the closest point in the leaf to `p` as the
nearest neighbor.  This is the strategy used in the
[original spill tree paper (pdf)](https://proceedings.neurips.cc/paper/2004/file/1102a326d5f7c9e04fc3c89d0ede88c9-Paper.pdf).

Defeatist traversers, matching the API for a regular
[traversal](../../../developer/trees.md#traversals) are made available as the
following two classes:

 * `SPTree::DefeatistSingleTreeTraverser`
   - Implements a depth-first single-tree defeatist traverser with no
     backtracking.  Traversal will terminate after the first leaf is visited.

 * `SPTree::DefeatistDualTreeTraverser`
   - Implements a dual-depth-first dual-tree defeatist traversal with no
     backtracking.  For each query leaf node, traversal will terminate after the
     first reference leaf node is visited.

Any [`RuleType`](../../../developer/trees.md#rules) that is being used with a
defeatist traversal, in addition to the functions required by the `RuleType`
API, must implement the following functions:

```
// This is only required for single-tree defeatist traversals.
// It should return the index of the branch that should be chosen for the given
// query point and reference node.
template<typename VecType, typename TreeType>
size_t GetBestChild(const VecType& queryPoint, TreeType& referenceNode);

// This is only required for dual-tree defeatist traversals.
// It should return the index of the best child of the reference node that
// should be chosen for the given query node.
template<typename TreeType>
size_t GetBestChild(TreeType& queryNode, TreeType& referenceNode);

// Return the minimum number of base cases (point-to-point computations) that
// are required during the traversal.
size_t MinimumBaseCases();
```

## Example usage

Build an `SPTree` on the `cloud` dataset and print basic statistics about the
tree.

```c++
// See https://datasets.mlpack.org/cloud.csv.
arma::mat dataset;
mlpack::Load("cloud.csv", dataset, mlpack::Fatal);

// Build the spill tree with a tau (margin) of 0.2 and a leaf size of 10.
// (This means that nodes are split until they contain 10 or fewer points.)
//
// The std::move() means that `dataset` will be empty after this call, and no
// data will be copied during tree building.
//
// When C++20 is enabled, then the <> is not necessary and the following line
// will work:
// mlpack::SPTree tree(std::move(dataset), 0.2, 10);
mlpack::SPTree<> tree(std::move(dataset), 0.2, 10);

// Print the bounding box of the root node.
std::cout << "Bounding box of root node:" << std::endl;
for (size_t i = 0; i < tree.Bound().Dim(); ++i)
{
  std::cout << " - Dimension " << i << ": [" << tree.Bound()[i].Lo() << ", "
      << tree.Bound()[i].Hi() << "]." << std::endl;
}
std::cout << std::endl;

// Print the number of descendant points of the root, and of each of its
// children.
std::cout << "Descendant points of root:        "
    << tree.NumDescendants() << "." << std::endl;
std::cout << "Descendant points of left child:  "
    << tree.Left()->NumDescendants() << "." << std::endl;
std::cout << "Descendant points of right child: "
    << tree.Right()->NumDescendants() << "." << std::endl;
std::cout << std::endl;

// Compute the center of the SPTree.
arma::vec center;
tree.Center(center);
std::cout << "Center of tree: " << center.t();
```

---

Build two `SPTree`s on subsets of the corel dataset and compute minimum and
maximum distances between different nodes in the tree.

```c++
// See https://datasets.mlpack.org/corel-histogram.csv.
arma::mat dataset;
mlpack::Load("corel-histogram.csv", dataset, mlpack::Fatal);

// Build trees on the first half and the second half of points.  Use a tau
// (overlap) parameter of 0.3, which is tuned to this dataset, and a rho value
// of 0.6 to prevent the trees getting too deep.
mlpack::SPTree<> tree1(dataset.cols(0, dataset.n_cols / 2), 0.3, 20, 0.6);
mlpack::SPTree<> tree2(dataset.cols(dataset.n_cols / 2 + 1, dataset.n_cols - 1),
    0.3, 20, 0.6);

// Compute the maximum distance between the trees.
std::cout << "Maximum distance between tree root nodes: "
    << tree1.MaxDistance(tree2) << "." << std::endl;

// Get the leftmost grandchild of the first tree's root---if it exists.
if (!tree1.IsLeaf() && !tree1.Child(0).IsLeaf())
{
  mlpack::SPTree<>& node1 = tree1.Child(0).Child(0);

  // Get the rightmost grandchild of the second tree's root---if it exists.
  if (!tree2.IsLeaf() && !tree2.Child(1).IsLeaf())
  {
    mlpack::SPTree<>& node2 = tree2.Child(1).Child(1);

    // Print the minimum and maximum distance between the nodes.
    mlpack::Range dists = node1.RangeDistance(node2);
    std::cout << "Possible distances between two grandchild nodes: ["
        << dists.Lo() << ", " << dists.Hi() << "]." << std::endl;

    // Print the minimum distance between the first node and the first
    // descendant point of the second node.
    const size_t descendantIndex = node2.Descendant(0);
    const double descendantMinDist =
        node1.MinDistance(node2.Dataset().col(descendantIndex));
    std::cout << "Minimum distance between grandchild node and descendant "
        << "point: " << descendantMinDist << "." << std::endl;

    // Which child of node2 is closer to node1?
    const size_t closerIndex = node2.GetNearestChild(node1);
    if (closerIndex == 0)
      std::cout << "The left child of node2 is closer to node1." << std::endl;
    else if (closerIndex == 1)
      std::cout << "The right child of node2 is closer to node1." << std::endl;
    else // closerIndex == 2 in this case.
      std::cout << "Both children of node2 are equally close to node1."
          << std::endl;

    // And which child of node1 is further from node2?
    const size_t furtherIndex = node1.GetFurthestChild(node2);
    if (furtherIndex == 0)
      std::cout << "The left child of node1 is further from node2."
          << std::endl;
    else if (furtherIndex == 1)
      std::cout << "The right child of node1 is further from node2."
          << std::endl;
    else // furtherIndex == 2 in this case.
      std::cout << "Both children of node1 are equally far from node2."
          << std::endl;
  }
}
```

---

Build an `SPTree` on 32-bit floating point data and save it to disk.

```c++
// See https://datasets.mlpack.org/corel-histogram.csv.
arma::fmat dataset;
mlpack::Load("corel-histogram.csv", dataset);

// Build the SPTree using 32-bit floating point data as the matrix type.
// We will still use the default EmptyStatistic and EuclideanDistance
// parameters.
mlpack::SPTree<mlpack::EuclideanDistance,
               mlpack::EmptyStatistic,
               arma::fmat> tree(std::move(dataset), 0.1, 20, 0.95);

// Save the tree to disk with the name 'tree'.
mlpack::Save("tree.bin", "tree", tree);

std::cout << "Saved tree with " << tree.Dataset().n_cols << " points to "
    << "'tree.bin'." << std::endl;
```

---

Load a 32-bit floating point `SPTree` from disk, then traverse it manually and
find the number of nodes whose children overlap.

```c++
// This assumes the tree has already been saved to 'tree.bin' (as in the example
// above).

// This convenient typedef saves us a long type name!
using TreeType = mlpack::SPTree<mlpack::EuclideanDistance,
                                mlpack::EmptyStatistic,
                                arma::fmat>;

TreeType tree;
mlpack::Load("tree.bin", tree);
std::cout << "Tree loaded with " << tree.NumDescendants() << " points."
    << std::endl;

// Recurse in a depth-first manner.  Count both the total number of non-leaves,
// and the number of non-leaves that have overlapping children.
size_t overlapCount = 0;
size_t totalInternalNodeCount = 0;
std::stack<TreeType*> stack;
stack.push(&tree);
while (!stack.empty())
{
  TreeType* node = stack.top();
  stack.pop();

  if (node->IsLeaf())
    continue;

  if (node->Overlap())
    ++overlapCount;
  ++totalInternalNodeCount;

  stack.push(node->Left());
  stack.push(node->Right());
}

// Note that it would be possible to use TreeType::SingleTreeTraverser to
// perform the recursion above, but that is more well-suited for more complex
// tasks that require pruning and other non-trivial behavior; so using a simple
// stack is the better option here.

// Print the results.
std::cout << overlapCount << " out of " << totalInternalNodeCount
    << " internal nodes have overlapping children." << std::endl;
```

---

Use a defeatist traversal to find the approximate nearest neighbor of the third
and fourth points in the `corel-histogram` dataset.  (Note: this can also be
done more easily with the [`KNN`](../../methods/knn.md) class!  This example is
a demonstration of how to use the defeatist traverser.)

For this example, we must first define a
[`RuleType` class](../../../developer/trees.md#rules).

```c++
// For simplicity, this only implements those methods required by single-tree
// traversals, and cannot be used with a dual-tree traversal.
//
// `.Reset()` must be called before any additional single-tree traversals after
// the first is run.
class SpillNearestNeighborRule
{
 public:
  // Store the dataset internally.
  SpillNearestNeighborRule(const arma::mat& dataset) :
      dataset(dataset),
      nearestNeighbor(size_t(-1)),
      nearestDistance(DBL_MAX) { }

  // Compute the base case (point-to-point comparison).
  double BaseCase(const size_t queryIndex, const size_t referenceIndex)
  {
    // Skip the base case if the points are the same.
    if (queryIndex == referenceIndex)
      return 0.0;

    const double dist = mlpack::EuclideanDistance::Evaluate(
        dataset.col(queryIndex), dataset.col(referenceIndex));

    if (dist < nearestDistance)
    {
      nearestNeighbor = referenceIndex;
      nearestDistance = dist;
    }

    return dist;
  }

  // Score the given node in the tree; if it is sufficiently far away that it
  // cannot contain a better nearest neighbor candidate, we can prune it.
  template<typename TreeType>
  double Score(const size_t queryIndex, const TreeType& referenceNode) const
  {
    const double minDist = referenceNode.MinDistance(dataset.col(queryIndex));
    if (minDist > nearestDistance)
      return DBL_MAX; // Prune: this cannot contain a better candidate!

    return minDist;
  }

  // Rescore the given node/point combination.  Note that this will not be used
  // by the defeatist traversal as it never backtracks, but we include it for
  // completeness because the RuleType API requires it.
  template<typename TreeType>
  double Rescore(const size_t, const TreeType&, const double oldScore) const
  {
    if (oldScore > nearestDistance)
      return DBL_MAX; // Prune: the node is too far away.
    return oldScore;
  }

  // This is required by defeatist traversals to select the best reference
  // child to recurse into for overlapping nodes.
  template<typename TreeType>
  size_t GetBestChild(const size_t queryIndex, TreeType& referenceNode)
      const
  {
    return referenceNode.GetNearestChild(dataset.col(queryIndex));
  }

  // We must perform at least two base cases in order to have a result.  Note
  // that this is two, and not one, because we skip base cases where the query
  // and reference points are the same.  That can only happen a maximum of once,
  // so to ensure that we compare a query point to a different reference point
  // at least once, we must return 2 here.
  size_t MinimumBaseCases() const { return 2; }

  // Get the results (to be called after the traversal).
  size_t NearestNeighbor() const { return nearestNeighbor; }
  double NearestDistance() const { return nearestDistance; }

  // Reset the internal statistics for an additional traversal.
  void Reset()
  {
    nearestNeighbor = size_t(-1);
    nearestDistance = DBL_MAX;
  }

 private:
  const arma::mat& dataset;

  size_t nearestNeighbor;
  double nearestDistance;
};
```

```c++
// See https://datasets.mlpack.org/corel-histogram.csv.
arma::mat dataset;
mlpack::Load("corel-histogram.csv", dataset, mlpack::Fatal);

// Build two trees, one with a lot of overlap, and one with no overlap
// (e.g. tau = 0).
mlpack::SPTree<> tree1(dataset, 0.5, 10), tree2(dataset, 0.0, 10);

// Construct the rule types, and then the traversals.
SpillNearestNeighborRule r1(dataset), r2(dataset);

mlpack::SPTree<>::DefeatistSingleTreeTraverser<SpillNearestNeighborRule>
    t1(r1), t2(r2);

// Search for the approximate nearest neighbor of point 3 using both trees.
t1.Traverse(3, tree1);
t2.Traverse(3, tree2);

std::cout << "Approximate nearest neighbor of point 3:" << std::endl;
std::cout << " - Spill tree with overlap 0.5 found: point "
    << r1.NearestNeighbor() << ", distance " << r1.NearestDistance()
    << "." << std::endl;

std::cout << " - Spill tree with no overlap found: point "
    << r2.NearestNeighbor() << ", distance " << r2.NearestDistance()
    << "." << std::endl;

// Now search for point 6.
r1.Reset();
r2.Reset();

t1.Traverse(6, tree1);
t2.Traverse(6, tree2);

std::cout << "Approximate nearest neighbor of point 6:" << std::endl;
std::cout << " - Spill tree with overlap 0.5 found: point "
    << r1.NearestNeighbor() << ", distance " << r1.NearestDistance()
    << "." << std::endl;

std::cout << " - Spill tree with no overlap found: point "
    << r2.NearestNeighbor() << ", distance " << r2.NearestDistance()
    << "." << std::endl;
```