1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166
|
# `RectangleTree`
The `RectangleTree` class represents a generic multidimensional space
partitioning tree. It is heavily templatized to control splitting behavior and
other behaviors, and is the actual class underlying trees such as the
[`RTree`](r_tree.md). In general, the `RectangleTree` class is not meant to
be used directly, and instead one of the numerous variants should be used
instead:
* [`RTree`](r_tree.md)
* [`RStarTree`](r_star_tree.md)
* [`XTree`](x_tree.md)
* [`RPlusTree`](r_plus_tree.md)
* [`RPlusPlusTree`](r_plus_plus_tree.md)
* [`HilbertRTree`](hilbert_r_tree.md)
The `RectangleTree` and its variants are capable of inserting points and
deleting them. This is different from [`BinarySpaceTree`](binary_space_tree.md)
and other mlpack tree types, where the tree is built entirely in batch at
construction time. However, this capability comes with a runtime cost, and so
in general the use of `RectangleTree` with mlpack algorithms will be slower than
the batch-construction trees---but, if insert/delete functionality is required,
`RectangleTree` is the only choice.
---
For users who want to use `RectangleTree` directly or with custom behavior,
the full class is still detailed in the subsections below. `RectangleTree`
supports the [TreeType API](../../../developer/trees.md#the-treetype-api) and
can be used with mlpack's tree-based algorithms, although using custom behavior
may require a template typedef.
* [Template parameters](#template-parameters)
* [Constructors](#constructors)
* [Basic tree properties](#basic-tree-properties)
* [Bounding distances with the tree](#bounding-distances-with-the-tree)
* [`StatisticType`](#statistictype) template parameter
* [`SplitType`](#splittype) template parameter
* [`DescentType`](#descenttype) template parameter
* [`AuxiliaryInformationType`](#auxiliaryinformationtype) template parameter
* [Tree traversals](#tree-traversals)
* [Example usage](#example-usage)
## See also
<!-- TODO: add links to all distance-based algorithms and other trees? -->
* [`RTree`](r_tree.md)
* [R-Tree on Wikipedia](https://en.wikipedia.org/wiki/R-tree)
* [R-Trees: A Dynamic Index Structure for Spatial Searching (pdf)](http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf)
* [Tree-Independent Dual-Tree Algorithms (pdf)](https://www.ratml.org/pub/pdf/2013tree.pdf)
## Template parameters
The `RectangleTree` class takes six template parameters. The first three of
these are required by the
[TreeType API](../../../developer/trees.md#template-parameters-required-by-the-treetype-policy)
(see also
[this more detailed section](../../../developer/trees.md#template-parameters)). The
full signature of the class is:
```
template<typename DistanceType,
typename StatisticType,
typename MatType,
typename SplitType,
typename DescentType,
template<typename> class AuxiliaryInformationType>
class RectangleTree;
```
* `DistanceType`: the [distance metric](../distances.md) to use for distance
computations. `RectangleTree` requires that this is
[`EuclideanDistance`](../distances.md#lmetric), and a compilation error will
be thrown if any other `DistanceType` is specified.
* `StatisticType`: this holds auxiliary information in each tree node. By
default, [`EmptyStatistic`](#emptystatistic) is used, which holds no
information.
- See the [`StatisticType`](#statistictype) section for more details.
* `MatType`: the type of matrix used to represent points. Must be a type
matching the [Armadillo API](../../matrices.md). By default, `arma::mat` is
used, but other types such as `arma::fmat` or similar will work just fine.
* `SplitType`: the class defining how an individual `RectangleTree` node
should be split. By default, [`RTreeSplit`](#rtreesplit) is used.
- See the [`SplitType`](#splittype) section for more details.
* `DescentType`: the class defining how a child node is chosen for point
insertion. By default, [`RTreeDescentHeuristic`](#rtreedescentheuristic) is
used.
- See the [`DescentType`](#descenttype) section for more details.
* `AuxiliaryInformationType`: holds information specific to the variant of the
`RectangleTree`. By default, `NoAuxiliaryInformation` is used.
Note that the TreeType API requires trees to have only three template
parameters. In order to use a `RectangleTree` with its six template parameters
with an mlpack algorithm that needs a TreeType, it is easiest to define a
template typedef:
```
template<typename DistanceType, typename StatisticType, typename MatType>
using CustomTree = Rectangle<DistanceType, StatisticType, MatType,
CustomSplitType, CustomDescentType, CustomAuxiliaryInformationType>
```
Here, `CustomSplitType`, `CustomDescentType`, and
`CustomAuxiliaryInformationType` are the desired splitting and descent
strategies and auxiliary information type. This is the way that all
`RectangleTree` variants (such as [`RTree`](r_tree.md)) are defined.
## Constructors
`RectangleTree`s are constructed by inserting points in a dataset sequentially.
The dataset is not permuted during the construction process.
---
* `node = RectangleTree(data)`
* `node = RectangleTree(data, maxLeafSize=20, minLeafSize=8)`
* `node = RectangleTree(data, maxLeafSize=20, minLeafSize=8, maxNumChildren=5, minNumChildren=2)`
- Construct a `RectangleTree` on the given `data` with the given construction
parameters.
- Default template parameters are used, meaning that this tree will be a
[`RTree`](r_tree.md).
- By default, `data` is copied. Avoid a copy by using `std::move()` (e.g.
`std::move(data)`); when doing this, `data` will be set to an empty matrix.
---
* `node = RectangleTree<DistanceType, StatisticType, MatType, SplitType, DescentType, AuxiliaryInformationType>(data)`
* `node = RectangleTree<DistanceType, StatisticType, MatType, SplitType, DescentType, AuxiliaryInformationType>(data, maxLeafSize=20, minLeafSize=8)`
* `node = RectangleTree<DistanceType, StatisticType, MatType, SplitType, DescentType, AuxiliaryInformationType>(data, maxLeafSize=20, minLeafSize=8, maxNumChildren=5, minNumChildren=2)`
- Construct a `RectangleTree` on the given `data`, using custom template
parameters to control the behavior of the tree and the given construction
parameters.
- By default, `data` is copied. Avoid a copy by using `std::move()` (e.g.
`std::move(data)`); when doing this, `data` will be set to an empty matrix.
---
* `node = RectangleTree(dimensionality)`
- Construct an empty `RectangleTree` with no children, no points, and default
template parameters.
- Use `node.Insert()` to insert points into the tree. All points must have
dimensionality `dimensionality`.
---
* `node.Insert(x)`
- Insert the point `x` into the tree.
- `x` should have vector type compatible with the chosen `MatType`; so, for
default `MatType`, `arma::vec` is the expected type.
- If a custom `MatType` is specified (e.g. `arma::fmat`), then `x` should
have type equivalent to the corresponding column vector type (e.g.
`arma::fvec`).
- Due to tree rebalancing, this may change the internal structure of the
tree; so references and pointers to children of `node` may become invalid.
- ***Warning:*** This will throw an exception if `node` is not the root of
the tree!
* `node.Delete(i)
- Delete the point with index `i` from the tree.
- The point to be deleted from the tree will be `node.Dataset().col(i)`;
after deleting, the column will be removed from `node.Dataset()` and all
indexes held in all tree nodes will be updated. (Thus, this operation can
be expensive!)
- Due to tree rebalancing, this may change the internal structure of the
tree; so references and pointers to children of `node` may become invalid.
- ***Warning:*** This will throw an exception if `node` is not the root of
the tree!
---
***Notes:***
- The name `node` is used here for `RectangleTree` objects instead of `tree`,
because each `RectangleTree` object is a single node in the tree. The
constructor returns the node that is the root of the tree.
- See also the
[developer documentation on tree constructors](../../../developer/trees.md#constructors-and-destructors).
---
### Constructor parameters:
| **name** | **type** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `data` | [`MatType`](../../matrices.md) | [Column-major](../../matrices.md#representing-data-in-mlpack) matrix to build the tree on. | _(N/A)_ |
| `maxLeafSize` | `size_t` | Maximum number of points to store in each leaf. | `20` |
| `minLeafSize` | `size_t` | Minimum number of points to store in each leaf. | `8` |
| `maxNumChildren` | `size_t` | Maximum number of children allowed in each non-leaf node. | `5` |
| `minNumChildren` | `size_t` | Minimum number of children in each non-leaf node. | `2` |
| `dimensionality` | `size_t` | Dimensionality of points to be held in the tree. | _(N/A)_ |
| | | |
| `x` | [`arma::vec`](../../matrices.md) | Column vector: point to insert into tree. Should have type matching the column vector type associated with `MatType`, and must have `node.Dataset().n_rows` elements. | _(N/A)_ |
| `i` | `size_t` | Index of point in `node.Dataset()` to delete from `node`. | _(N/A)_ |
## Basic tree properties
Once a `RectangleTree` object is constructed, various properties of the tree
can be accessed or inspected. Many of these functions are required by the
[TreeType API](../../../developer/trees.md#the-treetype-api).
### Navigating the tree
* `node.NumChildren()` returns the number of children in `node`. This is `0`
if `node` is a leaf, and between the values of `node.MinNumChildren()` and
`node.MaxNumChildren()` (inclusive) otherwise.
* `node.IsLeaf()` returns a `bool` indicating whether or not `node` is a leaf.
* `node.Child(i)` returns a `RectangleTree&` that is the `i`th child.
- `i` must be less than `node.NumChildren()`.
- This function should only be called if `node.NumChildren()` is not `0`
(e.g. if `node` is not a leaf). Note that this returns a valid
`RectangleTree&` that can itself be used just like the root node of the
tree!
* `node.Parent()` will return a `RectangleTree*` that points to the parent of
`node`, or `NULL` if `node` is the root of the `RectangleTree`.
---
### Accessing members of a tree
* `node.Bound()` will return an
[`HRectBound<DistanceType, ElemType>&`](binary_space_tree.md#hrectbound)
object that represents the hyperrectangle bounding box of `node`.
- `ElemType` is the element type of `MatType`; so, if default template
parameters are used, `ElemType` is `double`.
- `bound` is a hyperrectangle that encloses all the descendant points of
`node`. It may be somewhat loose (e.g. points may not be very near the
edges).
* `node.Stat()` will return a `StatisticType&` holding the statistics of the
node that were computed during tree construction.
* `node.Distance()` will return a `EuclideanDistance&`. Since
`EuclideanDistance` has no members, this function is not likely to be useful,
but it is required by the TreeType API.
* `node.AuxiliaryInfo()` returns an `AuxiliaryInformationType&` that holds any
auxiliary information required by the node.
* `node.MinNumChildren()` returns the minimum number of children that the
node is required to have as a `size_t`. If points are deleted such that the
number of children falls below this limit, then `node` will become a leaf and
the tree will be rebalanced.
* `node.MaxNumChildren()` returns the maximum number of children that the
node is required to have as a `size_t`. If points are inserted such that the
number of children goes above this limit, new nodes will be added and the
tree will be rebalanced.
* `node.MaxLeafSize()` returns the maximum number of points that the node is
allowed to hold as a `size_t`. If the number of points held by `node`
exceeds this limit during insertion, then `node` will be split and the tree
will be rebalanced.
* `node.MinLeafSize()` returns the minimum number of points that the node is
allowed to hold as a `size_t`. If the number of points held by `node` goes
under this limit during deletion, then `node` will be deleted (if possible)
and the tree will be rebalanced.
See also the
[developer documentation](../../../developer/trees.md#basic-tree-functionality)
for basic tree functionality in mlpack.
---
### Accessing data held in a tree
* `node.Dataset()` will return a `const MatType&` that is an internally-held
representation of the dataset the tree was built on.
* `node.NumPoints()` returns a `size_t` indicating the number of points held
directly in `node`.
- If `node` is not a leaf, this will return `0`, as `RectangleTree` only
holds points directly in its leaves.
- If `node` is a leaf, then this will return values between
`node.MinLeafSize()` and `node.MaxLeafSize()` (inclusive).
- If the tree has fewer than `node.MinLeafSize()` points total, then
`node.NumPoints()` will return a value less than `node.MinLeafSize()`.
* `node.Point(i)` returns a `size_t` indicating the index of the `i`'th point
in `node.Dataset()`.
- `i` must be in the range `[0, node.NumPoints() - 1]` (inclusive).
- `node` must be a leaf (as non-leaves do not hold any points).
- The `i`'th point in `node` can then be accessed as
`node.Dataset().col(node.Point(i))`.
- Accessing the actual `i`'th point itself can be done with, e.g.,
`node.Dataset().col(node.Point(i))`.
- Point indices are not necessarily contiguous for `RectangleTree`s; that is,
`node.Point(i) + 1` is not necessarily `node.Point(i + 1)`.
* `node.NumDescendants()` returns a `size_t` indicating the number of points
held in all descendant leaves of `node`.
- If `node` is the root of the tree, then `node.NumDescendants()` will be
equal to `node.Dataset().n_cols`.
* `node.Descendant(i)` returns a `size_t` indicating the index of the `i`'th
descendant point in `node.Dataset()`.
- `i` must be in the range `[0, node.NumDescendants() - 1]` (inclusive).
- `node` does not need to be a leaf.
- The `i`'th descendant point in `node` can then be accessed as
`node.Dataset().col(node.Descendant(i))`.
- Accessing the actual `i`'th descendant itself can be done with, e.g.,
`node.Dataset().col(node.Descendant(i))`.
- Descendant point indices are not necessarily contiguous for
`RectangleTree`s; that is, `node.Descendant(i) + 1` is not necessarily
`node.Descendant(i + 1)`.
---
### Accessing computed bound quantities of a tree
The following quantities are cached for each node in a `RectangleTree`, and so
accessing them does not require any computation. In the documentation below,
`ElemType` is the element type of the given `MatType`; e.g., if `MatType` is
`arma::mat`, then `ElemType` is `double`.
* `node.FurthestPointDistance()` returns an `ElemType` representing the
distance between the center of the bound of `node` and the furthest point
held by `node`.
- If `node` is not a leaf, this returns 0 (because `node` does not hold any
points).
* `node.FurthestDescendantDistance()` returns an `ElemType` representing the
distance between the center of the bound of `node` and the furthest
descendant point held by `node`.
* `node.MinimumBoundDistance()` returns an `ElemType` representing the minimum
possible distance from the center of the node to any edge of its bound.
* `node.ParentDistance()` returns an `ElemType` representing the distance
between the center of the bound of `node` and the center of the bound of its
parent.
- If `node` is the root of the tree, `0` is returned.
***Note:*** for more details on each bound quantity, see the [developer
documentation](../../../developer/trees.md#complex-tree-functionality-and-bounds)
on bound quantities for trees.
---
### Other functionality
* `node.Center(center)` computes the center of the hyperrectangle bounding box
of `node` and stores it in `center`.
- `center` should be of type `arma::Col<ElemType>&`, where `ElemType` is the
element type of the specified `MatType`.
- `center` will be set to have size equivalent to the dimensionality of the
dataset held by `node`.
- This is equivalent to calling `node.Bound().Center(center)`.
* A `RectangleTree` can be serialized with
[`data::Save()` and `data::Load()`](../../load_save.md#mlpack-objects).
## Bounding distances with the tree
The primary use of trees in mlpack is bounding distances to points or other tree
nodes. The following functions can be used for these tasks.
* `node.GetNearestChild(point)`
* `node.GetFurthestChild(point)`
- Return a `size_t` indicating the index of the child that is closest to (or
furthest from) `point`, with respect to the `MinDistance()` (or
`MaxDistance()`) function.
- If there is a tie, the node with the lowest index is returned.
- If `node` is a leaf, `0` is returned.
- `point` should be a column vector type of the same type as `MatType`.
(e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)
* `node.GetNearestChild(other)`
* `node.GetFurthestChild(other)`
- Return a `size_t` indicating the index of the child that is closest to (or
furthest from) the `RectangleTree` node `other`, with respect to the
`MinDistance()` (or `MaxDistance()`) function.
- If there is a tie, the node with the lowest index is returned.
- If `node` is a leaf, `0` is returned.
---
* `node.MinDistance(point)`
* `node.MinDistance(other)`
- Return a `double` indicating the minimum possible distance between `node`
and `point`, or the `RectangleTree` node `other`.
- This is equivalent to the minimum possible distance between any point
contained in the bounding hyperrectangle of `node` and `point`, or between
any point contained in the bounding hyperrectangle of `node` and any point
contained in the bounding hyperrectangle of `other`.
- `point` should be a column vector type of the same type as `MatType`.
(e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)
* `node.MaxDistance(point)`
* `node.MaxDistance(other)`
- Return a `double` indicating the maximum possible distance between `node`
and `point`, or the `RectangleTree` node `other`.
- This is equivalent to the maximum possible distance between any point
contained in the bounding hyperrectangle of `node` and `point`, or between
any point contained in the bounding hyperrectangle of `node` and any point
contained in the bounding hyperrectangle of `other`.
- `point` should be a column vector type of the same type as `MatType`.
(e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)
* `node.RangeDistance(point)`
* `node.RangeDistance(other)`
- Return a [`RangeType<ElemType>`](../math.md#range) whose lower bound is
`node.MinDistance(point)` or `node.MinDistance(other)`, and whose upper
bound is `node.MaxDistance(point)` or `node.MaxDistance(other)`.
- `ElemType` is the element type of `MatType`.
- `point` should be a column vector type of the same type as `MatType`.
(e.g., if `MatType` is `arma::mat`, then `point` should be an `arma::vec`.)
## Tree traversals
Like every mlpack tree, the `RectangleTree` class provides a [single-tree and
dual-tree traversal](../../../developer/trees.md#traversals) that can be paired
with a [`RuleType` class](../../../developer/trees.md#rules) to implement a
single-tree or dual-tree algorithm.
* `RectangleTree::SingleTreeTraverser`
- Implements a depth-first single-tree traverser.
* `RectangleTree::DualTreeTraverser`
- Implements a dual-depth-first dual-tree traverser.
## `StatisticType`
Each node in a `RectangleTree` holds an instance of the `StatisticType` class.
This class can be used to store additional bounding information or other cached
quantities that a `RectangleTree` does not already compute.
mlpack provides a few existing `StatisticType` classes, and a custom
`StatisticType` can also be easily implemented:
* [`EmptyStatistic`](#emptystatistic): an empty statistic class that does not
hold any information
* [Custom `StatisticType`s](#custom-statistictypes): implement a fully custom
`StatisticType`
*Note:* this section is still under construction---not all statistic types are
documented yet.
### `EmptyStatistic`
The `EmptyStatistic` class is an empty placeholder class that is used as the
default `StatisticType` template parameter for mlpack trees.
The class ***does not hold any members and provides no functionality***.
[See the implementation.](/src/mlpack/core/tree/statistic.hpp)
### Custom `StatisticType`s
A custom `StatisticType` is trivial to implement. Only a default constructor
and a constructor taking a `RectangleTree` is necessary.
```
class CustomStatistic
{
public:
// Default constructor required by the StatisticType policy.
CustomStatistic();
// Construct a CustomStatistic for the given fully-constructed
// `RectangleTree` node. Here we have templatized the tree type to make it
// easy to handle any type of `RectangleTree`.
template<typename TreeType>
StatisticType(TreeType& node);
//
// Adding any additional precomputed bound quantities can be done; these
// quantities should be computed in the constructor. They can then be
// accessed from the tree with `node.Stat()`.
//
};
```
*Example*: suppose we wanted to know, for each node, the exact time at which it
was created. A `StatisticType` could be created that has a
[`std::time_t`](https://en.cppreference.com/w/cpp/chrono/c/time_t) member,
whose value is computed in the constructor.
## `SplitType`
The `SplitType` template parameter controls the algorithm used to split each
node of a `RectangleTree` while building. The splitting strategy used can be
entirely arbitrary---the `SplitType` simply needs to split a leaf node and a
non-leaf node into children.
mlpack provides several drop-in choices for `SplitType`, and it is also possible
to write a fully custom split:
* [`RTreeSplit`](#rtreesplit): splits according to a simple binary heuristic
* [`RStarTreeSplit`](#rstartreesplit): finds the best possible binary split
that minimizes the volume of the two children and maximizes the margin
between them
* [`XTreeSplit`](#xtreesplit): an improved splitting strategy that minimizes
overlap of sibling nodes
* [`RPlusTreeSplit`](#rplustreesplit): split by partitioning two nodes along
the dimension that minimizes overall node volume
* [`RPlusPlusTreeSplit`](#rplusplustreesplit): split using maximum bounding
rectangles to ensure zero overlap between sibling nodes
* [`HilbertRTreeSplit<>`](#hilbertrtreesplit): use deferred splitting and
Z-ordering values of points to decide the split
* [Custom `SplitType`s](#custom-splittypes): implement a fully custom
`SplitType` class
*Note:* this section is still under construction---not all split types are
documented yet.
### `RTreeSplit`
The `RTreeSplit` class implements the original R-tree splitting strategy and can
be used with the [`RectangleTree`](#rectangletree) class. This is the splitting
strategy used for the [`RTree`](r_tree.md) class, and is the same strategy
proposed in the [original paper
(pdf)](http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf). The
strategy works as follows:
* Find the two furthest-apart points (or children if the node is not a leaf).
* Create two children with each point (or child) as the only point (or child).
* Iteratively add each remaining point (or child) to the new child whose
hyperrectangle bound volume increases the least.
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/r_tree_split_impl.hpp).
### `RStarTreeSplit`
The `RStarTreeSplit` class implements the improved R\*-tree splitting strategy
and can be used with the [`RectangleTree`](#rectangletree) class. This is the
splitting strategy used for the [`RStarTree`](r_star_tree.md) class, and is the
strategy proposed in the
[R\*-tree paper (pdf)](https://dl.acm.org/doi/pdf/10.1145/93597.98741). The
strategy computes, for each possible binary split in each dimension,
* The combined volume of the two child nodes,
* The size of the margin between the two child nodes, and
* The size of the overlap between the two child nodes.
The split that minimizes the combined volume and maximizes the overlap is
chosen.
In addition, the `RStarTreeSplit` will sometimes perform *forced reinsertion*,
where points are removed from a node during the splitting process and reinserted
into the tree. This can help decrease the overlap between adjacent nodes in the
tree, which in turn improves the quality of the tree for search and other tasks.
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/r_star_tree_split_impl.hpp).
### `XTreeSplit`
The `XTreeSplit` class implements the improved splitting strategy for the
[`XTree`](x_tree.md) as described in the
[X-tree paper (pdf)](https://www.vldb.org/conf/1996/P028.PDF). This strategy is
an improved version of the standard [`RTreeSplit`](#rtreesplit), where the
overlap of sibling nodes is minimized.
When overlap cannot be prevented, `XTreeSplit` will instead create
"super-nodes" with more children than typically allowed. The split is then
deferred until a later time when overlap can be more effectively avoided.
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/x_tree_split_impl.hpp).
### `RPlusTreeSplit`
The `RPlusTreeSplit` class implements the splitting policy of the
[R+-tree](r_plus_tree.md). The strategy splits nodes (leaves and non-leaves) by
partitioning along the dimension that results in the two children with minimum
volume, similar to the [kd-tree](kdtree.md).
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/r_plus_tree_split_impl.hpp).
Note that `RPlusTreeSplit` is a template typedef of the general
`RPlusTreeSplitType<>` class.
### `RPlusPlusTreeSplit`
The `RPlusPlusTreeSplit` class implements the splitting policy of the
[R++-tree](r_plus_plus_tree.md). This class can only be used in a tree that
uses [`RPlusPlusTreeAuxiliaryInformation`](#rplusplustreeauxiliaryinformation)
as the [`AuxiliaryInformationType`](#auxiliaryinformationtype).
The splitting strategy splits leaf nodes along an arbitrarily-chosen dimension,
and splits non-leaf nodes along the dimension that minimizes the number of
descendant nodes that also must be split along that dimension.
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/r_plus_tree_split_impl.hpp).
Note that `RPlusPlusTreeSplit` is a template typedef of the general
`RPlusTreeSplitType<>` class.
### `HilbertRTreeSplit<>`
The `HilbertRTreeSplit<>` class is an implementation of the
[`HilbertRTree`](hilbert_r_tree.md) splitting strategy. This strategy, proposed
in [the original paper (pdf)](https://www.vldb.org/conf/1994/P500.PDF), has two
main differences from the standard [`RTreeSplit`](#rtreesplit) strategy:
* The idea of space-filling curves is used to order points for insertion.
* Instead of one node splitting into two, the `HilbertRTreeSplit<>` class
defers splitting, and re-splits a group of two nodes into three nodes.
- *Note*: this behavior is configurable, see below.
When inserting a point, one cooperating sibling node is found. If both the node
and its cooperating sibling are full, then all points in the two nodes as well
as the point being inserted are ordered by Z-ordering value (also known as
Morton ordering), and split evenly into three nodes.
***Notes:***
- `HilbertRTreeSplit<>` has one template parameter, which controls the number
of sibling nodes to split. This is why the class must be specified as
`HilbertRTreeSplit<>` and not `HilbertRTreeSplit`.
- By default, `HilbertRTreeSplit<>` splits two sibling nodes into three new
nodes; but this is configurable: `HilbertRTreeSplit<N>` will split `N`
sibling nodes into `N + 1` new nodes.
- The concept of splitting based on Z-ordering is also used in the
[`UBTreeSplit`](binary_space_tree.md#ubtreesplit) strategy for the
[`UBTree`](ub_tree.md), a variant of the
[`BinarySpaceTree`](binary_space_tree.md) class.
### Custom `SplitType`s
Custom split strategies for a `RectangleTree` can be implemented via the
`SplitType` template parameter. By default, the [`RTreeSplit`](#rtreesplit)
splitting strategy is used, but it is also possible to implement and use a
custom `SplitType`. Any custom `SplitType` class must implement the following
signature:
```c++
class SplitType
{
public:
// Given the leaf node `tree`, split into multiple nodes. `TreeType` will be
// the relevant `RectangleTree` type. `tree` should be modified directly.
//
// `relevels` is an auxiliary array used by some splitting strategies, such as
// the `RStarTreeSplit`, to indicate whether a node needs to be reinserted
// into the tree.
template<typename TreeType>
static void SplitLeafNode(TreeType* tree, std::vector<bool>& relevels);
// Given the non-leaf node `tree`, split into multiple nodes. `TreeType` will
// be the relevant `RectangleTree` type. `tree` should be modified directly.
//
// `relevels` is an auxiliary array used by some splitting strategies, such as
// the `RStarTreeSplit`, to indicate whether a node needs to be reinserted
// into the tree.
template<typename TreeType>
static void SplitNonLeafNode(TreeType* tree, std::vector<bool>& relevels);
};
```
## `DescentType`
The `DescentType` template parameter controls the algorithm used to assign child
points and child nodes to nodes in a `RectangleTree`. The strategy used can be
arbitrary: the `DescentType` simply needs to return an index of a child to
insert a point or node into.
mlpack provides several drop-in choices for `DescentType`, and it is also
possible to write a fully custom split:
* [`RTreeDescentHeuristic`](#rtreedescentheuristic): selects the closest child,
which is the child whose volume will increase the least.
* [`RStarTreeDescentHeuristic`](#rstartreedescentheuristic): selects a child
such that overlap is minimized and volume increase is minimized
* [`RPlusTreeDescentHeuristic`](#rplustreedescentheuristic): selects a child
that does not cause overlap; if not possible, creates a new child
* [`RPlusPlusTreeDescentHeuristic`](#rplusplustreedescentheuristic): selects
the child whose maximum bounding box contains the point
such that overlap is minimized and volume increase is minimized.
* [`HilbertRTreeDescentHeuristic`](#hilbertrtreedescentheuristic): select the
first child with minimum Z-order value greater than the point or node to be
inserted.
* [Custom `SplitType`s](#custom-splittypes): implement a fully custom
`SplitType` class
*Note:* this section is still under construction---not all split types are
documented yet.
### `RTreeDescentHeuristic`
The `RTreeDescentHeuristic` is the default descent strategy for the
`RectangleTree` and is used by the [`RTree`](r_tree.md). The strategy is
simple: the child node whose volume will increase the least is chosen as the
child to insert a point or other node into.
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/r_tree_descent_heuristic.hpp).
### `RStarTreeDescentHeuristic`
The `RStarTreeDescentHeuristic` is a descent strategy for the
[`RectangleTree`](#rectangletree) and is used by the
[`RStarTree`](r_star_tree.md). The heuristic will always prefer to insert a
point or node into a child node whose hyperrectangle bound already contains the
point or node to be inserted.
When inserting a point or node into a node whose children are leaves, the
strategy will choose to insert into the child where the overall overlap of
children's volumes after insertion is minimized.
When inserting a point or node into a node whose children are not leaves, the
strategy will choose to insert into the child whose volume is the smallest after
insertion.
For implementation details, see [the source
code](/src/mlpack/core/tree/rectangle_tree/r_star_tree_descent_heuristic.hpp).
### `RPlusTreeDescentHeuristic`
The `RPlusTreeDescentHeuristic` is the descent strategy used by the
[`RPlusTree`](r_plus_tree.md). When determining which node to insert a point
into, the following heuristic is used:
* If the point to be inserted already falls within the bounding hyperrectangle
of a child, select that child.
* If the point to be inserted does not fall within the bounding
hyperrectangle of any child, but a child's volume can be expanded to
encompass the point *without* causing any children to overlap, select that
child.
* If neither of the conditions above are true, insert the point into a new
child node. This child node will likely be rebalanced or modified later by
[`RPlusTreeSplit`](#rplustreesplit).
### `RPlusPlusTreeDescentHeuristic`
The `RPlusPlusTreeDescentHeuristic` is the descent strategy used by the
[`RPlusPlusTree`](r_plus_plus_tree.md). The strategy chooses the child whose
outer bound (held by
[`RPlusPlusTreeAuxiliaryInformation`](#rplusplustreeauxiliaryinformation))
contains the point to be inserted.
For implementation details, see [the source
code](/src/mlpack/core/tree/rectangle_tree/r_plus_plus_tree_descent_heuristic.hpp).
### `HilbertRTreeDescentHeuristic`
The `HilbertRTreeDescentHeuristic` is the descent strategy used by the
[`HilbertRTree`](hilbert_r_tree.md). The strategy depends on the concept of
Z-ordering (or Morton ordering): the child node whose minimum Z-ordering value
is closest to but greater than the Z-ordering value of the point to be inserted
is chosen.
For implementation details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/hilbert_r_tree_descent_heuristic_impl.hpp).
### Custom `DescentType`s
Custom descent strategies for a `RectangleTree` can be implemented via the
`DescentType` template parameter. By default, the
[`RTreeDescentHeuristic`](#rtreedescentheuristic) descent strategy is used,
but it is also possible to implement and use a custom `DescentType`. Any custom
`DescentType` class must implement the following signature:
```c++
class DescentType
{
public:
// Return a `size_t` indicating which child of `node` should be chosen to
// insert `point` in.
//
// `TreeType` will be the relevant `RectangleTree` type.
template<typename TreeType>
static size_t ChooseDescentNode(const TreeType* node, const size_t point);
// Return a `size_t` indicating which child of `node` should be chosen to
// insert `insertedNode` in.
//
// `TreeType` will be the relevant `RectangleTree` type.
template<typename TreeType>
static size_t ChooseDescentNode(const TreeType* node,
const TreeType* insertedNode);
};
```
## `AuxiliaryInformationType`
The `AuxiliaryInformationType` template parameter holds any auxiliary
information required by the `SplitType` or `DescentType` strategies. By
default, the `NoAuxiliaryInformation` class is used, which holds nothing.
Different variants of `RectangleTree`s may use other predefined types for their
`AuxiliaryInformationType`s:
* [`XTreeAuxiliaryInformation`](#xtreeauxiliaryinformation): used for the
[`XTree`](x_tree.md).
* [`RPlusPlusTreeAuxiliaryInformation`](#rplusplustreeauxiliaryinformation):
used for the [`RPlusPlusTree`](r_plus_plus_tree.md).
* [`DiscreteHilbertRTreeAuxiliaryInformation`](#discretehilbertrtreeauxiliaryinformation):
used for the [`HilbertRTree`](hilbert_r_tree.md).
### `XTreeAuxiliaryInformation`
The `XTreeAuxiliaryInformation` class is the auxiliary information type used by
the [`XTree`](x_tree.md) class, and is meant to be used with the
[`XTreeSplit`](#xtreesplit) splitting strategy. It holds information required
to construct super-nodes (a concept specific to X-trees), where splitting is
being deferred.
For implementation details, see [the source
code](/src/mlpack/core/tree/rectangle_tree/x_tree_auxiliary_information.hpp).
### `RPlusPlusTreeAuxiliaryInformation`
The `RPlusPlusTreeAuxiliaryInformation` class is used by the
[`RPlusPlusTree`](r_plus_plus_tree.md) to store information required for tree
building. In addition to the regular
[`HRectBound`](binary_space_tree.md#hrectbound) that is used to maintain the
minimum bounding rectangle of each node, each R++-tree node also maintains an
'outer bound' that represents the *maximum* bounding rectangle. This maximum
bounding rectangle is used for splitting, instead of the minimum bounding
rectangle; this helps prevent overlap in nodes.
For an object `auxInfo`, the function `auxInfo.OuterBound()` will return an
[`HRectBound&`](binary_space_tree.md#hrectbound). If the tree was built with a
[non-standard `MatType`](#template-parameters), then the type returned will be
`HRectBound<EuclideanDistance, ElemType>`, where `ElemType` is the element type
of the given `MatType`.
For implementation details, see [the source
code](/src/mlpack/core/tree/rectangle_tree/r_plus_plus_tree_auxiliary_information.hpp).
### `DiscreteHilbertRTreeAuxiliaryInformation`
The `DiscreteHilbertRTreeAuxiliaryInformation` class is used by the
[`HilbertRTree`](hilbert_r_tree.md). It stores the largest Z-ordering value of
any descendant point of a node. (This can be accessed with the `HilbertValue()`
method.)
For more details, see
[the source code](/src/mlpack/core/tree/rectangle_tree/hilbert_r_tree_auxiliary_information_impl.hpp).
### Custom `AuxiliaryInformationType`s
Custom `AuxiliaryInformationType`s can be implemented and used with the
`AuxiliaryInformationType` template parameter. Any custom
`AuxiliaryInformationType` class must implement the following signature:
```c++
// TreeType will be the type of RectangleTree that the auxiliary information
// type is being used in.
template<typename TreeType>
class CustomAuxiliaryInformationType
{
public:
// Default constructor is required.
CustomAuxiliaryInformationType();
// Construct the object with a tree node that may not yet be constructed.
CustomAuxiliaryInformationType(TreeType* node);
// Construct the object with another object and another tree node, optionally
// making a 'deep copy' instead of just copying pointers where relevant.
CustomAuxiliaryInformationType(const CustomAuxiliaryInformationType& other,
TreeType* node,
const bool deepCopy = true);
// Just before a point is inserted into a node, this is called.
// `node` is the node that will have `node.Dataset().col(point)` inserted into
// it.
//
// Optionally, this method can manipulate `node`. If so, `true` should be
// returned to indicate that `node` was changed. Otherwise, return `false`
// and the RectangleTree will perform its default behavior.
bool HandlePointInsertion(TreeType* node, const size_t point);
// Just before a child node is inserted into a node, this is called.
// `node` is the node that will have `nodeToInsert` inserted into it as a
// child.
//
// Optionally, this method can manipulate `node`. If so, `true` should be
// returned to indicate that `node` was changed. Otherwise, return `false`
// and the RectangleTree will perform its default behavior.
bool HandleNodeInsertion(TreeType* node,
TreeType* nodeToInsert,
const bool atMaxDepth);
// Just before a point is deleted from a node, this is called.
// `node` is the node that will have `node.Dataset().col(point)` deleted from
// it.
//
// Optionally, this method can manipulate `node`. If so, `true` should be
// returned to indicate that `node` was changed. Otherwise, return `false`
// and the RectangleTree will perform its default behavior.
bool HandlePointDeletion(TreeType* node, const size_t point);
// Just before a child node is deleted from a node, this is called.
// `node` is the node that will have `node.Child(nodeIndex)` deleted from it.
//
// Optionally, this method can manipulate `node`. If so, `true` should be
// returned to indicate that `node` was changed. Otherwise, return `false`
// and the RectangleTree will perform its default behavior.
bool HandleNodeRemoval(TreeType* node, const size_t nodeIndex);
// When `node` is changed, this is called so that the auxiliary information
// can be updated. If information needs to be propagated upward, return
// `true` and then `UpdateAuxiliaryInfo(node->Parent())` will be called.
bool UpdateAuxiliaryInfo(TreeType* node);
};
```
## Example usage
The `RectangleTree` class is only really necessary when a custom split type or
custom descent strategy is intended to be used. For simpler use cases, one of
the typedefs of `RectangleTree` (such as [`RTree`](r_tree.md)) will suffice.
For this reason, all of the examples below explicitly specify all six template
parameters of `RectangleTree`.
[Writing a custom splitting strategy](#custom-splittypes),
[writing a custom descent strategy](#custom-descenttypes),
and [writing a custom auxiliary information
type](#custom-auxiliaryinformationtypes) are discussed in the previous sections.
Each of the parameters in the examples below can be trivially changed for
different behavior.
---
Build a `RectangleTree` on the `cloud` dataset and print basic statistics
about the tree.
```c++
// See https://datasets.mlpack.org/cloud.csv.
arma::mat dataset;
mlpack::data::Load("cloud.csv", dataset, true);
// Build the rectangle tree with a leaf size of 10. (This means that leaf nodes
// cannot contain more than 10 points.)
//
// The std::move() means that `dataset` will be empty after this call, and no
// data will be copied during tree building.
mlpack::RectangleTree<mlpack::EuclideanDistance,
mlpack::EmptyStatistic,
arma::mat,
mlpack::RTreeSplit,
mlpack::RTreeDescentHeuristic,
mlpack::NoAuxiliaryInformation> tree(std::move(dataset));
// Print the bounding box of the root node.
std::cout << "Bounding box of root node:" << std::endl;
for (size_t i = 0; i < tree.Bound().Dim(); ++i)
{
std::cout << " - Dimension " << i << ": [" << tree.Bound()[i].Lo() << ", "
<< tree.Bound()[i].Hi() << "]." << std::endl;
}
std::cout << std::endl;
// Print the number of children in the root, and the allowable range.
std::cout << "Number of children of root: " << tree.NumChildren()
<< "; allowable range: [" << tree.MinNumChildren() << ", "
<< tree.MaxNumChildren() << "]." << std::endl;
// Print the number of descendant points of the root, and of each of its
// children.
std::cout << "Descendant points of root: "
<< tree.NumDescendants() << "." << std::endl;
for (size_t i = 0; i < tree.NumChildren(); ++i)
{
std::cout << "Descendant points of child " << i << ": "
<< tree.Child(i).NumDescendants() << "." << std::endl;
}
std::cout << std::endl;
// Compute the center of the RectangleTree.
arma::vec center;
tree.Center(center);
std::cout << "Center of tree: " << center.t();
```
---
Build two `RectangleTree`s on subsets of the corel dataset and compute minimum
and maximum distances between different nodes in the tree.
```c++
// See https://datasets.mlpack.org/corel-histogram.csv.
arma::mat dataset;
mlpack::data::Load("corel-histogram.csv", dataset, true);
// Convenience typedef for the tree type.
using TreeType = mlpack::RectangleTree<mlpack::EuclideanDistance,
mlpack::EmptyStatistic,
arma::mat,
mlpack::RTreeSplit,
mlpack::RTreeDescentHeuristic,
mlpack::NoAuxiliaryInformation>;
// Build trees on the first half and the second half of points.
TreeType tree1(dataset.cols(0, dataset.n_cols / 2));
TreeType tree2(dataset.cols(dataset.n_cols / 2 + 1, dataset.n_cols - 1));
// Compute the maximum distance between the trees.
std::cout << "Maximum distance between tree root nodes: "
<< tree1.MaxDistance(tree2) << "." << std::endl;
// Get the leftmost grandchild of the first tree's root---if it exists.
if (!tree1.IsLeaf() && !tree1.Child(0).IsLeaf())
{
TreeType& node1 = tree1.Child(0).Child(0);
// Get the leftmost grandchild of the second tree's root---if it exists.
if (!tree2.IsLeaf() && !tree2.Child(0).IsLeaf())
{
TreeType& node2 = tree2.Child(0).Child(0);
// Print the minimum and maximum distance between the nodes.
mlpack::Range dists = node1.RangeDistance(node2);
std::cout << "Possible distances between two grandchild nodes: ["
<< dists.Lo() << ", " << dists.Hi() << "]." << std::endl;
// Print the minimum distance between the first node and the first
// descendant point of the second node.
const size_t descendantIndex = node2.Descendant(0);
const double descendantMinDist =
node1.MinDistance(node2.Dataset().col(descendantIndex));
std::cout << "Minimum distance between grandchild node and descendant "
<< "point: " << descendantMinDist << "." << std::endl;
// Which child of node2 is closer to node1?
const size_t closestIndex = node2.GetNearestChild(node1);
std::cout << "Child " << closestIndex << " is closest to node1."
<< std::endl;
// And which child of node1 is further from node2?
const size_t furthestIndex = node1.GetFurthestChild(node2);
std::cout << "Child " << furthestIndex << " is furthest from node2."
<< std::endl;
}
}
```
---
Build a `RectangleTree` on 32-bit floating point data and save it to disk.
```c++
// See https://datasets.mlpack.org/corel-histogram.csv.
arma::fmat dataset;
mlpack::data::Load("corel-histogram.csv", dataset);
// Build the RectangleTree using 32-bit floating point data as the matrix
// type. We will still use the default EmptyStatistic and EuclideanDistance
// parameters. A leaf size of 100 is used here.
mlpack::RectangleTree<mlpack::EuclideanDistance,
mlpack::EmptyStatistic,
arma::fmat,
mlpack::RTreeSplit,
mlpack::RTreeDescentHeuristic,
mlpack::NoAuxiliaryInformation> tree(
std::move(dataset), 100);
// Save the tree to disk with the name 'tree'.
mlpack::data::Save("tree.bin", "tree", tree);
std::cout << "Saved tree with " << tree.Dataset().n_cols << " points to "
<< "'tree.bin'." << std::endl;
```
---
Load a 32-bit floating point `RectangleTree` from disk, then traverse it
manually and find the number of leaf nodes with less than 10 points.
```c++
// This assumes the tree has already been saved to 'tree.bin' (as in the example
// above).
// This convenient typedef saves us a long type name!
using TreeType = mlpack::RectangleTree<mlpack::EuclideanDistance,
mlpack::EmptyStatistic,
arma::fmat,
mlpack::RTreeSplit,
mlpack::RTreeDescentHeuristic,
mlpack::NoAuxiliaryInformation>;
TreeType tree;
mlpack::data::Load("tree.bin", "tree", tree);
std::cout << "Tree loaded with " << tree.NumDescendants() << " points."
<< std::endl;
// Recurse in a depth-first manner. Count both the total number of leaves, and
// the number of leaves with less than 10 points.
size_t leafCount = 0;
size_t totalLeafCount = 0;
std::stack<TreeType*> stack;
stack.push(&tree);
while (!stack.empty())
{
TreeType* node = stack.top();
stack.pop();
if (node->NumPoints() < 10)
++leafCount;
++totalLeafCount;
for (size_t i = 0; i < node->NumChildren(); ++i)
stack.push(&node->Child(i));
}
// Note that it would be possible to use TreeType::SingleTreeTraverser to
// perform the recursion above, but that is more well-suited for more complex
// tasks that require pruning and other non-trivial behavior; so using a simple
// stack is the better option here.
// Print the results.
std::cout << leafCount << " out of " << totalLeafCount << " leaves have fewer "
<< "than 10 points." << std::endl;
```
---
Build a `RectangleTree` by iteratively inserting points from the corel dataset,
print some information, and then remove a few randomly chosen points.
```c++
// See https://datasets.mlpack.org/corel-histogram.csv.
arma::mat dataset;
mlpack::data::Load("corel-histogram.csv", dataset, true);
// This convenient typedef saves us a long type name!
using TreeType = mlpack::RectangleTree<mlpack::EuclideanDistance,
mlpack::EmptyStatistic,
arma::mat,
mlpack::RTreeSplit,
mlpack::RTreeDescentHeuristic,
mlpack::NoAuxiliaryInformation>;
// Create an empty tree of the right dimensionality.
TreeType t(dataset.n_rows);
// Insert points one by one for the first half of the dataset.
for (size_t i = 0; i < dataset.n_cols / 2; ++i)
t.Insert(dataset.col(i));
std::cout << "After inserting half the points, the root node has "
<< t.NumDescendants() << " descendant points and "
<< t.NumChildren() << " child nodes." << std::endl;
// For the second half, insert the points backwards.
for (size_t i = dataset.n_cols - 1; i >= dataset.n_cols / 2; --i)
t.Insert(dataset.col(i));
std::cout << "After inserting all the points, the root node has "
<< t.NumDescendants() << " descendant points and "
<< t.NumChildren() << " child nodes." << std::endl;
// Remove three random points.
t.Delete(mlpack::math::RandInt(0, t.NumDescendants()));
std::cout << "After removing 1 point, the root node has " << t.NumDescendants()
<< " descendant points." << std::endl;
t.Delete(mlpack::math::RandInt(0, t.NumDescendants()));
std::cout << "After removing 2 points, the root node has " << t.NumDescendants()
<< " descendant points." << std::endl;
t.Delete(mlpack::math::RandInt(0, t.NumDescendants()));
std::cout << "After removing 3 points, the root node has " << t.NumDescendants()
<< " descendant points." << std::endl;
```
|