File: kdtree.txt

package info (click to toggle)
gamera 1:3.4.2+git20160808.1725654-1
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 22,312 kB
  • ctags: 24,991
  • sloc: xml: 122,324; ansic: 52,869; cpp: 50,664; python: 35,034; makefile: 118; sh: 101
file content (311 lines) | stat: -rw-r--r-- 10,428 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
======================
Gamera kd-tree library
======================

Introduction
------------

A *kd-tree* is multidimensional generalization of a binary
search tree. It can be used efficiently for range queries and nearest
neighbor searches, provided the dimension is not to high. In document
analysis problems, the dimension is typically two, so that kd-trees
can be a powerful utility for layout analysis problems.

For a detailed description of the present kd-tree implementation with
examples for use cases, see

  C. Dalitz: `Kd-Trees for Document Layout Analysis.`__
  In C. Dalitz (Ed.): "Document Image Analysis with the Gamera Framework."
  Schriftenreihe des Fachbereichs Elektrotechnik und Informatik,
  Hochschule Niederrhein, vol. 8, pp. 39-52, Shaker Verlag (2009)

.. __: http://lionel.kr.hsnr.de/~dalitz/data/publications/sr09-kdtree-layout.pdf

Additional background information on kd-trees can be found in the 
following literature:

- for an introduction to kd-trees and basic properties, see [deBerg2000]_
- more algorithms for kd-trees can be found in the original article 
  [Bentley1975]_
- neither of the above two references covers nearest neighbor
  searches in kd-trees; these are coverd in [Friedman1977]_


Examples
--------

Here is an example for looking up the three nearest neighbors to
a given point from a set of sample points:

.. code:: Python

   from gamera.kdtree import *

   points = [(1,4), (2,4), (1,5), (3,6), (8,9),
             (2,7), (4,4), (5,5), (4,6), (8,3)]
   nodes = [KdNode(p) for p in points]
   tree = KdTree(nodes)

   # neighbors to a sample point not from the set
   point = [5,6]
   k = 3
   knn = tree.k_nearest_neighbors(point, k)
   print "%i neighbors of (%i,%i):" % (k,point[0], point[1]),
   for node in knn:
       print "(%i,%i)" % (node.point[0], node.point[1]),
   print "" # final newline

   # neighbors to a sample point from the set
   # we must query k+1 neighbors, because one of them is
   # the sample point (the first entry in the returned list)
   point = [5,5]
   k = 3
   knn = tree.k_nearest_neighbors(point, k+1)
   print "%i neighbors of (%i,%i):" % (k,point[0], point[1]),
   for node in knn[1:]:
       print "(%i,%i)" % (node.point[0], node.point[1]),
   print "" # final newline

The property *KdNode.data* can store an arbitrary Python object
associated with the point. The following example represents each
connected component by its middle point and stores the actual
CC with the point in the node:

.. code:: Python

   ccs = image.cc_analysis()
   nodes = [KdNode([cc.center.x,cc.center.y], cc) for cc in ccs]

It is also possible to search for only those neighbors that fulfil
a certain predicate. To this end, you must define a function or callable
class that takes a KdNode as input and returns ``True`` when it fulfils
the search predicate. The following example only returns nearest neighbors
below the search point:

.. code:: Python

   # predicate definition as callable class
   class predicate(object):
       def __init__(self, point):
           self.point = point
       def __call__(self, node):
           return (point[1] > node.point[1])

   # find the three nearest neighbors below
   point = (5,6)
   knn = tree.k_nearest_neighbors(point, 3, predicate(point))

Beware however, that a search predicate can have the effect that
less than k neighbors are returned, because too few nodes fulfil
the predicate. In such a case, the runtime is O(n) instead of O(log(n)).
In other words, a poorly chosen search predicate can have a considerable
negative impact on the runtime. In some cases, modifying the distance metric
(method *set_distance*) can be an alternative to a search predicate
[Dalitz09]_.


The Kd-Tree Python API
----------------------


``KdNode`` objects
''''''''''''''''''

Each ``KdNode`` has two properties:

 *point*
   The geometric location of the node as a sequence of coordinate
   numbers. The coordinate numbers can be floats or ints.
   This is an immutable property, because changing the geometric
   location of nodes belonging to an already built kd-tree obviously
   breaks subsequent search operations.

 *data*
   An arbitrary Python object connected to the location *point*.

.. docstring:: gamera.kdtree KdNode

``KdTree`` objects
''''''''''''''''''

Each kd-tree is represented by instances of the ``KdTree`` class.
Even though there are general kd-tree algorithms to add and remove
nodes dynamically (see [Bentley1975]_), the present implementation
does not support alteration of a once built tree. This has the
consequence that tree nodes must be passed to the constructor of
``KdTree``.

A ``KdTree`` has the following (read only) properties:

 *dimension*
   The dimension of the kd-tree. This is automatically determined
   by the constructor.

.. docstring:: gamera.kdtree KdTree

.. docstring:: gamera.kdtree KdTree set_distance

.. docstring:: gamera.kdtree KdTree k_nearest_neighbors


The Kd-Tree C++ API
-------------------

The module ``gamera.kdtree`` is only a thin Python wrapper around
a C++ class ``KdTree``. This can also be used directly in C++ plugins.

Compilation and linkage
'''''''''''''''''''''''

The header file *kdtree.hpp* declares the necessary structures in
the namespace ``Gamera::Kdtree``. It is installed with the other gamera
header files, and can thus be included with

.. code:: CPP

   #include "geostructs/kdtree.hpp"
   using namespace Gamera::Kdtree;

The tricky part is getting your plugin module to be linked with the
actual kdtree implementation. This is achieved by adding the
source file ``kdtree.cpp`` to the ``cpp_sources`` property in the
plugin Python interface.

In the gamera core code, the following works:

.. code:: Python

   class ExampleModule(PluginModule):
       category = "MyPlugins"
       cpp_headers = ["myplugins.hpp"]
       cpp_sources = ["src/geostructs/kdtree.cpp"]
       functions = [myplugin1, myplugin2]
   module = ExampleModule()

In a toolkit, this will not work, because the path names in
the ``cpp_sources`` property are relative to the location
of the *setup.py* script. To allow for the use of the KdTree C++
class in toolkits, the source file *kdtree.cpp* is installed together
with Gamera. You can thus specify this file in ``cpp_sources``
as follows:

.. code:: Python

   class ExampleModule(PluginModule):
       # ...
       import os, sys, gamera
       gamera_root = os.path.dirname(gamera.__file__)
       cppfile = os.path.join(gamera_root,"src/geostructs/kdtree.cpp")
       if not os.path.exists(cppfile):
           gamera_root = os.path.join(sys.exec_prefix,"gamera")
           cppfile = os.path.join(gamera_root,"src/geostructs/kdtree.cpp")
       cpp_sources=[cppfile]
       # ...

Querying the installation directory is a bit tricky, because the python
distutils do not install additional *data_files* in a predictable way:
they might go into the gamera installation directory, which
can be found out through the *__file__* property of the gamera module, or
they might go into *sys.exec_prefix/gamera* (this is what the distutils
documentation says, but apparently this does not hold on all platforms).


Usage
'''''

For normal use of the kdtree, you will need the classes 
``CoordPoint`` (a typedef for ``vector<double>``), ``KdNode``, 
``KdNodeVector``, and ``KdTree``. Beside the property ``point``, a
``KdNode`` can also store an arbitrary pointer as ``data``. See
the header file *kdtree.hpp* for details.

Here is a usage example for a nearest neighbor search:

.. code:: CPP

   // set points for building the tree
   KdNodeVector nodes;
   double points[][2] = {
     {1,4}, {2,4}, {1,5}, {3,6}, {8,9},
     {2,7}, {4,4}, {5,5}, {4,6}, {8,3},
     {-20,-20} // array terminator
   };
   size_t i;
   for (i=0; points[i][0]>=-1.0; i++) {
     CoordPoint p;
     p.push_back(points[i][0]);
     p.push_back(points[i][1]);
     nodes.push_back(KdNode(p));
   }

   // build the tree
   KdTree tree(&nodes);

   // find the three nearest neighbors to (5,6)
   KdNodeVector neighbors;
   CoordPoint point(2);
   point[0] = 5;
   point[1] = 6;
   tree.k_nearest_neighbors(point, 3, &neighbors);

If you want to restrict the returned neighbors to only those fulfilling
a specific search predicate, you can define your search predicate as
a callable class (aka *functor*, i.e. a class with the call operator
``operator()`` overwritten [Stroustrup1997]_) derived from 
``KdNodePredicate`` as in the following example:

.. code:: CPP

   // only search for nodes on the right hand side of search point
   struct MyPredicate : public KdNodePredicate {
     CoordPoint point(2);
     MyPredicate(CoordPoint& p) {
       point = p;
     }
     bool operator()(const KdNode& kn) const {
       return point[0] < kn.point[0];
     }
   };

The call of the class only takes a single argument, the ``KdNode``, and
returns true when this node is an admissable search result. If this
criterion depends on some information from the actual search point (like
its *x*-component in the exampel above), this information must be passed to
the class constructor and stored within the class.

An instance of this class can then be passed as the third, optional argument
to ``KdTree.k_nearest_neighbors``:

.. code:: CPP

   KdNodeVector neighbors;
   CoordPoint point(2);
   point[0] = 5;
   point[1] = 6;
   MyPredicate predicate(point);
   tree.k_nearest_neighbors(point, 3, &neighbors, &predicate);


References
----------

.. [deBerg2000] M. de Berg, M. van Kreveld, M. Overmars, O. Schwarzkopf:
   *Computational Geometry.* Second edition, Springer (2000)

.. [Bentley1975] J.L. Bentley: *Multidimensional Binary Search Trees Used
   for Associative Searching.* Communications of the ACM 18,
   pp. 509-517 (1975)

.. [Friedman1977] J.H. Friedman, J.L. Bentley, R.A. Finkel:
   *An Algorithm for Finding Best Matches in Logarithmic Expected Time.*
   ACM Transcations on Mathematical Software 3, pp. 209-226 (1977)

.. [Stroustrup1997] Stroustrup, B. 1997. *The C++ Programming
   Language: Third Edition.*  Reading, MA: Addison-Wesley.

.. [Dalitz09] C. Dalitz: `Kd-Trees for Document Layout Analysis.`__
  In C. Dalitz (Ed.): "Document Image Analysis with the Gamera Framework."
  Schriftenreihe des Fachbereichs Elektrotechnik und Informatik,
  Hochschule Niederrhein, vol. 8, pp. 39-52, Shaker Verlag (2009)

.. __: http://lionel.kr.hsnr.de/~dalitz/data/publications/sr09-kdtree-layout.pdf