.. _ref-spatial:

==============
Spatial Search
==============

Spatial search (also called geospatial search) allows you to take data that
has a geographic location & enhance the search results by limiting them to a
physical area. Haystack, combined with the latest versions of a couple of engines,
can provide this type of search.

In addition, Haystack tries to implement these features in a way that is as
close to GeoDjango_ as possible. There are some differences, which we'll
highlight throughout this guide. Additionally, while the support isn't as
comprehensive as PostGIS (for example), it is still quite useful.

.. _GeoDjango: https://docs.djangoproject.com/en/stable/ref/contrib/gis/


Additional Requirements
=======================

The spatial functionality has only one non-included, non-available-in-Django
dependency:

* ``geopy`` - ``pip install geopy``

If you do not ever need distance information, you may be able to skip
installing ``geopy``.


Support
=======

You need the latest & greatest of either Solr or Elasticsearch. None of the
other backends (specifically the engines) support this kind of search.

For Solr_, you'll need at least **v3.5**. In addition, if you have an existing
install of Haystack & Solr, you'll need to upgrade the schema & reindex your
data. If you're adding geospatial data, you would have to reindex anyhow.

For Elasticsearch, you'll need at least v0.17.7, preferably v0.18.6 or better.
If you're adding geospatial data, you'll have to reindex as well.

.. _Solr: http://lucene.apache.org/solr/

====================== ====== =============== ======== ======== ======
Lookup Type            Solr   Elasticsearch   Whoosh   Xapian   Simple
====================== ====== =============== ======== ======== ======
`within`               X      X
`dwithin`              X      X
`distance`             X      X
`order_by('distance')` X      X
`polygon`                     X
====================== ====== =============== ======== ======== ======

For more details, you can inspect http://wiki.apache.org/solr/SpatialSearch
or http://www.elasticsearch.org/guide/reference/query-dsl/geo-bounding-box-filter.html.


Geospatial Assumptions
======================

``Points``
----------

Haystack prefers to work with ``Point`` objects, which are located in
``django.contrib.gis.geos.Point``.

``Point`` objects use **LONGITUDE, LATITUDE** for their construction, regardless
of whether you instantiate them from parameters or from WKT_/``GEOSGeometry``.

.. _WKT: http://en.wikipedia.org/wiki/Well-known_text

Examples::

    # Using positional arguments.
    from django.contrib.gis.geos import Point
    pnt = Point(-95.23592948913574, 38.97127105172941)

    # Using WKT.
    from django.contrib.gis.geos import GEOSGeometry
    pnt = GEOSGeometry('POINT(-95.23592948913574 38.97127105172941)')

They are preferred over just providing ``latitude, longitude`` because they are
more intelligent, have a spatial reference system attached & are more consistent
with GeoDjango's use.
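
Because the argument order is easy to get backwards, a cheap sanity check
before indexing can help. The sketch below is plain Python with no GeoDjango
dependency; ``looks_swapped`` is a hypothetical helper, not part of Haystack or
Django, and it only catches swaps where the longitude falls outside the valid
latitude range::

    def looks_swapped(x, y):
        """Return True if (x, y) cannot be a valid (longitude, latitude) pair
        but would be valid with the two values exchanged."""
        valid_as_given = -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0
        valid_if_swapped = -180.0 <= y <= 180.0 and -90.0 <= x <= 90.0
        return not valid_as_given and valid_if_swapped


    # Longitude first, as ``Point`` expects: not flagged.
    print(looks_swapped(-95.23592948913574, 38.97127105172941))

    # Latitude first by mistake: -95.2 is not a valid latitude, so it is flagged.
    print(looks_swapped(38.97127105172941, -95.23592948913574))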


``Distance``
------------

Haystack also uses the ``D`` (or ``Distance``) objects from GeoDjango,
implemented in ``django.contrib.gis.measure.Distance``.

``Distance`` objects accept a very flexible set of measurements during
instantiation and can convert amongst them freely. This is important, because
the engines rely on measurements being in kilometers but you're free to use
whatever units you want.

Examples::

    from django.contrib.gis.measure import D

    # Start at 5 miles.
    imperial_d = D(mi=5)

    # Convert to fathoms...
    fathom_d = imperial_d.fathom

    # Now to kilometers...
    km_d = imperial_d.km

    # And back to miles.
    mi = imperial_d.mi

They are preferred over just providing a raw distance because they are
more intelligent, have a well-defined unit system attached & are consistent
with GeoDjango's use.
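
To make the unit handling concrete, the conversion the engines ultimately
depend on is plain arithmetic. The following standalone sketch does not use
GeoDjango; ``to_km`` and ``KM_PER_UNIT`` are hypothetical names chosen for
illustration, and only a few units are covered. In real code, ``D(mi=2).km``
does this for you::

    # Kilometers per unit. The international mile is exactly 1.609344 km.
    KM_PER_UNIT = {
        "km": 1.0,
        "m": 0.001,
        "mi": 1.609344,
    }


    def to_km(value, unit):
        """Convert ``value`` expressed in ``unit`` to kilometers."""
        try:
            return value * KM_PER_UNIT[unit]
        except KeyError:
            raise ValueError("Unsupported unit: %r" % unit)


    # A two mile radius, expressed the way the engines expect it.
    radius_km = to_km(2, "mi")
    print(round(radius_km, 3))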


``WGS-84``
----------

All engines assume WGS-84 (SRID 4326). At the time of writing, there does **not**
appear to be a way to switch this. Haystack will transform all points into this
coordinate system for you.


Indexing
========

Indexing is relatively simple. Add a ``LocationField`` (or several)
onto your ``SearchIndex`` class(es) & provide them a ``Point`` object. For
example::

    from haystack import indexes
    from shops.models import Shop


    class ShopIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        # ... the usual, then...
        location = indexes.LocationField(model_attr='coordinates')

        def get_model(self):
            return Shop

If you must manually prepare the data, you have to do something slightly less
convenient, returning a string-ified version of the coordinates in WGS-84 as
``lat,long``::

    from haystack import indexes
    from shops.models import Shop


    class ShopIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        # ... the usual, then...
        location = indexes.LocationField()

        def get_model(self):
            return Shop

        def prepare_location(self, obj):
            # If you're just storing the floats...
            return "%s,%s" % (obj.latitude, obj.longitude)
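
Since the string form reverses the ``Point`` ordering, it can help to isolate
the formatting in one place. A minimal sketch, assuming the coordinates are
already WGS-84 floats; ``to_lat_long`` is a hypothetical helper, not a Haystack
API::

    def to_lat_long(longitude, latitude):
        """Format a coordinate pair, given in ``Point`` order (longitude
        first), as a ``lat,long`` string."""
        if not -90.0 <= latitude <= 90.0:
            raise ValueError("Latitude out of range: %r" % latitude)
        if not -180.0 <= longitude <= 180.0:
            raise ValueError("Longitude out of range: %r" % longitude)
        return "%s,%s" % (latitude, longitude)


    print(to_lat_long(-95.2359, 38.9713))

With a helper like this, ``prepare_location`` could return
``to_lat_long(obj.longitude, obj.latitude)``.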

Alternatively, you could build a method/property onto the ``Shop`` model that
returns a ``Point`` based on those coordinates::

    # shops/models.py
    from django.contrib.gis.geos import Point
    from django.db import models


    class Shop(models.Model):
        # ... the usual, then...
        latitude = models.FloatField()
        longitude = models.FloatField()

        # Usual methods, then...
        def get_location(self):
            # Remember, longitude FIRST!
            return Point(self.longitude, self.latitude)


    # shops/search_indexes.py
    from haystack import indexes
    from shops.models import Shop


    class ShopIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        location = indexes.LocationField(model_attr='get_location')

        def get_model(self):
            return Shop


Querying
========

There are two types of geospatial queries you can run, ``within`` & ``dwithin``.
Like their GeoDjango counterparts (within_ & dwithin_), these methods focus on
finding results within an area.

.. _within: https://docs.djangoproject.com/en/dev/ref/contrib/gis/geoquerysets/#within
.. _dwithin: https://docs.djangoproject.com/en/dev/ref/contrib/gis/geoquerysets/#dwithin


``within``
----------

.. method:: SearchQuerySet.within(self, field, point_1, point_2)

``within`` is a bounding box comparison. A bounding box is a rectangular area
within which to search. It's composed of a bottom-left point & a top-right
point. It is faster but slightly sloppier than its counterpart.

Examples::

    from haystack.query import SearchQuerySet
    from django.contrib.gis.geos import Point

    downtown_bottom_left = Point(-95.23947, 38.9637903)
    downtown_top_right = Point(-95.23362278938293, 38.973081081164715)

    # 'location' is the fieldname from our ``SearchIndex``...

    # Do the bounding box query.
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right)

    # Can be chained with other Haystack calls.
    sqs = SearchQuerySet().auto_query('coffee').within('location', downtown_bottom_left, downtown_top_right).order_by('-popularity')

.. note::

    In GeoDjango, assuming the ``Shop`` model had been properly geo-ified, this
    would have been implemented as::

        from shops.models import Shop
        Shop.objects.filter(location__within=(downtown_bottom_left, downtown_top_right))

    Haystack's form differs because it yielded a cleaner implementation, was
    no more typing than the GeoDjango version & tried to maintain the same
    terminology/similar signature.
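
Conceptually, the bounding box test is just two range comparisons, which is
why it is cheap. The sketch below shows the idea in plain Python on
``(longitude, latitude)`` tuples; ``in_bounding_box`` is a hypothetical helper,
it ignores boxes that cross the antimeridian, and it is not how the engines
implement the filter::

    def in_bounding_box(point, bottom_left, top_right):
        """Return True if ``point`` lies inside the box (edges included).

        All arguments are ``(longitude, latitude)`` tuples.
        """
        lng, lat = point
        min_lng, min_lat = bottom_left
        max_lng, max_lat = top_right
        return min_lng <= lng <= max_lng and min_lat <= lat <= max_lat


    downtown_bottom_left = (-95.23947, 38.9637903)
    downtown_top_right = (-95.23362278938293, 38.973081081164715)

    # This point sits between both corners, so it falls inside the box.
    print(in_bounding_box((-95.23592948913574, 38.96753407043678),
                          downtown_bottom_left, downtown_top_right))

    # A point a few miles west does not.
    print(in_bounding_box((-95.30, 38.97),
                          downtown_bottom_left, downtown_top_right))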


``dwithin``
-----------

.. method:: SearchQuerySet.dwithin(self, field, point, distance)

``dwithin`` is a radius-based search. A radius-based search is a circular area
within which to search. It's composed of a center point & a radius (in
kilometers, though Haystack will use the ``D`` object's conversion utilities to
get it there). It is slower than ``within`` but very exact & can involve fewer
calculations on your part.

Examples::

    from haystack.query import SearchQuerySet
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
    # Within two miles.
    max_dist = D(mi=2)

    # 'location' is the fieldname from our ``SearchIndex``...

    # Do the radius query.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, max_dist)

    # Can be chained with other Haystack calls.
    sqs = SearchQuerySet().auto_query('coffee').dwithin('location', ninth_and_mass, max_dist).order_by('-popularity')

.. note::

    In GeoDjango, assuming the ``Shop`` model had been properly geo-ified, this
    would have been implemented as::

        from shops.models import Shop
        Shop.objects.filter(location__dwithin=(ninth_and_mass, D(mi=2)))

    Haystack's form differs because it yielded a cleaner implementation, was
    no more typing than the GeoDjango version & tried to maintain the same
    terminology/similar signature.
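
To see what a radius check involves, and why it costs more than two range
comparisons, here is a standalone haversine sketch. It is an approximation on
a spherical Earth (the mean radius of 6371.0088 km is an assumption of this
example), ``haversine_km`` and ``within_radius`` are hypothetical names, and
the engines use their own distance implementations::

    import math

    EARTH_RADIUS_KM = 6371.0088  # Mean Earth radius; an assumption of this sketch.


    def haversine_km(point_a, point_b):
        """Great-circle distance in km between two ``(longitude, latitude)``
        tuples given in degrees."""
        lng_a, lat_a = (math.radians(v) for v in point_a)
        lng_b, lat_b = (math.radians(v) for v in point_b)
        d_lat = lat_b - lat_a
        d_lng = lng_b - lng_a
        h = (math.sin(d_lat / 2) ** 2
             + math.cos(lat_a) * math.cos(lat_b) * math.sin(d_lng / 2) ** 2)
        return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


    def within_radius(point, center, radius_km):
        """Return True if ``point`` is at most ``radius_km`` from ``center``."""
        return haversine_km(point, center) <= radius_km


    ninth_and_mass = (-95.23592948913574, 38.96753407043678)
    nearby = (-95.23455619812012, 38.97240128290697)

    # Two miles is 3.218688 km.
    print(within_radius(nearby, ninth_and_mass, 3.218688))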


``distance``
------------

.. method:: SearchQuerySet.distance(self, field, point)

By default, search results will come back without distance information attached
to them. In the context of a bounding box, it would be ambiguous what the
distances would be calculated against. It is also extra calculation that may
not be necessary.

So like GeoDjango, Haystack exposes a method to signify that you want to
include these calculated distances on results.

Examples::

    from haystack.query import SearchQuerySet
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)

    # On a bounding box...
    downtown_bottom_left = Point(-95.23947, 38.9637903)
    downtown_top_right = Point(-95.23362278938293, 38.973081081164715)

    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass)

    # ...Or on a radius query.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass)

You can even measure from a different point than the one you searched around,
for instance if you calculate results of key, well-cached hotspots in town but
want distances from the user's current position::

    from haystack.query import SearchQuerySet
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
    user_loc = Point(-95.23455619812012, 38.97240128290697)

    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', user_loc)

.. note::

    The astute will notice this is Haystack's biggest departure from GeoDjango.
    In GeoDjango, this would have been implemented as::

        from shops.models import Shop
        Shop.objects.filter(location__dwithin=(ninth_and_mass, D(mi=2))).distance(user_loc)

    Note that, by default, the GeoDjango form leaves *out* the field to
    calculate against (though it's possible to override it & specify the
    field).

    Haystack's form differs because the same assumptions are difficult to make.
    GeoDjango deals with a single model at a time, where Haystack deals with
    a broad mix of models. Additionally, accessing ``Model`` information is a
    couple hops away, so Haystack favors the explicit (if slightly more typing)
    approach.


Ordering
========

Because you're dealing with search, even with geospatial queries, results still
come back in **RELEVANCE** order. If you want to let the user order results by
distance, there's a simple way to enable this ordering.

Using the standard Haystack ``order_by`` method, if you specify ``distance`` or
``-distance`` **ONLY**, you'll get geographic ordering. Additionally, you must
have a call to ``.distance()`` somewhere in the chain, otherwise there is no
distance information on the results & nothing to sort by.

Examples::

    from haystack.query import SearchQuerySet
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
    downtown_bottom_left = Point(-95.23947, 38.9637903)
    downtown_top_right = Point(-95.23362278938293, 38.973081081164715)

    # Non-geo ordering.
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).order_by('title')
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass).order_by('-created')

    # Geo ordering, closest to farthest.
    sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass).order_by('distance')
    # Geo ordering, farthest to closest.
    sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass).order_by('-distance')

.. note::

    This call is identical to the GeoDjango usage.

.. warning::

    You cannot specify both a distance & lexicographic ordering. If you specify
    more than just ``distance`` or ``-distance``, Haystack assumes ``distance``
    is a field in the index & tries to sort on it. Example::

        # May blow up!
        sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass).order_by('distance', 'title')

    This is a limitation in the engine's implementation.

    If you actually **have** a field called ``distance`` (& aren't using
    calculated distance information), Haystack will do the right thing in
    these circumstances.
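
If you genuinely need a secondary ordering, one workaround is to order by
distance in the engine and then re-sort a bounded page of results in Python.
This is a sketch under assumptions rather than a Haystack feature: ``Row`` is
a hypothetical stand-in for a search result carrying a title and a distance in
kilometers, the sample data is made up, and rounding to 0.1 km to create ties
is an arbitrary choice::

    from collections import namedtuple

    # Hypothetical stand-in for a search result.
    Row = namedtuple("Row", ["title", "distance_km"])


    def sort_by_distance_then_title(rows, precision=1):
        """Sort by distance rounded to ``precision`` decimals, breaking ties
        alphabetically by title."""
        return sorted(
            rows,
            key=lambda r: (round(r.distance_km, precision), r.title),
        )


    rows = [
        Row("Zebra Coffee", 0.42),
        Row("Alpha Coffee", 0.38),
        Row("Middle Coffee", 0.12),
    ]

    for row in sort_by_distance_then_title(rows):
        print(row.title, row.distance_km)

This only makes sense for small result sets, since the sorting happens in your
process rather than in the engine.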


Caveats
=======

In all cases, you may call the ``within/dwithin/distance`` methods as many times
as you like. However, the **LAST** call is the information that will be used.
No combination logic is available, as this is largely a backend limitation.

Combining calls to both ``within`` & ``dwithin`` may yield unexpected or broken
results. They don't overlap when performing queries, so it may be possible to
construct queries that work. Your Mileage May Vary.