File: search.txt

package info (click to toggle)
python-django 1%3A1.11.29-1~deb10u1
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 47,428 kB
  • sloc: python: 220,776; javascript: 13,523; makefile: 209; xml: 201; sh: 64
file content (241 lines) | stat: -rw-r--r-- 9,286 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
================
Full text search
================

.. versionadded:: 1.10

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<https://www.postgresql.org/docs/current/static/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search

The ``search`` lookup
=====================

.. fieldlookup:: search

The simplest way to use full text search is to search a single term against a
single column in the database. For example::

    >>> Entry.objects.filter(body_text__search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are obtained by matching the
query and the vector.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.

``SearchVector``
================

.. class:: SearchVector(\*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``::

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', 'blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments will be concatenated together using a space so that the search
document includes them all.

``SearchVector`` objects can be combined together, allowing you to reuse them.
For example::

    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text') + SearchVector('blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.

``SearchQuery``
===============

.. class:: SearchQuery(value, config=None)

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and then it
looks for matches for all of the resulting terms.

``SearchQuery`` terms can be combined logically to provide more flexibility::

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('potato') & SearchQuery('ireland')  # potato AND ireland
    >>> SearchQuery('potato') | SearchQuery('penguin')  # potato OR penguin
    >>> ~SearchQuery('sausage')  # NOT sausage

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None)

So far, we've just returned the results for which any match between the vector
and the query are possible. It's likely you may wish to order the results by
some sort of relevancy. PostgreSQL provides a ranking function which takes into
account how often the query terms appear in the document, how close together
the terms are in the document, and how important the part of the document is
where they occur. The better the match, the higher the value of the rank. To
order by relevancy::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can specify the ``config`` attribute to a :class:`SearchVector` and
:class:`SearchQuery` to use a different search configuration. This allows using
a different language parsers and dictionaries as defined by the database::

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config='french'),
    ... ).filter(search=SearchQuery('œuf', config='french'))
    [<Entry: Pain perdu>]

The value of ``config`` could also be stored in another column::

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config=F('blog__language')),
    ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
    [<Entry: Pain perdu>]

.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Every field may not have the same relevance in a query, so you can set weights
of various vectors before you combine them::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights`` in the same order above::

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')

Performance
===========

Special database configuration isn't necessary to use any of these functions,
however, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

In the event that all the fields you're querying on are contained within one
particular model, you can create a functional index which matches the search
vector you wish to use. The PostgreSQL documentation has details on
`creating indexes for full text search
<https://www.postgresql.org/docs/current/static/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated with triggers, for example, as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``::

    >>> Entry.objects.update(search_vector=SearchVector('body_text'))
    >>> Entry.objects.filter(search_vector='cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/static/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

Trigram similarity
==================

Another approach to searching is trigram similarity. A trigram is a group of
three consecutive characters. In addition to the :lookup:`trigram_similar`
lookup, you can use a couple of other expressions.

To use them, you need to activate the `pg_trgm extension
<https://www.postgresql.org/docs/current/static/pgtrgm.html>`_ on
PostgreSQL. You can install it using the
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
operation.

``TrigramSimilarity``
---------------------

.. class:: TrigramSimilarity(expression, string, **extra)

.. versionadded:: 1.10

Accepts a field name or expression, and a string or expression. Returns the
trigram similarity between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramSimilarity
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     similarity=TrigramSimilarity('name', test),
    ... ).filter(similarity__gt=0.3).order_by('-similarity')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

``TrigramDistance``
-------------------

.. class:: TrigramDistance(expression, string, **extra)

.. versionadded:: 1.10

Accepts a field name or expression, and a string or expression. Returns the
trigram distance between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramDistance
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     distance=TrigramDistance('name', test),
    ... ).filter(distance__lte=0.7).order_by('distance')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]