==========================
The default query language
==========================

.. highlight:: none

Overview
========

A query consists of *terms* and *operators*. There are two types of terms: single
terms and *phrases*. Multiple terms can be combined with operators such as
*AND* and *OR*.

Whoosh supports indexing text in different *fields*. You must specify the
*default field* when you create the :class:`whoosh.qparser.QueryParser` object.
Any term for which the user does not explicitly specify a field is searched in
the default field.

Whoosh's query parser is capable of parsing different and/or additional syntax
through the use of plug-ins. See :doc:`parsing`.
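
As a minimal sketch of how the default field is supplied, the following creates
a parser and parses a single unprefixed term. The field name ``content`` is an
assumption made for illustration, and ``None`` is passed as the schema so that
the query text is not analyzed:

.. code-block:: python

    from whoosh.qparser import QueryParser

    # "content" is an assumed default field name, not one the library requires.
    # schema=None means the parser does not analyze the query text.
    parser = QueryParser("content", schema=None)

    # The query string names no field, so the term lands in the default field.
    q = parser.parse("render")
    print(q.fieldname, q.text)

In an application you would normally pass the index's schema instead of
``None`` so that query terms are analyzed the same way as the indexed text.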


Individual terms and phrases
============================

Find documents containing the term ``render``::

    render

Find documents containing the phrase ``all was well``::

    "all was well"

Note that a field must store position information for phrase searching to work in
that field.

Normally when you specify a phrase, the maximum difference in position between
each word in the phrase is 1 (that is, the words must be right next to each
other in the document). To allow a larger difference, append ``~`` and a number
to the phrase. For example, the following matches if a document has
``library`` within 5 words after ``whoosh``::

    "whoosh library"~5
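
The effect of the ``~5`` suffix can be pictured with a small pure-Python model.
This is only a sketch of the rule stated above (each word must appear after the
previous one, at most a given number of positions later); it is not Whoosh's
matcher, and ``phrase_matches`` is a hypothetical helper:

.. code-block:: python

    def phrase_matches(tokens, phrase, slop=1):
        """Return True if the words of `phrase` occur in order in `tokens`,
        each at most `slop` positions after the previous word."""
        def search(word_index, prev_pos):
            if word_index == len(phrase):
                return True
            for pos, tok in enumerate(tokens):
                if tok != phrase[word_index]:
                    continue
                # The first word may appear anywhere; later words must follow
                # the previous word within the allowed distance.
                if prev_pos is None or 0 < pos - prev_pos <= slop:
                    if search(word_index + 1, pos):
                        return True
            return False
        return search(0, None)

    doc = "whoosh is a fast search library".split()

    strict = phrase_matches(doc, ["whoosh", "library"])          # adjacent only
    loose = phrase_matches(doc, ["whoosh", "library"], slop=5)   # like ~5

Here ``library`` is five positions after ``whoosh``, so the strict check fails
and the check with a distance of 5 succeeds.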


Boolean operators
=================

Find documents containing ``render`` *and* ``shading``::

    render AND shading

Note that AND is the default relation between terms, so this is the same as::

    render shading

Find documents containing ``render``, *and* also either ``shading`` *or*
``modeling``::

    render AND (shading OR modeling)

Find documents containing ``render`` but *not* ``modeling``::

    render NOT modeling

Find documents containing ``alpha`` but neither ``beta`` nor ``gamma``::

    alpha NOT (beta OR gamma)

Note that when no boolean operator is specified between terms, the parser will
insert one, by default AND. So this query::

    render shading modeling

is equivalent (by default) to::

    render AND shading AND modeling

See :doc:`customizing the default parser <parsing>` for information on how to
change the default operator to OR.

Group operators together with parentheses. For example, to find documents that
contain both ``render`` and ``shading``, or contain ``modeling``::

    (render AND shading) OR modeling
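
One way to reason about these operators is as set operations on the documents
that contain each term. The sketch below uses a made-up four-document corpus and
a hypothetical ``having`` helper; it models the meaning of the operators only,
not how Whoosh evaluates or scores queries:

.. code-block:: python

    # Hypothetical toy corpus: document id -> set of indexed terms.
    docs = {
        1: {"render", "shading"},
        2: {"render", "modeling"},
        3: {"modeling"},
        4: {"render"},
    }

    def having(term):
        """Return the ids of documents that contain `term`."""
        return {doc_id for doc_id, terms in docs.items() if term in terms}

    # render AND shading  -> intersection
    both = having("render") & having("shading")

    # render NOT modeling -> difference
    without = having("render") - having("modeling")

    # (render AND shading) OR modeling -> intersection, then union
    grouped = (having("render") & having("shading")) | having("modeling")

Writing the grouping out explicitly, as in the last line, avoids relying on
operator precedence.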


Fields
======

Find the term ``ivan`` in the ``name`` field::

    name:ivan

The ``field:`` prefix only sets the field for the term it directly precedes, so
the query::

    title:open sesame

will search for ``open`` in the ``title`` field and ``sesame`` in the *default*
field.

To apply a field prefix to multiple terms, group them with parentheses::

    title:(open sesame)

This is the same as::

    title:open title:sesame

Of course you can specify a field for phrases too::

    title:"open sesame"
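
The difference between a prefixed term and a prefixed group can be checked by
listing the terms of the parsed query. This sketch assumes a default field named
``content`` and passes ``None`` as the schema so the text is not analyzed;
``terms`` is a hypothetical helper:

.. code-block:: python

    from whoosh.qparser import QueryParser

    # "content" is an assumed default field name; schema=None skips analysis.
    parser = QueryParser("content", schema=None)

    def terms(query_string):
        """Parse a query string and return its (fieldname, text) pairs."""
        return set(parser.parse(query_string).iter_all_terms())

    # The prefix applies only to "open"; "sesame" goes to the default field.
    unprefixed = terms("title:open sesame")

    # The prefix applies to every term inside the parentheses.
    grouped = terms("title:(open sesame)")

    print(sorted(unprefixed))
    print(sorted(grouped))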


Inexact terms
=============

Use "globs" (wildcard expressions using ``?`` to represent a single character
and ``*`` to represent any number of characters) to match terms::

    te?t test* *b?g*

Note that a wildcard starting with ``?`` or ``*`` is very slow, because every
term in the field must be examined. Note also that these wildcards only match
*individual terms*. For example, the query::

    my*life

will **not** match an indexed phrase like::

    my so called life

because those are four separate terms.
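
The term-by-term behaviour can be illustrated with Python's standard
:mod:`fnmatch` module, which uses the same meaning for ``?`` and ``*``. It
stands in for Whoosh's own wildcard matching here, and the list of indexed terms
is made up:

.. code-block:: python

    from fnmatch import fnmatchcase

    def matching_terms(pattern, terms):
        """Return the terms that match the glob, testing one term at a time."""
        return [t for t in terms if fnmatchcase(t, pattern)]

    # Hypothetical terms produced by indexing "my so called life" and a few
    # other words.
    indexed = ["my", "so", "called", "life", "test", "text", "tests"]

    single_char = matching_terms("te?t", indexed)
    any_suffix = matching_terms("test*", indexed)

    # No single term starts with "my" and ends with "life".
    across_terms = matching_terms("my*life", indexed)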


Ranges
======

You can match a range of terms. For example, the following query matches
documents containing terms in the lexical range from ``apple`` to ``bear``
*inclusive*, so it matches documents containing ``azores`` and
``be`` but not ``blur``::

    [apple TO bear]

This is very useful when you've stored, for example, dates in a lexically sorted
format (e.g. YYYYMMDD)::

    date:[20050101 TO 20090715]

The range is normally *inclusive* (that is, the range will match all terms
between the start and end term, *as well as* the start and end terms
themselves). You can specify that one or both ends of the range are *exclusive*
by using the ``{`` and/or ``}`` characters::

    [0000 TO 0025}
    {prefix TO suffix}

You can also specify *open-ended* ranges by leaving out the start or end term::

    [0025 TO]
    {TO suffix}
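
Because ranges compare terms lexically, their behaviour can be modelled with
ordinary string comparison. The sketch below is an illustration only; the
``in_range`` helper and its parameter names are hypothetical, and ``None``
stands for an omitted start or end term:

.. code-block:: python

    def in_range(term, start=None, end=None, startexcl=False, endexcl=False):
        """Return True if `term` falls in the lexical range.

        A start or end of None models an open-ended range.
        """
        if start is not None:
            if term < start or (startexcl and term == start):
                return False
        if end is not None:
            if term > end or (endexcl and term == end):
                return False
        return True

    terms = ["apple", "azores", "be", "bear", "blur"]

    # [apple TO bear]
    inclusive = [t for t in terms if in_range(t, "apple", "bear")]

    # {apple TO bear}
    exclusive = [t for t in terms
                 if in_range(t, "apple", "bear", startexcl=True, endexcl=True)]

    # [bear TO]
    open_ended = [t for t in terms if in_range(t, start="bear")]

The same comparison explains why fixed-width ``YYYYMMDD`` strings work for date
ranges: for such strings, lexical order and chronological order agree.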


Boosting query elements
=======================

You can specify that certain parts of a query are more important for calculating
the score of a matched document than others. For example, to specify that
``ninja`` is twice as important as other words, and ``bear`` is half as
important::

    ninja^2 cowboy bear^0.5

You can apply a boost to several terms using grouping parentheses::

    (open sesame)^2.5 roc
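
To see how boosts end up on the parsed query, you can inspect the ``boost``
attribute of each term. This is a minimal sketch that assumes a default field
named ``content`` and passes ``None`` as the schema so the text is not analyzed:

.. code-block:: python

    from whoosh.qparser import QueryParser

    # "content" is an assumed default field name; schema=None skips analysis.
    parser = QueryParser("content", schema=None)

    q = parser.parse("ninja^2 cowboy bear^0.5")

    # Map each term's text to the boost the parser attached to it.
    boosts = {term.text: term.boost for term in q.subqueries}
    print(boosts)

Terms without an explicit ``^`` suffix keep the default boost of 1.0.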


Making a term from literal text
===============================

If you need to include characters in a term that are normally treated specially
by the parser, such as spaces, colons, or brackets, you can enclose the term
in single quotes::

    path:'MacHD:My Documents'
    'term with spaces'
    title:'function()'