# Quick start
First, you'll need a list of dicts representing the documents you want to search. Each document must have a unique field that serves as its reference, plus the fields you'd like to search on.
```python
>>> from lunr import lunr
>>>
>>> documents = [{
...     'id': 'a',
...     'title': 'Mr. Green kills Colonel Mustard',
...     'body': """Mr. Green killed Colonel Mustard in the study with the
... candlestick. Mr. Green is not a very nice fellow."""
... }, {
...     'id': 'b',
...     'title': 'Plumb waters plant',
...     'body': 'Professor Plumb has a green and a yellow plant in his study',
... }, {
...     'id': 'c',
...     'title': 'Scarlett helps Professor',
...     'body': """Miss Scarlett watered Professor Plumbs green plant
... while he was away on his murdering holiday.""",
... }]
```
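Because every document is identified by its reference field, it can be worth checking that the field is present and unique before indexing. The `check_refs` helper below is a hypothetical sketch, not part of Lunr; it assumes documents are plain dicts as in the example above.

```python
from collections import Counter


def check_refs(documents, ref="id"):
    """Hypothetical helper: ensure every document has a unique `ref` value.

    Raises ValueError if a document lacks the field or a value is repeated.
    """
    missing = [i for i, doc in enumerate(documents) if ref not in doc]
    if missing:
        raise ValueError(f"documents at positions {missing} have no {ref!r} field")
    counts = Counter(doc[ref] for doc in documents)
    duplicates = sorted((value for value, n in counts.items() if n > 1), key=str)
    if duplicates:
        raise ValueError(f"duplicate {ref!r} values: {duplicates}")


check_refs([{"id": "a", "title": "x"}, {"id": "b", "title": "y"}])  # passes silently
```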
Lunr provides a convenience `lunr` function to quickly index this set of documents:
```python
>>> idx = lunr(
... ref='id', fields=('title', 'body'), documents=documents
... )
```
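Conceptually, indexing tokenizes each listed field and records, for every term, which document refs contain it and how often. The toy sketch below illustrates only that idea: the `tokenize` and `build_index` names are hypothetical, and unlike Lunr's default pipeline it does no stemming, stop-word filtering or scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents, ref, fields):
    """Toy inverted index mapping each term to {ref: {field: term frequency}}."""
    index = defaultdict(lambda: defaultdict(dict))
    for doc in documents:
        for field in fields:
            for term in tokenize(doc.get(field, "")):
                counts = index[term][doc[ref]]
                counts[field] = counts.get(field, 0) + 1
    return index


docs = [
    {"id": "a", "title": "Mr. Green kills Colonel Mustard",
     "body": "Mr. Green is not a very nice fellow."},
    {"id": "b", "title": "Plumb waters plant",
     "body": "Professor Plumb has a green and a yellow plant in his study"},
]
index = build_index(docs, ref="id", fields=("title", "body"))
print(dict(index["green"]))  # {'a': {'title': 1, 'body': 1}, 'b': {'body': 1}}
```

Because this toy tokenizer does not stem, `kills` is stored as-is and a lookup for `kill` finds nothing, whereas Lunr's stemming lets `idx.search('kill')` match document `a`.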
For basic, no-fuss searches, just call the `search` method on the index:
```python
>>> idx.search('kill')
[{'ref': 'a', 'score': 0.6931722372559913, 'match_data': <MatchData "kill">}]
>>> idx.search('study')
[{'ref': 'b', 'score': 0.23576799568081389, 'match_data': <MatchData "studi">},
{'ref': 'a', 'score': 0.2236629211724517, 'match_data': <MatchData "studi">}]
```
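Each result holds only the document's `ref`, a relevance `score` (highest first) and `match_data`, so to display anything else you look the document up yourself. The snippet below copies the `ref` and `score` values of the `study` search shown above into plain dicts (omitting `match_data`) so it runs without Lunr; the variable names are illustrative.

```python
# Simplified copies of the example documents and of the 'study' results above.
docs = [
    {"id": "a", "title": "Mr. Green kills Colonel Mustard"},
    {"id": "b", "title": "Plumb waters plant"},
    {"id": "c", "title": "Scarlett helps Professor"},
]
results = [
    {"ref": "b", "score": 0.23576799568081389},
    {"ref": "a", "score": 0.2236629211724517},
]

# Results carry only the reference, so map refs back to the original documents.
by_ref = {doc["id"]: doc for doc in docs}
titles = [by_ref[result["ref"]]["title"] for result in results]
print(titles)  # ['Plumb waters plant', 'Mr. Green kills Colonel Mustard']
```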
## Using query strings
The query string passed to `search` accepts multiple terms:
```python
>>> idx.search('green plant')
[{'ref': 'b', 'score': 0.5023294192217546, 'match_data': <MatchData "green, plant">},
{'ref': 'a', 'score': 0.12544083739725947, 'match_data': <MatchData "green">},
{'ref': 'c', 'score': 0.07306110905506158, 'match_data': <MatchData "green, plant">}]
```
The index searches for `green` OR `plant`. A few things to note about the results:
- document `b` scores highest because `plant` appears in both fields and `green` appears in the body
- document `a` comes second; it contains only `green`, but once in the title and twice in the body
- document `c` contains both terms, but only in one of the fields

Query strings support a variety of modifiers:
### Wildcards
You can use `*` as a wildcard anywhere in your query string:
```python
>>> idx.search('pl*')
[{'ref': 'b', 'score': 0.725901569004226, 'match_data': <MatchData "plumb, plant">},
{'ref': 'c', 'score': 0.0816178155209697, 'match_data': <MatchData "plumb, plant">}]
>>> idx.search('*llow')
[{'ref': 'b', 'score': 0.6210112024848421, 'match_data': <MatchData "yellow">},
{'ref': 'a', 'score': 0.30426104537491444, 'match_data': <MatchData "fellow">}]
```
Note that when using wildcards, no stemming is performed on the search terms.
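One way to picture wildcard search: the pattern is compared against the terms already stored in the index, which have been stemmed, and the matching terms are then looked up. The sketch below uses the standard library's `fnmatch` as a stand-in for Lunr's own matching, over a small hypothetical vocabulary; it also shows that the pattern is compared with stored stems as-is, so `stud*` matches `studi`.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary of (already stemmed) index terms.
VOCABULARY = ["fellow", "green", "kill", "murder", "mustard",
              "plant", "plumb", "studi", "yellow"]


def expand_wildcard(pattern, vocabulary=VOCABULARY):
    """Return the index terms matching a pattern in which `*` matches any run of characters."""
    # fnmatch also treats `?` and `[...]` specially; only `*` is used here.
    return [term for term in vocabulary if fnmatchcase(term, pattern)]


print(expand_wildcard("pl*"))    # ['plant', 'plumb']
print(expand_wildcard("*llow"))  # ['fellow', 'yellow']
print(expand_wildcard("stud*"))  # ['studi']
```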
### Fields
Prefixing any search term with `<FIELD_NAME>:` restricts the search for that term to the given field:
```python
>>> idx.search('title:green title:plant')
[{'ref': 'b', 'score': 0.18604713274256787, 'match_data': <MatchData "plant">},
{'ref': 'a', 'score': 0.07902963505882092, 'match_data': <MatchData "green">}]
```
Note the difference from the earlier `green plant` search: document `c` is no longer in the results, since neither term appears in its title.
Specifying an unindexed field will raise an exception:
```python
>>> idx.search('foo:green')
Traceback (most recent call last):
...
lunr.exceptions.QueryParseError: Unrecognized field "foo", possible fields title, body
```
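The field prefix can be thought of as splitting a clause on its first `:` and checking the name against the fields that were indexed. The hypothetical `parse_field_clause` function below sketches that check; it raises a plain `ValueError` rather than Lunr's `QueryParseError` and ignores the other modifiers.

```python
def parse_field_clause(clause, fields):
    """Split 'field:term' into (field, term).

    Returns (None, term) for a bare term, meaning "search every field".
    """
    if ":" not in clause:
        return None, clause
    field, term = clause.split(":", 1)
    if field not in fields:
        raise ValueError(
            f'Unrecognized field "{field}", possible fields {", ".join(fields)}'
        )
    return field, term


fields = ("title", "body")
print(parse_field_clause("title:green", fields))  # ('title', 'green')
print(parse_field_clause("green", fields))        # (None, 'green')
```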
You can combine this with wildcards:
```python
>>> idx.search('body:mu*')
[{'ref': 'c', 'score': 0.3072276611029057, 'match_data': <MatchData "murder">},
{'ref': 'a', 'score': 0.14581429988419872, 'match_data': <MatchData "mustard">}]
```
### Boosts
When searching for several terms, you can use boosting to give more importance to a particular term:
```python
>>> idx.search('green plant^10')
[{'ref': 'b', 'score': 0.831629678987025, 'match_data': <MatchData "green, plant">},
{'ref': 'c', 'score': 0.06360184858161157, 'match_data': <MatchData "green, plant">},
{'ref': 'a', 'score': 0.01756105367777591, 'match_data': <MatchData "green">}]
```
Note how document `c` now ranks above document `a` because of the boost on the term `plant`. The `10` is a multiplier on the term's relative score and must be a positive integer.
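As a rough illustration of a boost acting as a multiplier, the sketch below parses `term^N` clauses and multiplies hypothetical per-term scores by the boost before summing them per document. The scores are made up and Lunr's real scoring is more involved; the point is only that raising one term's weight can reorder results, as it did for documents `a` and `c`. The `parse_boost`, `rank` and `TERM_SCORES` names are hypothetical.

```python
def parse_boost(clause):
    """Split 'term^N' into (term, N); N must be a positive integer, default 1."""
    term, sep, boost = clause.partition("^")
    if not sep:
        return term, 1
    if not boost.isdigit() or int(boost) < 1:
        raise ValueError(f"boost must be a positive integer, got {boost!r}")
    return term, int(boost)


# Hypothetical per-term scores for each document (not Lunr output).
TERM_SCORES = {
    "a": {"green": 0.3},
    "c": {"green": 0.1, "plant": 0.05},
}


def rank(query):
    """Sum boosted term scores per document and return refs, best first."""
    clauses = [parse_boost(c) for c in query.split()]
    totals = {
        ref: sum(scores.get(term, 0.0) * boost for term, boost in clauses)
        for ref, scores in TERM_SCORES.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


print(rank("green plant"))     # ['a', 'c']
print(rank("green plant^10"))  # ['c', 'a']
```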
### Fuzzy matches
You can also use fuzzy matching for terms that are likely to be misspelled:
```python
>>> idx.search('yellow~1')
[{'ref': 'b', 'score': 0.621155860224936, 'match_data': <MatchData "yellow">},
{'ref': 'a', 'score': 0.3040972809936496, 'match_data': <MatchData "fellow">}]
```
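The positive integer after `~` is the maximum edit distance, in this case 1 character, where an edit is an insertion, deletion, substitution or transposition. Here `fellow` matches because it differs from `yellow` by a single substitution.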
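The edit distance itself can be computed with the classic dynamic-programming algorithm. The sketch below implements the optimal string alignment variant of Damerau-Levenshtein distance, where insertions, deletions, substitutions and adjacent transpositions each cost 1. Lunr may find fuzzy matches differently internally, so treat this as an illustration of the metric only; `edit_distance` is a hypothetical name.

```python
def edit_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


print(edit_distance("yellow", "fellow"))  # 1 (substitution)
print(edit_distance("yellow", "yelow"))   # 1 (deletion)
print(edit_distance("yellow", "yelolw"))  # 1 (transposition)
```

Under this metric, `yellow~1` would match any index term within distance 1 of `yellow`.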
### Term presence (new in 0.3.0)
As mentioned above, Lunr defaults to a logical OR across terms, but you can also specify whether each term must be present in matching documents. In the default OR behaviour, each term's presence is *optional*. To require that a term is present in matching documents, prefix it with `+`; to require that a term is *not* present, prefix it with `-`.
The below example searches for documents that must contain "green", might contain "plant" and must not contain "study":
```python
>>> idx.search("+green plant -study")
[{'ref': 'c',
'score': 0.08090317236904906,
'match_data': <MatchData "green,plant">}]
```
Contrast this with the default behaviour:
```python
>>> idx.search('green plant study')
[{'ref': 'b',
'score': 0.5178296383103647,
'match_data': <MatchData "green,plant,studi">},
{'ref': 'a',
'score': 0.22147889214939157,
'match_data': <MatchData "green,studi">},
{'ref': 'c',
'score': 0.06605716362553504,
'match_data': <MatchData "green,plant">}]
```
To simulate a logical AND search of "yellow AND plant", mark both terms as required:
```python
>>> idx.search('+yellow +plant')
[{'ref': 'b',
'score': 0.8915374700737615,
'match_data': <MatchData "plant,yellow">}]
```
As opposed to the default:
```python
>>> idx.search('yellow plant')
[{'ref': 'b',
'score': 0.8915374700737615,
'match_data': <MatchData "plant,yellow">},
{'ref': 'c',
'score': 0.045333674172311975,
'match_data': <MatchData "plant">}]
```
Note that presence can also be combined with any of the other modifiers described above.
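The presence rules can be summarised as set operations over the terms each document contains: every required (`+`) term must be present, any prohibited (`-`) term excludes the document, and when there are no required terms at least one optional term must match. The sketch below applies those rules to hand-written, simplified term sets for the three example documents and ignores scoring. `TERMS` and `matching_refs` are hypothetical; since there is no stemmer here, queries use the stem `studi`. In this sketch a query made only of prohibited terms returns nothing; the library may treat that case differently.

```python
# Simplified, stemmed terms for the example documents (hand-written, not Lunr output).
TERMS = {
    "a": {"green", "kill", "mustard", "studi", "fellow"},
    "b": {"green", "plant", "plumb", "studi", "yellow"},
    "c": {"green", "plant", "plumb", "murder", "scarlett"},
}


def matching_refs(query):
    """Return the refs matching a query of +required, -prohibited and optional terms."""
    required, prohibited, optional = [], [], []
    for clause in query.split():
        if clause.startswith("+"):
            required.append(clause[1:])
        elif clause.startswith("-"):
            prohibited.append(clause[1:])
        else:
            optional.append(clause)

    matches = set()
    for ref, terms in TERMS.items():
        if any(t in terms for t in prohibited):
            continue  # a prohibited term excludes the document outright
        if required:
            # With required terms present, optional ones would only affect ranking.
            ok = all(t in terms for t in required)
        else:
            ok = any(t in terms for t in optional)  # default OR behaviour
        if ok:
            matches.add(ref)
    return sorted(matches)


print(matching_refs("+green plant -studi"))  # ['c']
print(matching_refs("+yellow +plant"))       # ['b']
print(matching_refs("yellow plant"))         # ['b', 'c']
```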