.. -*- rst -*-

.. groonga-command
.. database: functions_snippet

``snippet``
===========

Summary
-------

This function extracts snippets of target text around search
keywords (``KWIC``, ``KeyWord In Context``).

If you want to use this function in a normal Web application,
:doc:`snippet_html` may be more suitable. It's an HTML-specific
version of this function.

Syntax
------

``snippet`` requires at least one parameter that is the snippet target
text::

  snippet(column, ...)

You can specify one or more tuples of keyword, open tag and close tag::

  snippet(column,
          "keyword1", "open-tag1", "close-tag1",
          "keyword2", "open-tag2", "close-tag2",
          ...)

If you specify default open tag and default close tag, you can specify
only keywords::

  snippet(column,
          "keyword1",
          "keyword2",
          ...,
          {
            "default_open_tag": "open-tag",
            "default_close_tag": "close-tag"
          })

.. versionadded:: 11.0.9

   If you specify default open tag and default close tag and omit
   keywords, keywords are extracted from the current condition
   automatically like :doc:`snippet_html`::

     snippet(column,
             {
               "default_open_tag": "open-tag",
               "default_close_tag": "close-tag"
             })

You can specify options as the last argument with all syntaxes::

  snippet(column,
          ...,
          {
            "width": 200,
            "max_n_results": 3,
            "skip_leading_spaces": true,
            "html_escape": false,
            "prefix": null,
            "suffix": null,
            "normalizer": null,
            "default_open_tag": null,
            "default_close_tag": null,
            "default": null,
            "delimiter_regexp": null,
          })

Usage
-----

Here are a schema definition and sample data to show usage.

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_setup.log
.. table_create Documents TABLE_NO_KEY
.. column_create Documents content COLUMN_SCALAR Text
.. table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram  --normalizer NormalizerAuto
.. column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content
.. load --table Documents
.. [
.. ["content"],
.. ["Groonga is a fast and accurate full text search engine based on inverted index. One of the characteristics of groonga is that a newly registered document instantly appears in search results. Also, groonga allows updates without read locks. These characteristics result in superior performance on real-time applications."],
.. ["Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, groonga can cover weakness of row-oriented systems."]
.. ]

``snippet`` extracts keywords automatically from the conditions
specified in ``--query`` and/or ``--filter`` when you specify the
``default_open_tag`` and ``default_close_tag`` options and don't
specify keywords. This is similar to :doc:`snippet_html`.

The following example uses ``--query "fast performance"``. In this
case, ``fast`` and ``performance`` are used as keywords.

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_keywords_from_conditions.log
.. select Documents \
..   --output_columns 'snippet(content, \
..                             { \
..                                "default_open_tag": "[", \
..                                "default_close_tag": "]" \
..                             })' \
..   --match_columns content \
..   --query "fast performance"

``--query "fast performance"`` matches only the first record's
content. This ``snippet`` extracts two text parts that include the
keywords ``fast`` or ``performance`` and surrounds the keywords with
``[`` and ``]``.
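
Keywords can come from ``--filter`` as well. The following command is
a sketch and is not taken from the bundled example logs. It assumes
the sample data above and uses the match operator ``@`` in
``--filter`` instead of ``--query``, so ``fast`` is expected to be
used as the keyword::

  select Documents \
    --output_columns 'snippet(content, \
                              { \
                                 "default_open_tag": "[", \
                                 "default_close_tag": "]" \
                              })' \
    --filter 'content @ "fast"'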

The maximum number of text parts is 3 by default. You can change it
with the ``max_n_results`` option:

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_max_n_results.log
.. select Documents \
..   --output_columns 'snippet(content, \
..                             { \
..                                "default_open_tag": "[", \
..                                "default_close_tag": "]", \
..                                "max_n_results": 1 \
..                             })' \
..   --match_columns content \
..   --query "fast performance"

It returns only one snippet because ``"max_n_results": 1`` is specified.

The maximum size of a text part is 200 bytes by default. The unit is
bytes, not characters. The size doesn't include the inserted ``[`` and
``]``. You can change it with the ``width`` option:

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_width.log
.. select Documents \
..   --output_columns 'snippet(content, \
..                             { \
..                                "default_open_tag": "[", \
..                                "default_close_tag": "]", \
..                                "width": 50 \
..                             })' \
..   --match_columns content \
..   --query "fast performance"
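
Options can be combined in one options object, as the syntax section
shows. The following command is a sketch under that assumption and is
not taken from the bundled example logs. It asks for at most one text
part of at most 50 bytes::

  select Documents \
    --output_columns 'snippet(content, \
                              { \
                                 "default_open_tag": "[", \
                                 "default_close_tag": "]", \
                                 "width": 50, \
                                 "max_n_results": 1 \
                              })' \
    --match_columns content \
    --query "fast performance"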

You can detect snippet delimiters with a regular expression by using
the ``delimiter_regexp`` option. For example, ``\.\s*`` restricts each
snippet to text in the sentence that contains the keyword. Note that
you need to escape ``\`` in a string:

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_delimiter_regexp.log
.. select Documents \
..   --output_columns 'snippet(content, \
..                             { \
..                                "default_open_tag": "[", \
..                                "default_close_tag": "]", \
..                                "delimiter_regexp": "\\\\.\\\\s*" \
..                             })' \
..   --match_columns content \
..   --query "fast performance"

You can see the detected delimiters (``.`` and following white spaces)
aren't included in the result snippets. This is intentional behavior.

You can specify keywords explicitly instead of extracting keywords
from the current condition:

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_keywords.log
.. select Documents \
..   --output_columns 'snippet(content, \
..                             "fast", \
..                             "performance", \
..                             { \
..                                "default_open_tag": "[", \
..                                "default_close_tag": "]" \
..                             })'

This ``snippet`` returns two snippets for the first record and
``null`` for the second record, because the second record doesn't
contain any of the specified keywords.

You can specify open tag and close tag for each keyword:

.. groonga-command
.. include:: ../../example/reference/functions/snippet/usage_tags.log
.. select Documents \
..   --output_columns 'snippet(content, \
..                             "fast", "[", "]", \
..                             "performance", "(", ")")'

This ``snippet`` surrounds ``fast`` with ``[`` and ``]`` and
``performance`` with ``(`` and ``)``.

TODO: ``html_escape`` option and so on
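
Until the ``html_escape`` option is documented here, the following
command is a minimal sketch of how it might be used. It assumes that
``"html_escape": true`` escapes HTML special characters in the
extracted text but leaves the open tag and close tag as is, like
:doc:`snippet_html` does. The ``<b>`` and ``</b>`` tags are only
illustrative values::

  select Documents \
    --output_columns 'snippet(content, \
                              "fast", \
                              { \
                                 "default_open_tag": "<b>", \
                                 "default_close_tag": "</b>", \
                                 "html_escape": true \
                              })'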

Parameters
----------

Required parameters
^^^^^^^^^^^^^^^^^^^

TODO

Optional parameters
^^^^^^^^^^^^^^^^^^^

TODO

.. _snippet-max-n-results:

``max_n_results``
"""""""""""""""""

TODO

.. _snippet-width:

``width``
"""""""""

TODO

Return value
------------

This function returns an array of strings or ``null``. If this
function can't find any snippets, it returns ``null``.

An element of array is a snippet::

  [SNIPPET1, SNIPPET2, ...]

A snippet includes one or more keywords. The maximum size of a
snippet, excluding open tags and close tags, is 200 bytes by default.
The unit is bytes, not characters.

You can change this with the :ref:`snippet-width` option.

By default, the array size is larger than or equal to 1 and less than
or equal to 3.

You can change this with the :ref:`snippet-max-n-results` option.

See also
--------

* :doc:`snippet_html`
* :doc:`/reference/commands/select`