.. -*- rst -*-

.. groonga-command
.. database: indexing

Indexing
========

Groonga has supported both online index construction and
offline index construction since 2.0.0.

.. _online-index-construction:

Online index construction
-------------------------

In online index construction, registered documents become
searchable quickly, even while indexing is in progress.
However, indexing costs more than it does with offline
index construction.

Online index construction is suitable for a search system
that values freshness. For example, a search system for
tweets, news, blog posts and so on values freshness. Online
index construction makes fresh documents searchable and
keeps them searchable while indexing.

.. _offline-index-construction:

Offline index construction
--------------------------

In offline index construction, the indexing cost is lower
than with online index construction: indexing time is
shorter, the index is smaller, and fewer resources are
required for indexing. However, a registered document is
not searchable until all registered documents have been
indexed.

Offline index construction is suitable for a search system
that values low resource usage. If a search system doesn't
value freshness, offline index construction is a good fit.
For example, a reference manual search system doesn't value
freshness because a reference manual is updated only at a
release.
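
The trade-off between the two strategies can be illustrated
with a toy inverted index. The following Python sketch is
only a conceptual model, not Groonga's actual
implementation: ``tokenize``, ``OnlineIndex`` and
``build_offline`` are hypothetical names introduced for this
illustration, and tokenization is assumed to be a simple
lowercase word split rather than a real Groonga tokenizer.
The online variant updates the postings every time a
document is registered, so each document is searchable at
once. The offline variant collects all (term, ID) pairs
first and writes the postings in a single pass.

.. code-block:: python

   import re
   from collections import defaultdict


   def tokenize(text):
       """Split text into lowercase word tokens (stand-in for a real tokenizer)."""
       return re.findall(r"[a-z0-9']+", text.lower())


   class OnlineIndex:
       """Toy index that is updated on every registration."""

       def __init__(self):
           self.postings = defaultdict(list)
           self.next_id = 1

       def add(self, text):
           record_id = self.next_id
           self.next_id += 1
           # The postings are touched once per distinct term of each document.
           for term in sorted(set(tokenize(text))):
               self.postings[term].append(record_id)
           return record_id

       def search(self, term):
           return self.postings.get(term, [])


   def build_offline(documents):
       """Toy batch build: collect all (term, ID) pairs, sort once, then write."""
       pairs = []
       for record_id, text in enumerate(documents, start=1):
           for term in set(tokenize(text)):
               pairs.append((term, record_id))
       pairs.sort()
       postings = {}
       for term, record_id in pairs:
           postings.setdefault(term, []).append(record_id)
       return postings


   documents = ["Hello!", "I just start it!", "Have a nice day... Good night..."]

   online = OnlineIndex()
   for text in documents:
       # Each document can be found by search() right after add() returns.
       online.add(text)

   # Nothing is searchable until build_offline() has processed every document.
   offline = build_offline(documents)

   print(online.search("nice"))
   print(offline["nice"])
   print(dict(online.postings) == offline)

Both variants end up with the same postings in this sketch.
The difference is when the work happens and when documents
become searchable.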

How to use
----------

Groonga uses online index construction by default. When we
register a document, we can search it right away.

Groonga uses offline index construction when we add an
index to a column that already has data.

We define a schema:

.. groonga-command
.. include:: ../example/reference/indexing-schema.log
.. table_create Tweets TABLE_NO_KEY
.. column_create Tweets content COLUMN_SCALAR ShortText
.. table_create Lexicon TABLE_HASH_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto

We register data:

.. groonga-command
.. include:: ../example/reference/indexing-data.log
.. load --table Tweets
.. [
.. {"content":"Hello!"},
.. {"content":"I just start it!"},
.. {"content":"I'm sleepy... Have a nice day... Good night..."}
.. ]

Without an index, we can still search by sequential search:

.. groonga-command
.. include:: ../example/reference/indexing-search-without-index.log
.. select Tweets --match_columns content --query 'good nice'
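
Conceptually, a sequential search checks every record in
turn. The Python sketch below is only an illustration under
simplifying assumptions, not Groonga's matching logic:
``sequential_search`` is a hypothetical helper, the query
terms are assumed to be combined with AND, and
normalization is reduced to lowercasing.

.. code-block:: python

   def sequential_search(records, query):
       """Return the 1-based IDs of records that contain every query term."""
       terms = query.lower().split()
       matched = []
       for record_id, content in enumerate(records, start=1):
           normalized = content.lower()
           # Every record is scanned, so the cost grows with the amount of data.
           if all(term in normalized for term in terms):
               matched.append(record_id)
       return matched


   tweets = [
       "Hello!",
       "I just start it!",
       "I'm sleepy... Have a nice day... Good night...",
   ]

   print(sequential_search(tweets, "good nice"))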

We create an index for ``Tweets.content``. Data already
registered in ``Tweets.content`` is indexed by offline index
construction:

.. groonga-command
.. include:: ../example/reference/indexing-offline-index-construction.log
.. column_create Lexicon tweet COLUMN_INDEX|WITH_POSITION Tweets content

.. tip::

   We can create an index in parallel by adding
   ``--n_workers -1``. For example::

     column_create Lexicon tweet COLUMN_INDEX|WITH_POSITION Tweets content --n_workers -1

We search with the index and get the matched record:

.. groonga-command
.. include:: ../example/reference/indexing-search-after-offline-index-construction.log
.. select Tweets --match_columns content --query 'good nice'

We register more data. It is indexed by online index
construction:

.. groonga-command
.. include:: ../example/reference/indexing-online-index-construction.log
.. load --table Tweets
.. [
.. {"content":"Good morning! Nice day."},
.. {"content":"Let's go shopping."}
.. ]

The newly registered records can also be found by searching:

.. groonga-command
.. include:: ../example/reference/indexing-search-after-online-index-construction.log
.. select Tweets --match_columns content --query 'good nice'