.. Copyright (C) 2007,2010,2011 Olly Betts
.. Copyright (C) 2009 Lemur Consulting Ltd
.. Copyright (C) 2011 Richard Boulton

=======================
Xapian Faceting Support
=======================

.. contents:: Table of contents

Introduction
============

Xapian can dynamically generate complete lists of the category values which
occur in matching documents.  This has many potential uses, but a common one is
to let the user narrow down their search by filtering it to only include
documents with a particular value of a particular category.  This is often
referred to as "faceted search".

You may have multiple facets (for example colour, manufacturer, product
type), so Xapian allows you to handle several facets at once.

How to use Faceting
===================

Indexing
--------

When indexing a document, you need to add each facet in a different numbered
value slot.  As described elsewhere in the documentation, each Xapian document
has a set of "value slots", each of which is addressed by a number, and can
contain a value which is an arbitrary string.

The ``Xapian::Document::add_value()`` method can be used to put values into a
particular slot.  So, if you had a database of books, you might put "price"
facet values in slot 0 (serialised to strings using
``Xapian::sortable_serialise()``, or some similar function), "author" facet
values in slot 1, "publisher" facet values in slot 2 and "publication type"
(e.g. hardback, softback) values in slot 3.

Searching
---------

Finding Facets
~~~~~~~~~~~~~~

At search time, for each facet you want to consider, you need to count how
many times each value occurs in that facet's slot; for the example above, if
you wanted to get facets for "price", "author" and "publication type" you'd
want to get the counts from slots 0, 1 and 3.

This can be done by calling ``Xapian::Enquire::add_matchspy()`` with a pointer
to a ``Xapian::ValueCountMatchSpy`` object for each value slot you want to
get facet counts for, like so::

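    // One spy per value slot whose facet counts are wanted.
    Xapian::ValueCountMatchSpy spy0(0);
    Xapian::ValueCountMatchSpy spy1(1);
    Xapian::ValueCountMatchSpy spy3(3);

    // Register the spies before running the search.
    Xapian::Enquire enq(db);
    enq.add_matchspy(&spy0);
    enq.add_matchspy(&spy1);
    enq.add_matchspy(&spy3);

    enq.set_query(query);

    // Return the top 10 matches, but check at least 10000 documents.
    Xapian::MSet mset = enq.get_mset(0, 10, 10000);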

The ``10000`` in the call to ``get_mset()`` tells Xapian to check at least
10000 documents, so the MatchSpy objects will be passed at least 10000
documents to tally facet information from (unless fewer than 10000 documents
match the query, in which case they will see all of them).  Setting this to
``db.get_doccount()`` will make the facet counts exact, but Xapian will have to
do more work for most queries so searches will be slower.

The ``spy`` objects now contain the facet information.  You can find out how
many documents they looked at by calling ``spy0.get_total()``.  (All the spies
will have looked at the same number of documents.)  You can read the values
from, say, ``spy0`` like this::

    Xapian::TermIterator i;
    for (i = spy0.values_begin(); i != spy0.values_end(); ++i) {
        // Each entry is a facet value and the number of documents it was seen in.
        std::cout << *i << ": " << i.get_termfreq() << std::endl;
    }

Restricting by Facet Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're using the facets to offer the user choices for narrowing down
their search results, you then need to be able to apply a suitable filter.

For a single value, you could use ``Xapian::Query::OP_VALUE_RANGE`` with the
same start and end, or a ``Xapian::MatchDecider`` subclass, but it's probably
most efficient to also index the categories as suitably prefixed boolean terms
and use those for filtering.