File: index.rst

package info (click to toggle)
seqan2 2.5.2-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 228,748 kB
  • sloc: cpp: 257,602; ansic: 91,967; python: 8,326; sh: 1,056; xml: 570; makefile: 229; awk: 51; javascript: 21
file content (77 lines) | stat: -rw-r--r-- 2,632 bytes parent folder | download | duplicates (9)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
.. We create this roles for putting the "Introduction: etc. headings
    on this page without them displaying in the ToC.  This would break
    rendering the ToC correctly on readthedocs style.  The rubric
    headings are formatted using CSS.

.. role:: rubric-heading1
    :class: rubric-heading1
.. role:: rubric-heading2
    :class: rubric-heading2

.. _tutorial-datastructures-indices:

Indices
=======

.. toctree::
    :hidden:
    :titlesonly:

    FMIndex
    QgramIndex
    StringIndices
    IndexIterators

Indices in SeqAn allow efficient pattern queries in strings or sets of
strings. In contrast to, e.g., online-search algorithms that search
through the text in :math:`\mathcal{O}(n)`, substring indices find a
pattern in sublinear time :math:`o(n)`.

Indices store additional data structures that allow searching the text
using an iterator. Using the iterator can be thought of as traversing
a suffix tree. The following section gives you an introduction how
the suffix tree is built.

.. tip::

   The :dox:`Finder` interface allows searching indices without using the iterator.
   For more information check out the tutorial on :ref:`tutorial-algorithms-pattern-matching-indexed`.

Suffix Trees
------------

We consider an alphabet Σ and a sentinel character $ that is smaller
than every character of Σ. A suffix tree of a given non-empty string s
over Σ is a directed tree whose edges are labeled with non-empty
substrings of s$ with the following properties:

1. Each outgoing edge begins with a different letter and the outdegree
of an internal node is greater than 1.
2. Each suffix of s$ is the concatenation of edges from the root to a leaf node.
3. Each path from the root to a leaf node is a suffix of s$.

The following figure shows the suffix tree of the string s="mississippi" (suffix nodes are shaded):


.. figure:: streeSentinel.png
    :align: center

**Figure 1:** Suffix tree of "mississippi"

Many suffix tree construction algorithms expect $ to be part of the
string alphabet which is undesirable for small bit-compressible
alphabets (e.g. DNA). In SeqAn there is no need to introduce a $. We
relax suffix tree criterion 2. and consider the relaxed suffix tree that
arises from the suffix tree of s by removing the $ character and all
empty edges. In the following, we only consider relaxed suffix trees and
simply call them suffix trees. In that tree a suffix can end in an inner
node as you can see in the next figure (suffix "i"):

.. figure:: streeNoSentinel.png
    :align: center

**Figure 2:** Relaxed suffix tree of "mississippi"

.. raw:: mediawiki

    {{TracNotice|{{PAGENAME}}}}