File: Alphabets.rst

package info (click to toggle)
seqan2 2.5.2-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 228,748 kB
  • sloc: cpp: 257,602; ansic: 91,967; python: 8,326; sh: 1,056; xml: 570; makefile: 229; awk: 51; javascript: 21
file content (112 lines) | stat: -rw-r--r-- 5,550 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
.. sidebar:: ToC

    .. contents::

.. _tutorial-datastructures-sequences-alphabets:

Alphabets
=========

Learning Objective
  You will learn the details about the alphabets in SeqAn.

Difficulty
  Basic

Duration
  15 min

Prerequisites
  :ref:`tutorial-getting-started-first-steps-in-seqan`

This tutorial will describe the different alphabets used in SeqAn, or in other words, you will learn about the contained types of a SeqAn :dox:`String`.
To continue with the other tutorials, it would be enough to know, that in SeqAn several standard alphabets are already predefined, e.g. :dox:`Dna`, :dox:`Dna5`, :dox:`Rna`, :dox:`Rna5`, :dox:`Iupac`, :dox:`AminoAcid`.

Types
-----

Any type that provides a default constructor, a copy constructor and an assignment operator can be used as the alphabet / contained type of a :dox:`String` (see also the tutorial :ref:`tutorial-datastructures-sequences`).
This includes the C++ `POD types <https://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.7>`_, e.g. ``char``, ``int``, ``double`` etc.
In addition you can use more complex types like :dox:`String` as the contained type of strings, e.g. ``String<String<char> >``.

SeqAn also provides the following types that are useful in bioinformatics.
Each of them is a specialization of the class :dox:`SimpleType`.

+------------------+-------------------------------------------------------------+
| Specialization   | Description                                                 |
+==================+=============================================================+
| :dox:`AminoAcid` | Amino Acid Alphabet                                         |
+------------------+-------------------------------------------------------------+
| :dox:`Dna`       | DNA alphabet                                                |
+------------------+-------------------------------------------------------------+
| :dox:`Dna5`      | ``N`` alphabet including ``N`` character                    |
+------------------+-------------------------------------------------------------+
| :dox:`DnaQ`      | ``N`` alphabet plus phred quality                           |
+------------------+-------------------------------------------------------------+
| :dox:`Dna5Q`     | ``N`` alphabet plus phred quality including ``N`` character |
+------------------+-------------------------------------------------------------+
| :dox:`Finite`    | Finite alphabet of fixed size.                              |
+------------------+-------------------------------------------------------------+
| :dox:`Iupac`     | ``N`` Iupac code.                                           |
+------------------+-------------------------------------------------------------+
| :dox:`Rna`       | ``N`` alphabet                                              |
+------------------+-------------------------------------------------------------+
| :dox:`Rna5`      | ``N`` alphabet including ``N`` character                    |
+------------------+-------------------------------------------------------------+

Functionality
-------------

In SeqAn, alphabets are value types that can take a limited number of values and which hence can be mapped to a range of natural numbers.
We can retrieve the number of different values of an alphabet, the alphabet size, by the metafunction :dox:`FiniteOrderedAlphabetConcept#ValueSize`.

.. includefrags:: demos/tutorial/alphabets/example_size.cpp
    :fragment: main

.. includefrags:: demos/tutorial/alphabets/example_size.cpp.stdout

Another useful metafunction called :dox:`AlphabetConcept#BitsPerValue` can be used to determine the number of bits needed to store a value of a given alphabet.

.. includefrags:: demos/tutorial/alphabets/example_bitsPerValue.cpp
    :fragment: main

.. includefrags:: demos/tutorial/alphabets/example_bitsPerValue.cpp.stdout

The order of a character in the alphabet (i.e. its corresponding natural number) can be retrieved by calling the function :dox:`FiniteOrderedAlphabetConcept#ordValue`.
See each specialization's documentation for the ordering of the alphabet's values.

.. includefrags:: demos/tutorial/alphabets/example_ordValue.cpp
    :fragment: main

.. includefrags:: demos/tutorial/alphabets/example_ordValue.cpp.stdout

.. tip::

    The return value of the :dox:`FiniteOrderedAlphabetConcept#ordValue` function is determined by the metafunction :dox:`FiniteOrderedAlphabetConcept#ValueSize`.
    :dox:`FiniteOrderedAlphabetConcept#ValueSize` returns the type which uses the least amount of memory while being able to represent all possible values.
    E.g. :dox:`FiniteOrderedAlphabetConcept#ValueSize` of :dox:`Dna` returns an ``_uint8`` which is able to represent 256 different characters.
    However, note that ``std::cout`` has no visible symbol for printing all values on the screen, hence a cast to ``unsigned`` might be necessary.

Assignment 1
^^^^^^^^^^^^

.. container:: assignment

   Type
     Application

   Objective
     In this task you will learn how to access all the letters of an alphabet.
     Use the piece of code from below and adjust the function ``showAllLettersOfMyAlphabet()`` to go through all the characters of the current alphabet and print them.

     .. includefrags:: demos/tutorial/alphabets/assignment_1.cpp

   Hints
     You will need the Metafunction :dox:`FiniteOrderedAlphabetConcept#ValueSize`.

   Solution
     Click **more...** to see the solution.

     .. container:: foldable

        .. includefrags:: demos/tutorial/alphabets/assignment_1_solution.cpp