File: sample-omit-degen.rst

package info (click to toggle)
python-cogent 2024.5.7a1%2Bdfsg-3
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 74,600 kB
  • sloc: python: 92,479; makefile: 117; sh: 16
file content (58 lines) | stat: -rw-r--r-- 1,842 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
Removing degenerate characters
------------------------------

Degenerate IUPAC base symbols represent a site position that can have multiple possible characters. For a DNA example, "Y" represents pyrimidines where the site can be either "C" or "T".

.. note:: In many molecular evolutionary and phylogenetic analyses, the gap character "-" is treated "N", meaning any base.

Let's create sample data with degenerate characters

.. jupyter-execute::
    :raises:

    from cogent3 import make_aligned_seqs

    aln = make_aligned_seqs({"s1": "ACGA-GACG", "s2": "GATGATGYT"}, moltype="dna")
    aln

Omit aligned columns containing a degenerate character
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::
    :raises:

    from cogent3 import get_app

    omit_degens = get_app("omit_degenerates", moltype="dna")
    result = omit_degens(aln)
    result


Omit all degenerate characters except gaps from an alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we create the app with the argument ``gap_is_degen=False``, we can omit degenerate characters but retain gaps. 

.. jupyter-execute::
    :raises:

    from cogent3 import get_app

    omit_degens_keep_gaps = get_app("omit_degenerates", moltype="dna", gap_is_degen=False)
    result = omit_degens_keep_gaps(aln)
    result

Omit k-mers which contain degenerate characters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we create ``omit_degenerates`` with the argument ``motif_length``, it will split sequences into non-overlapping tuples of the specified length and exclude any tuple that contains a degenerate character. 

.. jupyter-execute::
    :raises:

    from cogent3 import get_app

    omit_degenerates_app = get_app("omit_degenerates", moltype="dna", motif_length=2)
    result = omit_degenerates_app(aln)

    result