Concatenating alignments
------------------------

The ``concat`` app joins a series of alignments end to end, matching sequences across alignments by name.

.. jupyter-execute::
    :raises:

    from cogent3 import get_app

    concat_alns_app = get_app("concat", moltype="dna")

Let's create sample alignments with matching sequence names for use in the examples below.

.. jupyter-execute::
    :raises:

    from cogent3 import make_aligned_seqs

    aln1 = make_aligned_seqs({"s1": "AAA", "s2": "CAA", "s3": "AAA"}, moltype="dna")
    aln2 = make_aligned_seqs({"s1": "GCG", "s2": "GGG", "s3": "GGT"}, moltype="dna")
    aln1

.. jupyter-execute::
    :raises:

    aln2

How to concatenate alignments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default (``intersect=True``), only sequences whose names occur in every alignment are kept; a sequence that is missing from any alignment is omitted from the result.

.. jupyter-execute::
    :raises:

    result = concat_alns_app([aln1, aln2])
    result
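
The default behaviour can be pictured with plain Python dictionaries standing in for alignments. This is only a sketch of the semantics described above, not how cogent3 implements ``concat``. The helper ``concat_shared`` is a hypothetical name, and the order of sequences in the real result may differ.

```python
def concat_shared(alignments):
    """Join sequences end to end, keeping only names found in every alignment."""
    # names common to all alignments, in the order of the first alignment
    shared = [n for n in alignments[0] if all(n in aln for aln in alignments)]
    return {n: "".join(aln[n] for aln in alignments) for n in shared}


# plain dicts used as stand-ins for the sample alignments
aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln2 = {"s1": "GCG", "s2": "GGG", "s3": "GGT"}

print(concat_shared([aln1, aln2]))
# {'s1': 'AAAGCG', 's2': 'CAAGGG', 's3': 'AAAGGT'}
```

If one of the inputs lacked ``s3``, this sketch would drop ``s3`` from the output entirely, which mirrors the omission described for ``intersect=True``.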

How to concatenate alignments with missing sequences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With ``intersect=False``, the ``concat`` app keeps every sequence that occurs in any of the alignments. Where a sequence is absent from an alignment, its place is filled with a run of ``"?"`` characters.

.. jupyter-execute::
    :raises:

    from cogent3 import make_aligned_seqs, get_app

    concat_missing = get_app("concat", moltype="dna", intersect=False)
    aln3 = make_aligned_seqs({"s4": "GCG", "s5": "GGG"}, moltype="dna")
    result = concat_missing([aln1, aln3])
    result
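
To make the fill behaviour concrete, here is a minimal pure-Python sketch that uses dictionaries in place of alignments. It assumes the ``"?"`` filler spans the full length of the alignment that lacks the sequence. The helper ``concat_union`` and its ``missing`` parameter are hypothetical names, and this is not cogent3's implementation.

```python
def concat_union(alignments, missing="?"):
    """Join sequences end to end, padding absent sequences with `missing`."""
    # every name seen in any alignment, in first-seen order
    names = list(dict.fromkeys(n for aln in alignments for n in aln))
    parts = []
    for aln in alignments:
        # sequences within one alignment are aligned, so all share one length
        length = len(next(iter(aln.values())))
        parts.append({n: aln.get(n, missing * length) for n in names})
    return {n: "".join(part[n] for part in parts) for n in names}


# plain dicts used as stand-ins for the sample alignments
aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln3 = {"s4": "GCG", "s5": "GGG"}

print(concat_union([aln1, aln3]))
# {'s1': 'AAA???', 's2': 'CAA???', 's3': 'AAA???', 's4': '???GCG', 's5': '???GGG'}
```

Because the two inputs share no names, every sequence in this sketch is half data and half filler.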

How to concatenate alignments with a delimiter ``"N"``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``join_seq`` argument specifies the characters to insert between the concatenated sequences, here a single ``"N"``.

.. jupyter-execute::
    :raises:
    
    from cogent3 import get_app

    concat_delim = get_app("concat", join_seq="N", moltype="dna")
    result = concat_delim([aln1, aln2])
    result
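
The effect of a delimiter can be sketched in pure Python with dictionaries standing in for alignments. This assumes the delimiter is placed once between each pair of adjacent alignments. The helper ``concat_with_delimiter`` is a hypothetical name, and this is not cogent3's implementation.

```python
def concat_with_delimiter(alignments, join_seq="N"):
    """Join sequences found in every alignment, separated by `join_seq`."""
    # names common to all alignments, in the order of the first alignment
    shared = [n for n in alignments[0] if all(n in aln for aln in alignments)]
    # str.join places the delimiter only between segments, never at the ends
    return {n: join_seq.join(aln[n] for aln in alignments) for n in shared}


# plain dicts used as stand-ins for the sample alignments
aln1 = {"s1": "AAA", "s2": "CAA", "s3": "AAA"}
aln2 = {"s1": "GCG", "s2": "GGG", "s3": "GGT"}

print(concat_with_delimiter([aln1, aln2]))
# {'s1': 'AAANGCG', 's2': 'CAANGGG', 's3': 'AAANGGT'}
```

A visible delimiter such as ``"N"`` makes the boundary between the source alignments easy to locate in the concatenated result.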