.. jupyter-execute::
    :hide-code:

    import set_working_directory

Sample nucleotides from a given codon position
----------------------------------------------

The ``take_codon_positions`` app allows you to extract all nucleotides at a given codon position from an alignment. 

Let's create a sample alignment for our example. 

.. jupyter-execute::
    :raises:

    from cogent3 import make_aligned_seqs

    aln = make_aligned_seqs({"s1": "ACGACGACG", "s2": "GATGATGAT"}, moltype="dna")
    aln

Extract the third codon position from an alignment 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We can achieve this by creating the ``take_codon_positions`` app with ``3`` as a positional argument.

.. jupyter-execute::
    :raises:

    from cogent3 import get_app

    take_pos3 = get_app("take_codon_positions", 3, moltype="dna")
    result = take_pos3(aln)
    result
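
To make the selection concrete, here is a minimal plain-Python sketch of what "third codon position" means for the two sample sequences. It relies only on string slicing and assumes the sequences start in frame; it is an illustration of the idea, not how cogent3 implements the app, and the helper name ``third_positions`` is hypothetical.

```python
# Plain-Python illustration only; this is not the cogent3 implementation.
seqs = {"s1": "ACGACGACG", "s2": "GATGATGAT"}


def third_positions(seq):
    """Return every third character, starting at index 2 (codon position 3)."""
    return seq[2::3]


pos3 = {name: third_positions(seq) for name, seq in seqs.items()}
print(pos3)
```

Each nine-base sequence contains three codons, so each result holds three bases.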

Extract the first and second codon positions from an alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We can achieve this by creating the ``take_codon_positions`` app with ``1`` and ``2`` as positional arguments.

.. jupyter-execute::
    :raises:

    from cogent3 import get_app

    take_pos12 = get_app("take_codon_positions", 1, 2, moltype="dna")
    result = take_pos12(aln)
    result
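
The same idea extends to more than one position. The sketch below is a plain-Python illustration, assuming in-frame sequences and 1-based codon positions; the function ``take_positions`` is a hypothetical helper and not part of cogent3.

```python
# Plain-Python illustration only; this is not the cogent3 implementation.
def take_positions(seq, *positions):
    """Keep the given 1-based codon positions from each complete codon."""
    offsets = sorted(p - 1 for p in positions)
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(codon[o] for codon in codons for o in offsets)


seqs = {"s1": "ACGACGACG", "s2": "GATGATGAT"}
pos12 = {name: take_positions(seq, 1, 2) for name, seq in seqs.items()}
print(pos12)
```

Two of every three bases are retained, so each nine-base sequence yields six bases.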

Extract only the third codon positions from four-fold degenerate codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We can achieve this by creating the ``take_codon_positions`` app with the argument ``fourfold_degenerate=True``. 

.. jupyter-execute::
    :raises:

    from cogent3 import get_app, make_aligned_seqs

    aln_ff = make_aligned_seqs({"s1": "GCAAGCGTTTAT", "s2": "GCTTTTGTCAAT"}, moltype="dna")
    take_fourfold = get_app("take_codon_positions", fourfold_degenerate=True, moltype="dna")
    result = take_fourfold(aln_ff)
    result
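
A codon is four-fold degenerate when any of the four bases at its third position encodes the same amino acid. The sketch below illustrates the filtering idea in plain Python under several assumptions: the standard genetic code, ungapped and unambiguous in-frame sequences, and a rule that a column is kept only when every sequence carries a codon from the same four-fold degenerate family. The names ``FOURFOLD_PREFIXES`` and ``fourfold_third_positions`` are hypothetical, and the app's actual implementation may differ.

```python
# Plain-Python illustration only; this is not the cogent3 implementation.
# Two-base codon prefixes whose third base is four-fold degenerate under the
# standard genetic code (Ala, Arg, Gly, Leu, Pro, Ser, Thr, Val).
FOURFOLD_PREFIXES = {"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"}


def fourfold_third_positions(seqs):
    """Keep third positions of codon columns that are four-fold degenerate."""
    names = list(seqs)
    length = min(len(s) for s in seqs.values())
    kept = {name: [] for name in names}
    for i in range(0, length - length % 3, 3):
        codons = [seqs[name][i:i + 3] for name in names]
        prefixes = {codon[:2] for codon in codons}
        same_family = len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES
        plain_bases = all(codon[2] in "ACGT" for codon in codons)
        if same_family and plain_bases:
            for name, codon in zip(names, codons):
                kept[name].append(codon[2])
    return {name: "".join(bases) for name, bases in kept.items()}


seqs = {"s1": "GCAAGCGTTTAT", "s2": "GCTTTTGTCAAT"}
ff = fourfold_third_positions(seqs)
print(ff)
```

In this sketch only the first column (``GCA``/``GCT``, alanine) and the third column (``GTT``/``GTC``, valine) pass the filter, so each sequence contributes two bases.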

Create a composed process which samples only the third codon position
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::
    :hide-code:

    from tempfile import TemporaryDirectory

    tmpdir = TemporaryDirectory(dir=".")
    path_to_dir = tmpdir.name

Let's set up a data store containing all the files with the ``.fasta`` suffix in the data directory, limiting the data store to two members to keep the example minimal.

.. jupyter-execute::
    :raises:

    from cogent3 import open_data_store

    fasta_seq_dstore = open_data_store("data", suffix="fasta", mode="r", limit=2)

Now let's set up a process composing the following apps: ``load_aligned`` (loads the alignments), ``take_codon_positions`` (extracts the third codon position), and ``write_seqs`` (writes the filtered sequences to a data store).

.. note:: Learn the basics of turning apps into composed processes :ref:`here! <apps>` 

.. jupyter-execute::
    :raises:
    
    from cogent3 import get_app, open_data_store

    out_dstore = open_data_store(path_to_dir, suffix="fa", mode="w")

    loader = get_app("load_aligned", format="fasta", moltype="dna")
    cpos3 = get_app("take_codon_positions", 3)
    writer = get_app("write_seqs", out_dstore, format="fasta")

    process = loader + cpos3 + writer

.. tip:: When running this code on your machine, remember to replace ``path_to_dir`` with an actual directory path.
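
For intuition about what chaining apps with ``+`` expresses, the following toy sketch composes three callables so that each one receives the output of the previous one. It is a conceptual illustration only: the ``Step`` class, the ``name=sequence`` input format, and the in-memory ``written`` list are all hypothetical stand-ins, and cogent3's composable apps do considerably more (for example, type checking and failure handling).

```python
# Toy illustration of composition with "+"; not the cogent3 implementation.
class Step:
    """A callable wrapper that can be chained with ``+``."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __add__(self, other):
        # The composed step runs self first, then hands the result to other.
        return Step(lambda value: other(self(value)))


def parse(text):
    """Parse 'name=SEQ' pairs separated by whitespace into a dict."""
    return dict(item.split("=") for item in text.split())


def third_positions(seqs):
    """Keep codon position 3 of every sequence."""
    return {name: seq[2::3] for name, seq in seqs.items()}


written = []  # stands in for an output data store


def write(seqs):
    """Record the result and pass it through unchanged."""
    written.append(seqs)
    return seqs


pipeline = Step(parse) + Step(third_positions) + Step(write)
result = pipeline("s1=ACGACGACG s2=GATGATGAT")
print(result)
```

Reading left to right, the composed object loads, filters, then writes, which mirrors the order of ``loader + cpos3 + writer``.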

Now let's apply ``process`` to our data store! This populates ``out_dstore`` (which is returned by the ``.apply_to()`` call) with the filtered alignments. We can index ``out_dstore`` to see individual data members, and call the ``.read()`` method on a member to inspect its contents.

.. jupyter-execute::
    :raises:

    out_dstore = process.apply_to(fasta_seq_dstore)
    out_dstore.describe