.. _dna-rna-seqs:
``Sequence``
============
The ``Sequence`` classes represent biological sequence data. They provide generic sequence manipulation functions, plus functions that are critical for the ``evolve`` module calculations.
.. warning:: Do not import sequence classes directly! Access them through ``MolType`` objects instead, which are available via the ``cogent3.get_moltype()`` function. Sequence classes depend on information from the ``MolType`` that is **only** available after ``MolType`` has been imported.

    Sequences are intended to be immutable. This is not enforced by the code for performance reasons, so do not alter the ``MolType`` or the sequence data after creation.

DNA and RNA sequences
---------------------
.. authors, Gavin Huttley, Kristian Rother, Patrick Yannul, Tom Elliott, Tony Walters, Meg Pirrung
Creating a DNA sequence from a string
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All sequence and alignment objects have a molecular type, or ``MolType``, which provides key properties for validating sequence characters. Here we use the ``DNA`` ``MolType`` to create a DNA sequence.

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    my_seq
    print(my_seq)
    str(my_seq)

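To illustrate what validating sequence characters means, here is a standalone sketch that checks a string against a fixed alphabet. It does not use ``cogent3``: ``DNA_ALPHABET`` and ``is_valid_dna()`` are hypothetical names, and the sketch assumes only the four canonical bases, whereas a real ``MolType`` may also accept gap and ambiguity characters.

```python
# Hypothetical stand-in for the characters a DNA molecular type accepts.
# Assumption: canonical bases only, no gaps or ambiguity codes.
DNA_ALPHABET = frozenset("ACGT")


def is_valid_dna(raw: str) -> bool:
    """Return True if every character in raw is a canonical DNA base."""
    return set(raw.upper()) <= DNA_ALPHABET


valid = is_valid_dna("AGTACACTGGT")
invalid = is_valid_dna("AGUACACUGGU")  # contains U, an RNA character
```
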
Creating an RNA sequence from a string
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")

Converting to FASTA format
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    print(my_seq.to_fasta())

Convert an RNA sequence to FASTA format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    rnaseq.to_fasta()

Creating a named sequence
^^^^^^^^^^^^^^^^^^^^^^^^^
You can also use the top-level ``make_seq()`` convenience function, providing the moltype as a string.

.. jupyter-execute::

    from cogent3 import make_seq

    my_seq = make_seq("AGTACACTGGT", "my_gene", moltype="dna")
    my_seq
    type(my_seq)

Setting or changing the name of a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import make_seq

    my_seq = make_seq("AGTACACTGGT", moltype="dna")
    my_seq.name = "my_gene"
    print(my_seq.to_fasta())

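The FASTA text produced for a named sequence has a simple layout: a header line starting with ``>`` followed by the name, then the sequence itself. The following standalone sketch builds such a record from plain strings. The ``format_fasta()`` helper and its 60-character line width are assumptions made for illustration, not a description of how ``to_fasta()`` is implemented.

```python
def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Build one FASTA record: a '>' header, then the sequence in fixed-width lines."""
    lines = [f">{name}"]
    # Split the sequence into chunks of at most `width` characters.
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


record = format_fasta("my_gene", "AGTACACTGGT")
```
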
Complementing a DNA sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    print(my_seq.complement())

Reverse complementing a DNA sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``rc()`` method returns the reverse complement. Its short name is easy to type.

.. jupyter-execute::

    print(my_seq.rc())

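As a plain-Python illustration of the two operations, the sketch below complements a DNA string with a translation table and then reverses it. The helper names are hypothetical and only the four canonical bases are handled. The ``cogent3`` methods operate on sequence objects and may treat ambiguity codes and gaps differently.

```python
# Translation table mapping each canonical base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Complement each base, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return complement(seq)[::-1]


comp = complement("AGTACACTGGT")        # "TCATGTGACCA"
rc = reverse_complement("AGTACACTGGT")  # "ACCAGTGTACT"
```
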
.. _translation:
Translate a ``DnaSequence`` to protein
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("GCTTGGGAAAGTCAAATGGAA", "protein-X")
    pep = my_seq.get_translation()
    type(pep)
    print(pep.to_fasta())

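To show what translation does codon by codon, here is a standalone sketch that builds the standard genetic code as a dictionary and applies it to a plain string. It assumes the standard code, reading frame 1 and canonical bases only, and it silently ignores any trailing incomplete codon. ``CODON_TABLE`` and ``translate()`` are hypothetical names, not part of ``cogent3``.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons in frame 1; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


peptide = translate("GCTTGGGAAAGTCAAATGGAA")  # "AWESQME"
```
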
Converting a DNA sequence to RNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("ACGTACGTACGTACGT")
    print(my_seq.to_rna())

Convert an RNA sequence to DNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    print(rnaseq.to_dna())

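At the level of characters, the two conversions swap ``T`` and ``U``. The following sketch shows this on plain uppercase strings. The helper names are hypothetical, and unlike the ``cogent3`` methods the sketch returns a plain ``str`` rather than a sequence object with a changed ``MolType``.

```python
def dna_to_rna(dna: str) -> str:
    """Replace thymine with uracil (uppercase input assumed)."""
    return dna.replace("T", "U")


def rna_to_dna(rna: str) -> str:
    """Replace uracil with thymine (uppercase input assumed)."""
    return rna.replace("U", "T")


rna = dna_to_rna("ACGTACGTACGTACGT")  # "ACGUACGUACGUACGU"
dna = rna_to_dna(rna)                 # back to the original DNA string
```
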
Testing complementarity
^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    a = DNA.make_seq("AGTACACTGGT")
    a.can_pair(a.complement())
    a.can_pair(a.rc())

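The sketch below is one simplified reading of a complementarity test. It assumes that pairing is antiparallel, so the first base of one strand is compared with the last base of the other, and that only strict Watson-Crick pairs count. Both points are assumptions for this illustration. ``can_pair_strict()`` is a hypothetical helper and ``can_pair()`` may apply additional rules.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def can_pair_strict(first: str, second: str) -> bool:
    """True if first pairs base by base with second read in reverse order."""
    # zip stops at the shorter strand, so extra bases are not checked.
    return all((x, y) in WATSON_CRICK for x, y in zip(first, reversed(second)))


forward = "AGTACACTGGT"
with_complement = can_pair_strict(forward, "TCATGTGACCA")  # same orientation
with_revcomp = can_pair_strict(forward, "ACCAGTGTACT")     # antiparallel
```

Under these assumptions the complement alone does not pair with the original strand, while the reverse complement does.
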
Joining two DNA sequences
^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    extra_seq = DNA.make_seq("CTGAC")
    long_seq = my_seq + extra_seq
    long_seq
    str(long_seq)

Slicing DNA sequences
^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    my_seq[1:6]

Getting 3rd positions from codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The easiest approach is to work with an array-based sequence, created here with ``DNA.make_array_seq()``, and slice it with a step of 3 starting at index 2.

.. jupyter-execute::

    from cogent3 import DNA

    seq = DNA.make_array_seq("ATGATGATGATG")
    # index 2 is the 3rd position of the first codon; step 3 visits each codon
    pos3 = seq[2::3]
    assert str(pos3) == "GGGG"

Getting 1st and 2nd positions from codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
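In this instance we can use the annotatable sequence classes. We specify the position indices of interest, create a sequence ``Feature`` from them and use that to extract the positions.

.. jupyter-execute::

    from cogent3 import DNA

    seq = DNA.make_seq("ATGATGATGATG")
    # one (start, end) span per codon, covering its 1st and 2nd positions
    indices = [(i, i + 2) for i in range(0, len(seq), 3)]
    pos12 = seq.add_feature("pos12", "pos12", indices)
    pos12 = pos12.get_slice()
    assert str(pos12) == "ATATATAT"
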
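For comparison, the same codon positions can be pulled from a plain Python string with slicing alone. This sketch does not use ``cogent3`` and assumes the string starts in frame and has a length that is a multiple of 3.

```python
codons = "ATGATGATGATG"

# 3rd positions: start at index 2 and take every third character.
third = codons[2::3]

# 1st and 2nd positions: take the first two characters of each codon.
first_second = "".join(codons[i:i + 2] for i in range(0, len(codons), 3))
```
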
Return a randomized version of the sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::

    print(rnaseq.shuffle())
    # example output (varies between runs): ACAACUGGCUCUGAUG

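A randomized sequence keeps the same characters and changes only their order. The standalone sketch below shows this with the standard library. ``shuffled()`` and its optional ``seed`` argument are hypothetical and are not part of the ``shuffle()`` method shown above.

```python
import random
from typing import Optional


def shuffled(seq: str, seed: Optional[int] = None) -> str:
    """Return the characters of seq in random order; composition is unchanged."""
    rng = random.Random(seed)  # a seed makes the result repeatable
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


original = "ACGUACGUACGUACGU"
randomized = shuffled(original, seed=1)
```
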
Remove gaps from a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    s = RNA.make_seq("--AUUAUGCUAU-UAu--")
    print(s.degap())

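Removing gaps amounts to dropping every gap character and keeping the rest in order. The sketch below does this for a plain string. It assumes ``-`` is the only gap character and leaves letter case untouched. ``strip_gaps()`` is a hypothetical helper, not the ``degap()`` implementation.

```python
def strip_gaps(seq: str, gap_chars: str = "-") -> str:
    """Return seq without any of the characters listed in gap_chars."""
    return "".join(ch for ch in seq if ch not in gap_chars)


ungapped = strip_gaps("--AUUAUGCUAU-UAu--")  # "AUUAUGCUAUUAu"
```
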