
.. _dna-rna-seqs:

``Sequence``
============

The ``cogent3`` sequence classes represent biological sequence data. They provide generic sequence manipulation methods, plus methods that are critical for the ``evolve`` module calculations.

.. warning:: Do not import sequence classes directly! You are expected to access them through ``MolType`` objects, which are available via the ``cogent3.get_moltype()`` function. Sequence classes depend on information from the ``MolType`` that is **only** available after ``MolType`` has been imported.

    Sequences are intended to be immutable. For performance reasons the code does not enforce this, so do not alter the ``MolType`` or the sequence data after creation.

DNA and RNA sequences
---------------------
.. authors, Gavin Huttley, Kristian Rother, Patrick Yannul, Tom Elliott, Tony Walters, Meg Pirrung
Creating a DNA sequence from a string
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All sequence and alignment objects have a molecular type, or ``MolType``, which provides key properties for validating sequence characters. Here we use the ``DNA`` ``MolType`` to create a DNA sequence.

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    my_seq
    print(my_seq)
    str(my_seq)

Creating an RNA sequence from a string
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")

Converting to FASTA format
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    print(my_seq.to_fasta())

Convert an RNA sequence to FASTA format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    rnaseq.to_fasta()

Creating a named sequence
^^^^^^^^^^^^^^^^^^^^^^^^^
You can also use the convenience ``make_seq()`` function, providing the moltype as a string. The second argument is the sequence name.

.. jupyter-execute::

    from cogent3 import make_seq

    my_seq = make_seq("AGTACACTGGT", "my_gene", moltype="dna")
    my_seq
    type(my_seq)

Setting or changing the name of a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import make_seq

    my_seq = make_seq("AGTACACTGGT", moltype="dna")
    my_seq.name = "my_gene"
    print(my_seq.to_fasta())

Complementing a DNA sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    print(my_seq.complement())

Reverse complementing a DNA sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``rc()`` method returns the reverse complement of the sequence; the short name is easy to type.

.. jupyter-execute::

    print(my_seq.rc())

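To make explicit what complementing and reverse complementing do, here is a plain-Python sketch that works on a bare string. It assumes only the four canonical DNA bases and is not the ``cogent3`` implementation; the ``complement`` and ``reverse_complement`` helper names are hypothetical.

```python
# Watson-Crick partners for the four canonical DNA bases (illustrative only).
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Swap each base for its partner, keeping the original order."""
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement the sequence, then read it in the opposite direction."""
    return complement(seq)[::-1]


seq = "AGTACACTGGT"
print(complement(seq))          # TCATGTGACCA
print(reverse_complement(seq))  # ACCAGTGTACT
```

Applying ``reverse_complement`` twice returns the original string, which is a quick sanity check for any implementation.
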
.. _translation:

Translate a ``DnaSequence`` to protein
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("GCTTGGGAAAGTCAAATGGAA", "protein-X")
    pep = my_seq.get_translation()
    type(pep)
    print(pep.to_fasta())

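For readers who want to see the idea behind translation, the sketch below reads a DNA string three bases at a time and looks each codon up in a table. It is a plain-Python illustration, not how ``cogent3`` does it: the ``translate`` helper and ``CODON_TABLE`` are hypothetical, and the table holds only the standard-code entries needed for this one sequence.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "GCT": "A",
    "TGG": "W",
    "GAA": "E",
    "AGT": "S",
    "CAA": "Q",
    "ATG": "M",
}


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i : i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("GCTTGGGAAAGTCAAATGGAA"))  # AWESQME
```
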
Converting a DNA sequence to RNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("ACGTACGTACGTACGT")
    print(my_seq.to_rna())

Convert an RNA sequence to DNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    print(rnaseq.to_dna())

Testing complementarity
^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import DNA

    a = DNA.make_seq("AGTACACTGGT")
    a.can_pair(a.complement())
    a.can_pair(a.rc())

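The two ``can_pair()`` calls differ only in the direction of the second strand. The plain-Python sketch below assumes pairing is checked antiparallel (the first base of one strand against the last base of the other) and allows Watson-Crick pairs only. The ``can_pair_antiparallel`` helper is hypothetical and is not the ``cogent3`` implementation.

```python
# Watson-Crick pairs only; an illustrative simplification.
WC_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def can_pair_antiparallel(first: str, second: str) -> bool:
    """Check each base of first against second read from its far end."""
    if len(first) != len(second):
        return False
    return all((x, y) in WC_PAIRS for x, y in zip(first, reversed(second)))


seq = "AGTACACTGGT"
print(can_pair_antiparallel(seq, "TCATGTGACCA"))  # complement: False
print(can_pair_antiparallel(seq, "ACCAGTGTACT"))  # reverse complement: True
```

Under this assumption the plain complement fails at the very first position, because ``A`` would have to pair with ``A``.
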
Joining two DNA sequences
^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    extra_seq = DNA.make_seq("CTGAC")
    long_seq = my_seq + extra_seq
    long_seq
    str(long_seq)

Slicing DNA sequences
^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    my_seq[1:6]

Getting 3rd positions from codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The easiest approach is to work off an array-based sequence, created here with ``make_array_seq()``, and slice it with a step of 3 starting at index 2.

.. jupyter-execute::

    from cogent3 import DNA

    seq = DNA.make_array_seq("ATGATGATGATG")
    pos3 = seq[2::3]
    assert str(pos3) == "GGGG"

Getting 1st and 2nd positions from codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this instance we can use the annotatable sequence classes. We'll do this by specifying the position indices of interest, creating a sequence ``Feature`` and using that to extract the positions.

.. jupyter-execute::

    from cogent3 import DNA

    seq = DNA.make_seq("ATGATGATGATG")
    # one (start, end) span per codon, covering its 1st and 2nd positions
    indices = [(i, i + 2) for i in range(len(seq))[::3]]
    pos12 = seq.add_feature("pos12", "pos12", indices)
    pos12 = pos12.get_slice()
    assert str(pos12) == "ATATATAT"

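The index arithmetic behind both codon-position examples can be checked on a plain Python string. This is only a sketch of the indexing, independent of ``cogent3``; the variable names are illustrative.

```python
seq = "ATGATGATGATG"

# Every third character, starting at index 2, is a 3rd codon position.
pos3 = seq[2::3]

# Half-open (start, end) spans covering the 1st and 2nd position of each codon.
spans = [(i, i + 2) for i in range(0, len(seq), 3)]
pos12 = "".join(seq[start:end] for start, end in spans)

print(pos3)   # GGGG
print(spans)  # [(0, 2), (3, 5), (6, 8), (9, 11)]
print(pos12)  # ATATATAT
```
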
Return a randomized version of the sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
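The order is random, so the output differs between runs.

::

    from cogent3 import RNA
    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    print(rnaseq.shuffle())
    # example output: ACAACUGGCUCUGAUG
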
Remove gaps from a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. jupyter-execute::

    from cogent3 import RNA

    s = RNA.make_seq("--AUUAUGCUAU-UAu--")
    print(s.degap())