**********************************************
Using the ``MolType`` and ``Sequence`` objects
**********************************************

.. authors Meg Pirrung

MolType
=======

``MolType`` provides services for resolving ambiguities and for providing the correct ambiguity when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.

One issue with ``MolType`` objects is that they need to know about ``Sequence``, ``Alphabet``, and other objects, while those objects in turn need to know about the ``MolType``. It is thus essential that the connection between these other types and the ``MolType`` can be made after the objects are created.
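
The idea of making the connection after creation can be illustrated with a minimal, library-independent sketch. The names ``ToySequence`` and ``ToyMolType`` are hypothetical and are not part of cogent; the sketch only shows how a back-reference can be attached once both sides exist, and it is not how cogent itself is implemented.

.. code-block:: python

   class ToySequence(object):
       """Hypothetical sequence class; not part of cogent."""

       MolType = None  # unknown at class-definition time, filled in later

       def __init__(self, data):
           self.data = data


   class ToyMolType(object):
       """Hypothetical moltype that links itself to a sequence class."""

       def __init__(self, sequence_class, label):
           self.Sequence = sequence_class
           self.label = label
           # Make the connection only now, after both objects exist.
           sequence_class.MolType = self


   toy_rna = ToyMolType(ToySequence, label="toy_rna")
   seq = ToySequence("ACGU")

After ``ToyMolType`` has been constructed, every ``ToySequence`` instance can reach its molecular type through ``seq.MolType``, even though the sequence class was defined first.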

Setting up a ``MolType`` object with an RNA sequence
----------------------------------------------------

.. doctest::

   >>> from cogent.core.moltype import MolType, IUPAC_RNA_chars,\
   ...   IUPAC_RNA_ambiguities, RnaStandardPairs, RnaMW,\
   ...   IUPAC_RNA_ambiguities_complements
   >>> from cogent.core.sequence import NucleicAcidSequence
   >>> testrnaseq = 'ACGUACGUACGUACGU'
   >>> RnaMolType = MolType(
   ...     Sequence = NucleicAcidSequence(testrnaseq),
   ...     motifset = IUPAC_RNA_chars,
   ...     Ambiguities = IUPAC_RNA_ambiguities,
   ...     label = "rna_with_lowercase",
   ...     MWCalculator = RnaMW,
   ...     Complements = IUPAC_RNA_ambiguities_complements,
   ...     Pairs = RnaStandardPairs,
   ...     add_lower=True,
   ...     preserve_existing_moltypes=True,
   ...     make_alphabet_group=True,
   ...     )

Setting up a ``MolType`` object with a DNA sequence
---------------------------------------------------

.. doctest::

   >>> from cogent.core.moltype import MolType, IUPAC_DNA_chars,\
   ...   IUPAC_DNA_ambiguities, DnaMW, IUPAC_DNA_ambiguities_complements,\
   ...   DnaStandardPairs
   >>> testdnaseq = 'ACGTACGTACGTACGT'
   >>> DnaMolType = MolType(
   ...     Sequence = NucleicAcidSequence(testdnaseq),
   ...     motifset = IUPAC_DNA_chars,
   ...     Ambiguities = IUPAC_DNA_ambiguities,
   ...     label = "dna_with_lowercase",
   ...     MWCalculator = DnaMW,
   ...     Complements = IUPAC_DNA_ambiguities_complements,
   ...     Pairs = DnaStandardPairs,
   ...     add_lower=True,
   ...     preserve_existing_moltypes=True,
   ...     make_alphabet_group=True,
   ...     )


Setting up a ``MolType`` object with a protein sequence
-------------------------------------------------------

.. doctest::

   >>> from cogent.core.moltype import MolType, IUPAC_PROTEIN_chars,\
   ...   IUPAC_PROTEIN_ambiguities, ProteinMW
   >>> from cogent.core.sequence import ProteinSequence, ModelProteinSequence
   >>> protstr = 'TEST'
   >>> ProteinMolType = MolType(
   ...     Sequence = ProteinSequence(protstr),
   ...     motifset = IUPAC_PROTEIN_chars,
   ...     Ambiguities = IUPAC_PROTEIN_ambiguities,
   ...     MWCalculator = ProteinMW,
   ...     make_alphabet_group=True,
   ...     ModelSeq = ModelProteinSequence,
   ...     label = "protein")
   >>> protseq = ProteinMolType.Sequence

Verify sequences
----------------

.. doctest::

   >>> rnastr = 'ACGUACGUACGUACGU'
   >>> dnastr = 'ACGTACGTACGTACGT'
   >>> RnaMolType.isValid(rnastr)
   True
   >>> RnaMolType.isValid(dnastr)
   False
   >>> RnaMolType.isValid(NucleicAcidSequence(dnastr).toRna())
   True
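
Conceptually, a check of this kind asks whether every character of a sequence belongs to the alphabet of the molecular type. The following sketch illustrates that idea in plain Python. It is an assumption about the general technique, not cogent's implementation: ``is_valid``, ``RNA_CHARS`` and ``DNA_CHARS`` are hypothetical names, and the character sets cover only the four unambiguous bases (no ambiguity codes, gaps or lowercase).

.. code-block:: python

   # Hypothetical, simplified alphabets: unambiguous bases only.
   RNA_CHARS = frozenset("UCAG")
   DNA_CHARS = frozenset("TCAG")


   def is_valid(seq, alphabet):
       """Return True if every character of seq is in alphabet."""
       return all(char in alphabet for char in seq)


   rna_ok = is_valid("ACGUACGU", RNA_CHARS)
   dna_as_rna = is_valid("ACGTACGT", RNA_CHARS)
   # Replacing T with U turns the DNA string into a valid RNA string.
   converted = is_valid("ACGTACGT".replace("T", "U"), RNA_CHARS)

Here ``rna_ok`` and ``converted`` are ``True`` while ``dna_as_rna`` is ``False``, mirroring the pattern of results in the doctest above.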

``Sequence``
============

The ``Sequence`` object contains classes that represent biological sequence data. These provide generic biological sequence manipulation functions, plus functions that are critical for the ``evolve`` module calculations.

.. warning:: Do not import sequence classes directly! It is expected that you will access them through ``MolType`` objects. The most common molecular types (``DNA``, ``RNA`` and ``PROTEIN``) are provided as top-level imports in cogent (e.g. ``cogent.DNA``). Sequence classes depend on information from the ``MolType`` that is **only** available after ``MolType`` has been imported.

   Sequences are intended to be immutable. This is not enforced by the code for performance reasons, but do not alter the ``MolType`` or the sequence data after creation.

More detailed usage of sequence objects can be found in :ref:`dna-rna-seqs`.