# FreqTable.py, from the Debian source package python-biopython
# 1.78+dfsg-4 (area: main; suite: bullseye).
# Copyright 2000 by Iddo Friedberg idoerg@cc.huji.ac.il
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.

r"""A class to handle frequency tables or letter count files.

Example files for a DNA alphabet:

A count file (whitespace separated)::

 A  50
 C  37
 G  23
 T  58

The same info as a frequency file::

 A 0.2976
 C 0.2202
 G 0.1369
 T 0.3452

Functions:
  :read_count(f): read a count file from stream f and convert the counts to
                  frequencies.
  :read_freq(f): read a frequency data file from stream f. No counts are
                 available in this case, but the letter frequencies are
                 usually what is of interest.

Methods:
  (all internal)

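Attributes:
  :alphabet: The letters used as keys into the table. If none is given, the
             sorted keys joined into one string.
  :count: Count dictionary. Empty if no counts are provided.

The frequencies are stored in the table itself: FreqTable is a ``dict``
subclass mapping each letter to its frequency.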

Example of use:
    >>> import io
    >>> from Bio.SubsMat import FreqTable
    >>> f_count = io.StringIO(u"A  50\nC  37\nG  23\nT  58")
    >>> ftab = FreqTable.read_count(f_count)
    >>> for nb in sorted(ftab):
    ...     print("%s %0.4f" % (nb, ftab[nb]))
    ...
    A 0.2976
    C 0.2202
    G 0.1369
    T 0.3452

"""
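# Illustrative sketch (not part of the module's API; the helper name
# ``_example_count_to_freq`` is hypothetical): the frequency file in the
# docstring follows from the count file by dividing each count by the total,
# here 50 + 37 + 23 + 58 = 168, so A -> 50 / 168, about 0.2976.


def _example_count_to_freq(counts):
    """Return a new dict mapping each letter to count / total (sketch)."""
    total = sum(counts.values())
    # Assumes a non-zero total; an all-zero count table would divide by zero.
    return {letter: n / total for letter, n in counts.items()}


# Usage with the docstring's counts:
#     freqs = _example_count_to_freq({"A": 50, "C": 37, "G": 23, "T": 58})
#     {k: round(v, 4) for k, v in freqs.items()}
#     -> {'A': 0.2976, 'C': 0.2202, 'G': 0.1369, 'T': 0.3452}
# Rounded to four places these values sum to 0.9999, not exactly 1.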


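# Values for the ``dict_type`` argument of FreqTable.
COUNT = 1  # in_dict holds letter counts
FREQ = 2  # in_dict holds letter frequencies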


class FreqTable(dict):
    """Define class to handle frequency tables or letter count files."""

    def _freq_from_count(self):
        """Calculate frequency from count values (PRIVATE)."""
        total = float(sum(self.count.values()))
        for i, v in self.count.items():
            self[i] = v / total

    def _alphabet_from_input(self):
        """Return the sorted keys joined into one string (PRIVATE)."""
        return "".join(sorted(self))

    def __init__(self, in_dict, dict_type, alphabet=None):
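        """Initialize the table from counts (COUNT) or frequencies (FREQ).

        If alphabet is not given, it is derived from the sorted keys.
        """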
        self.alphabet = alphabet
        if dict_type == COUNT:
            self.count = in_dict
            self._freq_from_count()
        elif dict_type == FREQ:
            self.count = {}
            self.update(in_dict)
        else:
            raise ValueError("bad dict_type")
        if not alphabet:
            self.alphabet = self._alphabet_from_input()
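# Illustrative sketch of the logic in FreqTable.__init__, using plain dicts
# (the function name ``_sketch_build_table`` is hypothetical). With
# dict_type 1 (COUNT) the input is taken as counts and normalized to
# frequencies; with dict_type 2 (FREQ) it is taken as frequencies and the
# counts stay empty. Without an explicit alphabet, the sorted keys are joined
# into one string, e.g. "ACGT".


def _sketch_build_table(in_dict, dict_type, alphabet=None):
    """Return (frequencies, counts, alphabet) as FreqTable would (sketch)."""
    if dict_type == 1:  # same value as COUNT
        # Copies the input; FreqTable itself keeps in_dict as self.count.
        counts = dict(in_dict)
        total = sum(counts.values())
        freqs = {k: v / total for k, v in counts.items()}
    elif dict_type == 2:  # same value as FREQ
        counts = {}
        freqs = dict(in_dict)
    else:
        raise ValueError("bad dict_type")
    if not alphabet:
        # Same result as FreqTable._alphabet_from_input: sorted keys, joined.
        alphabet = "".join(sorted(freqs))
    return freqs, counts, alphabet


# Usage:
#     _sketch_build_table({"T": 1, "A": 3}, 1)
#     -> ({'T': 0.25, 'A': 0.75}, {'T': 1, 'A': 3}, 'AT')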


def read_count(f):
    """Read a count file from stream f and return a FreqTable of frequencies."""
    count = {}
    for line in f:
        key, value = line.strip().split()
        count[key] = int(value)
    return FreqTable(count, COUNT)
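# Because read_count unpacks every line into exactly two fields, a blank
# line (for example an extra empty line at the end of the file) makes it
# raise ValueError. A minimal tolerant variant, assuming blank lines carry no
# data; the name ``_sketch_read_counts`` is hypothetical and it returns a
# plain dict of counts rather than a FreqTable.


def _sketch_read_counts(f):
    """Parse "letter count" lines from an iterable, skipping blanks (sketch)."""
    counts = {}
    for line in f:
        fields = line.split()  # split() with no argument also drops "\n"
        if not fields:
            continue  # blank or whitespace-only line
        key, value = fields  # malformed lines still raise ValueError
        counts[key] = int(value)
    return counts


# Usage (any iterable of lines works, including an open file):
#     _sketch_read_counts(["A 50\n", "\n", "C 37\n"])  # -> {'A': 50, 'C': 37}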


def read_freq(f):
    """Read a frequency data file from stream f and return a FreqTable."""
    freq_dict = {}
    for line in f:
        key, value = line.strip().split()
        freq_dict[key] = float(value)
    return FreqTable(freq_dict, FREQ)
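

# read_freq stores whatever numbers the file holds; it does not check that
# they look like frequencies. A possible sanity check, offered as an
# assumption rather than module behavior (``_sketch_check_freqs`` and the
# default tolerance are hypothetical): values must be non-negative and sum to
# 1 within ``tol``, loose enough for files rounded to four decimal places,
# such as the docstring example, whose values sum to 0.9999.

import math


def _sketch_check_freqs(freqs, tol=1e-3):
    """Return True if all values are >= 0 and sum to 1 within tol (sketch)."""
    if any(v < 0 for v in freqs.values()):
        return False
    # Rounded frequency files rarely sum to exactly 1, hence abs_tol.
    return math.isclose(sum(freqs.values()), 1.0, abs_tol=tol)


# Usage:
#     _sketch_check_freqs({"A": 0.2976, "C": 0.2202, "G": 0.1369, "T": 0.3452})
#     -> True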