# File: FreqTable.py
# Package: python-biopython 1.68+dfsg-3~bpo8+1
# This code is part of the Biopython distribution and governed by its
# license.  Please see the LICENSE file that should have been included
# as part of this package.
# Copyright Iddo Friedberg idoerg@cc.huji.ac.il
"""A class to handle frequency tables.

Methods to read a letter frequency or a letter count file.

Example files for a DNA alphabet:

A count file (whitespace separated):

A  50
C  37
G  23
T  58

The same info as a frequency file:

A 0.2976
C 0.2202
G 0.1369
T 0.3452
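
The frequency file is the count file divided by the total count
(50 + 37 + 23 + 58 = 168). A minimal sketch of that arithmetic, using a plain
dict as a stand-in for a parsed count file:

```python
# Counts from the example count file above.
counts = {'A': 50, 'C': 37, 'G': 23, 'T': 58}

# Normalize by the total so that the frequencies sum to 1.
total = float(sum(counts.values()))  # 168.0
freqs = dict((letter, n / total) for letter, n in counts.items())

# Rounded to four decimals, these match the example frequency file.
rounded = dict((letter, round(f, 4)) for letter, f in freqs.items())
print(sorted(rounded.items()))
# [('A', 0.2976), ('C', 0.2202), ('G', 0.1369), ('T', 0.3452)]
```

Going the other way is not possible: a frequency file alone does not record
the total, so the counts cannot be recovered from it.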

Functions:
  read_count(f): read a count file from stream f, then convert the counts to
  frequencies.
  read_freq(f): read a frequency file from stream f. The counts are then not
  available, but it is usually the letter frequencies that are of interest.
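
Both readers follow the same pattern: iterate over the stream, split each line
on whitespace into a letter and a number, and convert the number (int for
counts, float for frequencies). The sketch below reproduces that loop as a
standalone helper; the name parse_table and the blank-line skipping are
assumptions for illustration, not part of this module. Any iterable of lines
works, so a list stands in for an open file.

```python
def parse_table(lines, convert):
    # Hypothetical helper mirroring the loop in read_count/read_freq.
    # convert is int for a count file, float for a frequency file.
    table = {}
    for line in lines:
        fields = line.split()  # split() with no argument also strips
        if not fields:
            # Skip blank lines instead of failing on the unpacking below.
            continue
        key, value = fields
        table[key] = convert(value)
    return table


counts = parse_table(['A  50', 'C  37', '', 'G  23', 'T  58'], int)
freqs = parse_table(['A 0.2976', 'C 0.2202', 'G 0.1369', 'T 0.3452'], float)
print(sorted(counts.items()))
# [('A', 50), ('C', 37), ('G', 23), ('T', 58)]
```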

Methods:
  (all internal)
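Attributes:
  alphabet: The IUPAC alphabet (or any other alphabet) whose letters you are
  using. Common sets are IUPAC.protein (20-letter protein) and
  IUPAC.unambiguous_dna (4-letter DNA); see Bio.Alphabet for more. If no
  alphabet is given, one is built from the sorted letters of the input.
  count: count dictionary. Empty if no counts are provided.

The frequencies are not kept in a separate attribute: FreqTable is a dict
subclass, so ftab[letter] gives the frequency of that letter.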
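
A stripped-down sketch of that design: frequencies are the dict's own items,
raw counts sit in a separate count attribute, and the alphabet letters come
from the sorted keys. It assumes a plain string in place of the Bio.Alphabet
object; the class name MiniFreqTable and the letters attribute are
hypothetical.

```python
class MiniFreqTable(dict):
    # Hypothetical, simplified stand-in for FreqTable built from counts only.
    def __init__(self, counts):
        super(MiniFreqTable, self).__init__()
        self.count = dict(counts)
        total = float(sum(self.count.values()))
        for letter, n in self.count.items():
            self[letter] = n / total
        # Letters from the sorted keys, as in _alphabet_from_input
        # (a plain string here, not an Alphabet object).
        self.letters = ''.join(sorted(self))


table = MiniFreqTable({'T': 58, 'A': 50, 'G': 23, 'C': 37})
print(table.letters)         # ACGT
print(round(table['G'], 4))  # 0.1369
print(table.count['G'])      # 23
```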

Example of use:
  >>> from Bio.SubsMat import FreqTable
  >>> ftab = FreqTable.FreqTable(my_frequency_dictionary, FreqTable.FREQ)
  >>> ftab = FreqTable.FreqTable(my_count_dictionary, FreqTable.COUNT)
  >>> ftab = FreqTable.read_count(open('myDNACountFile'))

"""

from Bio import Alphabet
COUNT = 1
FREQ = 2


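class FreqTable(dict):
    """Frequency table: a dict mapping each letter to its frequency.

    Raw counts, when supplied, are kept in the count attribute.
    """

    def _freq_from_count(self):
        """Fill the table with frequencies computed from self.count."""
        total = float(sum(self.count.values()))
        for i, v in self.count.items():
            self[i] = v / total

    def _alphabet_from_input(self):
        """Return the table's letters, sorted, as a single string."""
        return ''.join(sorted(self))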

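    def __init__(self, in_dict, dict_type, alphabet=None):
        """Build the table from counts (COUNT) or frequencies (FREQ)."""
        self.alphabet = alphabet
        if dict_type == COUNT:
            # Keep the raw counts and derive the frequencies from them.
            self.count = in_dict
            self._freq_from_count()
        elif dict_type == FREQ:
            # Frequencies given directly; no counts are available.
            self.count = {}
            self.update(in_dict)
        else:
            raise ValueError("dict_type must be COUNT or FREQ")
        if not alphabet:
            # No alphabet given: build one from the letters in the input.
            self.alphabet = Alphabet.Alphabet()
            self.alphabet.letters = self._alphabet_from_input()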


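def read_count(f):
    """Read a whitespace-separated letter count file; return a FreqTable."""
    count = {}
    for line in f:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key, value = fields
        count[key] = int(value)
    return FreqTable(count, COUNT)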


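def read_freq(f):
    """Read a whitespace-separated letter frequency file; return a FreqTable."""
    freq_dict = {}
    for line in f:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key, value = fields
        freq_dict[key] = float(value)
    return FreqTable(freq_dict, FREQ)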