File: chapter_popgen.tex

package info (click to toggle)
python-biopython 1.68%2Bdfsg-3~bpo8%2B1
  • links: PTS, VCS
  • area: main
  • in suites: jessie-backports
  • size: 46,856 kB
  • sloc: python: 160,306; xml: 93,216; ansic: 9,118; sql: 1,208; makefile: 155; sh: 63
file content (113 lines) | stat: -rw-r--r-- 4,452 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
\chapter{Bio.PopGen: Population genetics}

Bio.PopGen is a Biopython module supporting population genetics,
available in Biopython 1.44 onwards.

The medium term objective for the module is to support widely used data
formats, applications and databases. This module is currently under intense
development and support for new features should appear at a rather fast pace.
Unfortunately this might also entail some instability on the API, especially
if you are using a development version. APIs that are made available on
our official public releases should be much more stable.

\section{GenePop}

GenePop (\url{http://genepop.curtin.edu.au/}) is a popular population
genetics software package supporting Hardy-Weinberg tests, linkage
desiquilibrium, population diferentiation, basic statistics, $F_{st}$ and
migration estimates, among others. GenePop does not supply sequence
based statistics as it doesn't handle sequence data.
The GenePop file format is supported by a wide range of other population
genetic software applications, thus making it a relevant format in the
population genetics field.

Bio.PopGen provides a parser and generator of GenePop file format.
Utilities to manipulate the content of a record are also provided.
Here is an example on how to read a GenePop file (you can find
example GenePop data files in the Test/PopGen directory of Biopython):

\begin{verbatim}
from Bio.PopGen import GenePop

handle = open("example.gen")
rec = GenePop.read(handle)
handle.close()
\end{verbatim}

This will read a file called example.gen and parse it. If you
do print rec, the record will be output again, in GenePop format.

The most important information in rec will be the loci names and
population information (but there is more -- use help(GenePop.Record)
to check the API documentation). Loci names can be found on rec.loci\_list.
Population information can be found on rec.populations.
Populations is a list with one element per population. Each element is itself
a list of individuals, each individual is a pair composed by individual
name and a list of alleles (2 per marker), here is an example for
rec.populations:

\begin{verbatim}
[
    [
        ('Ind1', [(1, 2),    (3, 3), (200, 201)],
        ('Ind2', [(2, None), (3, 3), (None, None)],
    ],
    [
        ('Other1', [(1, 1),  (4, 3), (200, 200)],
    ]
]
\end{verbatim}

So we have two populations, the first with two individuals, the
second with only one. The first individual of the first
population is called Ind1, allelic information for each of
the 3 loci follows. Please note that for any locus, information
might be missing (see as an example, Ind2 above).

A few utility functions to manipulate GenePop records are made
available, here is an example:

\begin{verbatim}
from Bio.PopGen import GenePop

#Imagine that you have loaded rec, as per the code snippet above...

rec.remove_population(pos)
#Removes a population from a record, pos is the population position in
#  rec.populations, remember that it starts on position 0.
#  rec is altered.

rec.remove_locus_by_position(pos)
#Removes a locus by its position, pos is the locus position in
#  rec.loci_list, remember that it starts on position 0.
#  rec is altered.

rec.remove_locus_by_name(name)
#Removes a locus by its name, name is the locus name as in
#  rec.loci_list. If the name doesn't exist the function fails
#  silently.
#  rec is altered.

rec_loci = rec.split_in_loci()
#Splits a record in loci, that is, for each loci, it creates a new
#  record, with a single loci and all populations.
#  The result is returned in a dictionary, being each key the locus name.
#  The value is the GenePop record.
#  rec is not altered.

rec_pops =  rec.split_in_pops(pop_names)
#Splits a record in populations, that is, for each population, it creates
#  a new record, with a single population and all loci.
#  The result is returned in a dictionary, being each key
#  the population name. As population names are not available in GenePop,
#  they are passed in array (pop_names).
#  The value of each dictionary entry is the GenePop record.
#  rec is not altered.
\end{verbatim}

GenePop does not support population names, a limitation which can be
cumbersome at times. Functionality to enable population names is currently
being planned for Biopython. These extensions won't break compatibility in
any way with the standard format.  In the medium term, we would also like to
support the GenePop web service.