.TH PBSSCOREMATRIX "1" "May 2016" "pbsScoreMatrix 1.4" "User Commands"
.SH NAME
pbsScoreMatrix \- generate log\-odds score matrices for aligning probabilistic biological sequences (PBSs)
.SH DESCRIPTION
Generate log\-odds score matrices for use in alignment of
probabilistic biological sequences (PBSs).  By default, generates
a matrix for every branch of the tree (as defined in tree.mod),
but can also generate a matrix for a given branch length (see
\fB\-\-branch\-length\fR).  For a code size of N, an N x N matrix is
generated by default; \fB\-\-half\-pbs\fR will produce an N x 4 matrix, and
\fB\-\-no\-pbs\fR will produce a 4 x 4 matrix (assuming a four\-character
nucleotide alphabet).
.PP
Two sequences are assumed to have evolved from a common ancestor
by a reversible continuous\-time Markov substitution process, and
to be separated by a branch of length t.  The conditional
probability of a base j in one sequence given a base i in the
other, P(j | i, t), is given by element (i, j) of the matrix
.IP
P(t) = exp(Qt)
.PP
where Q is the rate matrix defining the substitution process, and
element (i, j) of Q is the instantaneous rate at which base i
changes to base j.
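.PP
As an illustration only, the following Python sketch approximates
P(t) = exp(Qt) by scaling and squaring with a truncated Taylor series.
It is not the routine used by pbsScoreMatrix.  The Jukes\-Cantor rate
matrix, the helper names mat_mul and mat_exp, and the branch length of
0.2 are assumptions chosen for the example; the result is compared
with the known Jukes\-Cantor closed form.
.PP
.nf
```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q t): Taylor series on Q t / 2^s, then square s times."""
    n = len(q)
    scale = t / (2 ** squarings)
    small = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Assumed example: Jukes-Cantor rate matrix scaled so that t is in
# expected substitutions per site (off-diagonal 1/3, diagonal -1).
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
P = mat_exp(Q, 0.2)

# Closed form for the diagonal under Jukes-Cantor, for comparison.
closed_form = 0.25 + 0.75 * math.exp(-4.0 * 0.2 / 3.0)
print(P[0][0], closed_form)
```
.fi
.PP
Each row of P sums to 1, since row i is the distribution of base j
given base i after time t.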
.PP
Let S_t(i, j) be a log odds score for the alignment of two bases, i
and j, based on P(t):
.IP
S_t(i, j) = log [ P(i, j | t) / (pi(i) * pi(j)) ]
.IP
= log [ P(j | i, t) pi(i) / (pi(i) * pi(j)) ]
.IP
= log [ P(j | i, t) / pi(j) ]                     (1)
.PP
where pi(x) is the "equilibrium" or "background" probability of
base x.  Because of reversibility, S_t(i, j) = S_t(j, i), and the
S_t(i, j) form a symmetric 4 x 4 matrix.  This is the matrix that is
generated by pbsScoreMatrix with the \fB\-\-no\-pbs\fR option.
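.PP
A minimal sketch of equation (1), assuming a Jukes\-Cantor model
(uniform background probabilities of 1/4 and a closed\-form P(t)) and
a branch length of 0.2.  The function names are hypothetical and the
output layout is not that of pbsScoreMatrix.
.PP
.nf
```python
import math

BASES = "ACGT"
PI = 0.25  # uniform background probability under Jukes-Cantor (assumed)

def jc_prob(i, j, t):
    """P(j | i, t) under Jukes-Cantor, t in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

def score_matrix(t):
    """S_t(i, j) = log2( P(j | i, t) / pi(j) ), as in equation (1)."""
    return [[math.log2(jc_prob(i, j, t) / PI) for j in range(4)]
            for i in range(4)]

S = score_matrix(0.2)
for base, row in zip(BASES, S):
    print(base, " ".join(f"{x:7.3f}" for x in row))
```
.fi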
.PP
If each "letter" in each sequence represents a probability
distribution over bases, as in a PBS, then the score for two
letters k and l can be shown to be
.IP
S'_t(k, l) = log sum_i sum_j p_k(i) p_l(j) exp S_t(i, j)        (2)
.PP
where the two sums are over the four bases, p_k(i) is the probability
of base i under the distribution for k, and p_l(j) is the
probability of base j under the distribution for l.
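.PP
Equation (2) can be sketched in Python as follows.  The sketch assumes
a Jukes\-Cantor model and treats exp S_t(i, j) as the odds ratio
P(j | i, t) / pi(j), consistent with base\-2 logs; the function names
and the example distributions are hypothetical.
.PP
.nf
```python
import math

def jc_odds(i, j, t):
    """Odds ratio P(j | i, t) / pi(j) under Jukes-Cantor (pi = 1/4)."""
    decay = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay
    return p / 0.25

def pbs_score(p_k, p_l, t):
    """Score for two PBS letters: log2 sum_i sum_j p_k(i) p_l(j) odds(i, j)."""
    total = sum(p_k[i] * p_l[j] * jc_odds(i, j, t)
                for i in range(4) for j in range(4))
    return math.log2(total)

# Hypothetical letters: one mostly A, one split between A and C.
mostly_a = (0.7, 0.1, 0.1, 0.1)
a_or_c = (0.5, 0.5, 0.0, 0.0)
print(pbs_score(mostly_a, a_or_c, 0.2))
```
.fi
.PP
When either letter equals the background distribution the score is 0,
because the letter then carries no information about the base.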
.PP
Notice that (2) reduces to (1) when p_k(i) = p_l(j) = 1 for some i
and j and for all other i' and j' p_k(i') = p_l(j') = 0 (i.e.,
when all of the probability mass is on a single base in both
distributions and the PBS reduces to an ordinary nucleotide
sequence).  The special case of p_l(j) = 1 only is also of
interest when aligning a PBS and a nucleotide sequence:
.IP
S''_t(k, j) = log sum_i p_k(i) exp S_t(i, j)                    (3)
.PP
This is the matrix generated by pbsScoreMatrix with the
\fB\-\-half\-pbs\fR option.
.PP
Note: all logs are base 2.
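.PP
The N x 4 matrix of equation (3) can be sketched as below, again
assuming a Jukes\-Cantor model.  The three\-letter code is made up for
the example and does not reflect the code file format written by
pbsTrain; the function names are hypothetical.
.PP
.nf
```python
import math

def jc_odds(i, j, t):
    """Odds ratio P(j | i, t) / pi(j) under Jukes-Cantor (pi = 1/4)."""
    decay = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay
    return p / 0.25

def half_pbs_matrix(code, t):
    """Rows are code letters k, columns are bases j: log2 sum_i p_k(i) odds(i, j)."""
    return [[math.log2(sum(p_k[i] * jc_odds(i, j, t) for i in range(4)))
             for j in range(4)]
            for p_k in code]

# Hypothetical code of N = 3 distributions over A, C, G, T.
CODE = [
    (1.0, 0.0, 0.0, 0.0),
    (0.5, 0.5, 0.0, 0.0),
    (0.25, 0.25, 0.25, 0.25),
]
M = half_pbs_matrix(CODE, 0.2)
for row in M:
    print(" ".join(f"{x:7.3f}" for x in row))
```
.fi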
.SH EXAMPLE
Generate an N x N matrix for every branch of the tree, using a
code file "code" (generated by pbsTrain) and a tree model file
"mytree.mod" (generated by phyloFit):
.IP
pbsScoreMatrix mytree.mod code > matrices.dat
.PP
Generate an N x N matrix for a branch length of 0.2 expected
substitutions per site:
.IP
pbsScoreMatrix \fB\-\-branch\-length\fR 0.2 mytree.mod code > matrix.dat
.PP
Generate an N x 4 matrix:
.IP
pbsScoreMatrix \fB\-\-branch\-length\fR 0.2 \fB\-\-half\-pbs\fR mytree.mod
code > matrix.dat
.PP
Generate a 4 x 4 matrix:
.IP
pbsScoreMatrix \fB\-\-branch\-length\fR 0.2 \fB\-\-no\-pbs\fR mytree.mod > matrix.dat
.PP
(In this case, a code file is not needed.)
.SH OPTIONS
.HP
\fB\-\-branch\-length\fR, \fB\-t\fR <length>
.IP
Output a matrix for a branch of the specified length, rather
than a matrix for every branch of the tree.  The given length
must be non\-negative and in units of expected substitutions
per site.
.HP
\fB\-\-half\-pbs\fR, \fB\-H\fR
.IP
Output an N x 4 matrix, as described above.
.HP
\fB\-\-no\-pbs\fR, \fB\-N\fR
.IP
Output a 4 x 4 matrix, as described above.  With this option,
a code file is not needed.
.HP
\fB\-\-help\fR, \fB\-h\fR
.IP
Show this help message.