=pod

=head1 NAME

score_conservation - score protein sequence conservation

=head1 SYNOPSIS

score_conservation [options] ALIGNFILE

=head1 DESCRIPTION

Score protein sequence conservation in B<ALIGNFILE>.  B<ALIGNFILE> must be in FASTA, CLUSTAL or Stockholm format.

The following conservation scoring methods are implemented:
 * sum of pairs
 * weighted sum of pairs
 * Shannon entropy
 * Shannon entropy with property groupings (Mirny and Shakhnovich 1995,
   Valdar and Thornton 2001)
 * relative entropy with property groupings (Williamson 1995)
 * von Neumann entropy (Caffrey et al. 2004)
 * relative entropy (Samudrala and Wang 2006)
 * Jensen-Shannon divergence (Capra and Singh 2007)

A window-based extension that incorporates the estimated conservation of
sequentially adjacent residues into the score for each column is also given.
This window approach can be applied to any of the conservation scoring
methods.

With default parameters, score_conservation(1) scores the alignment using the
Jensen-Shannon divergence and a window size (B<-w>) of I<3>.
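
As a rough illustration of the default method, the Jensen-Shannon divergence between the residue distribution of one column and a background distribution can be sketched as below. This is a minimal sketch, not the code shipped in this package: the uniform background, the pseudocount value and the function names are assumptions made for the example, and sequence weighting, the gap penalty and the window are left out.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Residue frequencies for one alignment column, ignoring gaps.

    The pseudocount value is an assumption chosen for this sketch.
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between distributions p and q."""
    score = 0.0
    for pi, qi in zip(p, q):
        ri = 0.5 * (pi + qi)  # mixture of the two distributions
        if pi > 0:
            score += 0.5 * pi * math.log2(pi / ri)
        if qi > 0:
            score += 0.5 * qi * math.log2(qi / ri)
    return score


# Assumed uniform background; the tool itself defaults to a BLOSUM62 background.
uniform = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)

conserved = js_divergence(column_distribution("AAAAAAAA"), uniform)
variable = js_divergence(column_distribution("ACDEFGHI"), uniform)
```

In this sketch a fully conserved column scores higher than a column with eight different residues, which is the behaviour a conservation score is expected to show.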

The sequence-specific output can be used as the conservation input for
concavity(1).

Conservation is highly predictive in identifying catalytic sites and
residues near bound ligands.

=head1 REFERENCES

=over

=item Capra JA and Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics, 23(15):1875-82, 2007.

=back

=head1 OPTIONS

=over

=item -a [NAME]

Reference sequence. Print scores in reference to the named sequence (ignoring gaps). Default prints the entire column.

=item -b [0-1]

Lambda for window heuristic linear combination. Default=I<.5>.

Equation:

C<score = (1 - lambda) * average_score_over_window_around_middle + lambda * score_of_middle>

=item -d [FILE]

Background distribution file, e.g. F<distributions/swissprot.distribution>. Default=built-in BLOSUM62.

=item -g [0-1]

Gap cutoff, a fraction in the half-open range [0, 1). Columns whose fraction of gaps exceeds the cutoff are not scored. Default=I<.3>.

=item -h

Print help.

=item -l [true|false]

Use sequence weighting. Default=I<true>.

=item -m [FILE]

Similarity matrix file in F<.bla> or F<.qij> format, e.g. F<matrix/blosum62.bla>. Default=F<matrix/blosum62.bla>.

Some methods, e.g. I<js_divergence>, do not use this.

=item -n [true|false]

Normalize scores. Print the z-score (computed over the alignment) of each column's raw score. Default=I<false>.

=item -o [FILE]

Output file. Default: standard output stream.

=item -p [true|false]

Use gap penalty. Lower the score of columns that contain gaps, in proportion to the summed weight of the gapped sequences. Default=I<true>.

=item -s [METHOD]

Conservation estimation method, one of I<shannon_entropy property_entropy property_relative_entropy vn_entropy relative_entropy js_divergence sum_of_pairs>. Default=I<js_divergence>.

=item -w [0-INT]

Window size. Number of residues on either side included in the window. Default=I<3>.

=back
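
The window equation given for B<-b> and the z-score normalization of B<-n> can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the package's own code: the function names are hypothetical, the window average is assumed to exclude the middle column and to be clipped at the ends of the alignment, and the z-score is assumed to use the population standard deviation.

```python
import statistics


def window_smooth(scores, window=3, lam=0.5):
    """Apply score = (1 - lambda) * window_average + lambda * middle_score.

    Assumption: the average covers up to `window` columns on either side,
    excludes the middle column, and is clipped at the alignment ends.
    """
    smoothed = []
    for i, middle in enumerate(scores):
        lo = max(0, i - window)
        hi = min(len(scores), i + window + 1)
        neighbours = [scores[j] for j in range(lo, hi) if j != i]
        if not neighbours:  # window of 0 or a single-column alignment
            smoothed.append(middle)
            continue
        average = sum(neighbours) / len(neighbours)
        smoothed.append((1 - lam) * average + lam * middle)
    return smoothed


def z_scores(scores):
    """Z-score of each column score over the whole alignment.

    Assumption: population standard deviation; all zeros if there is no spread.
    """
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores)
    if spread == 0:
        return [0.0 for _ in scores]
    return [(s - mean) / spread for s in scores]


example = window_smooth([0.0, 0.0, 1.0, 0.0, 0.0], window=1, lam=0.5)
# example == [0.0, 0.25, 0.5, 0.25, 0.0]
```

With a lambda of 1 the window has no effect and each column keeps its own score; with a lambda of 0 each column is replaced by the average of its neighbours.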

=head1 EXAMPLES

Note: you may have to copy and uncompress the example data files before running the following examples.

=over

=item Compute conservation scores for the alignment using the Jensen-Shannon divergence with default settings and print out the scores:

 score_conservation __docdir__/examples/2plc__hssp-filtered.aln

=item Score an alignment using Jensen-Shannon divergence, a window of size 3 (on either side of the residue), and the swissprot background distribution:

 score_conservation -s js_divergence -w 3 -d \
  __pkgdatadir__/distributions/swissprot.distribution \
  __docdir__/examples/2plc__hssp-filtered.aln

=back

=head1 FILES

=over

=item Distributions

F<__pkgdatadir__/distributions>

=item Matrices

F<__pkgdatadir__/matrix>

=back

=head1 SEE ALSO

=over

=item Homepage L<http://compbio.cs.princeton.edu/conservation/>

=item Publication L<http://bioinformatics.oxfordjournals.org/cgi/content/full/23/15/1875>

=item concavity(1)

=back

=cut