.\"
.\" $Id: pfscale.1,v 1.2 2003/08/11 12:09:14 vflegel Exp $
.\" Copyright (c) 2003 SIB Swiss Institute of Bioinformatics <pftools@sib.swiss>
.\" Process this file with
.\" groff -man -Tascii <name>
.\" for ascii output or
.\" groff -man -Tps <name>
.\" for postscript output
.\"
.TH PFSCALE 1 "August 2003" "pftools 2.3" "pftools"
.\" ------------------------------------------------
.\" Name section
.\" ------------------------------------------------
.SH NAME
pfscale \- fit parameters of an extreme-value distribution to a profile score list 
.\" ------------------------------------------------
.\" Synopsis section
.\" ------------------------------------------------
.SH SYNOPSIS
.TP 10
.B pfscale
[
.B \-hl
] [
.B \-L
.I log_base
] [
.B \-M
.I mode_nb
] [
.B \-N
.I db_size
] [
.B \-P
.I upper_limit
] [
.B \-Q
.I lower_limit
] [
.I score_list
|
.B \-
] [
.I profile
] [
.I parameters
]
.\" ------------------------------------------------
.\" Description section
.\" ------------------------------------------------
.SH DESCRIPTION
.B pfscale
fits the two parameters of an extreme-value distribution to a
sorted score distribution obtained
by searching a sequence database with a profile.
The file
.RI ' score_list '
is a list of profile match scores generated by
.BR pfsearch (1)
and sorted in decreasing order (highest score first).
If
.RB ' \- '
is specified instead of a filename, the score list is read from the
standard input. The result is written to the standard output.
.PP
If the original profile is given as the second argument,
the normalization function with the lowest priority number
(or the lowest mode number) specified within the profile is
updated so as to produce \-log10 per-residue E-values,
unless option
.B \-M
selects another mode.
If the second argument is omitted, the output
consists of a header line containing the normalization parameters,
followed by the score list in four columns:
.IR "score rank" ,
.IR "original raw score" ,
.I log-cumulative frequency
and the corresponding
.IR "normalized score" .
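.PP
The fit can be pictured with the following minimal Python sketch; it is an
illustration under stated assumptions, not the
.B pfscale
source.
It assumes a linear normalization, normalized score = R1 + R2 \(mu raw score,
and replaces the extreme-value fit by an ordinary least-squares line through
the points (raw score, \-log10(rank/N)) whose per-residue frequency rank/N
lies between the lower and upper limits set by
.B \-Q
and
.BR \-P .
This relies on the tail approximation of the extreme-value distribution, in
which the log-cumulative frequency falls linearly with the score; the actual
estimator of
.B pfscale
may differ. The function name
.I fit_linear_normalization
is hypothetical.
.nf
```python
import math


def fit_linear_normalization(scores, db_size=14147368,
                             p_upper=1e-4, q_lower=1e-6):
    """Hypothetical sketch, not the estimator used by pfscale.

    scores must be sorted from highest to lowest, so list position + 1 is
    the rank.  Returns (r1, r2) such that r1 + r2 * raw approximates
    -log10(rank / db_size) inside the window q_lower <= rank/db_size <= p_upper.
    """
    xs, ys = [], []
    for rank, score in enumerate(scores, start=1):
        freq = rank / db_size              # per-residue cumulative frequency
        if q_lower <= freq <= p_upper:     # keep only the fitting window
            xs.append(score)
            ys.append(-math.log10(freq))   # target: -log10 frequency
    n = len(xs)
    if n < 2:
        raise ValueError("fewer than two scores inside the probability window")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all scores in the window are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r2 = sxy / sxx                         # slope of the fitted line
    r1 = mean_y - r2 * mean_x              # intercept of the fitted line
    return r1, r2
```
.fi
Called with its default arguments, the sketch uses the same
.IR N ,
.I P
and
.I Q
defaults as
.BR pfscale .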
.PP
Note that this program implements the significance estimation procedure for profile
match scores described in Hofmann & Bucher (1995). 
It has been used for the calculation of the normalization parameters of 
all profiles in the
.SM PROSITE
database. 
.\" ------------------------------------------------
.\" Options section
.\" ------------------------------------------------
.SH OPTIONS 
.\" --- ms_file ---
.TP
.I score_list
Input score list.
.br
The file must contain a list of scores sorted in decreasing order.
The first field of each line is taken as the score; all other fields
on the same line are ignored.
Fields should be separated by whitespace.
If the filename is replaced by a
.RB ' \- ',
.B pfscale
will read the score list from
.BR stdin .
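.IP
A minimal Python sketch of this input format follows; it is not taken from
.BR pfscale ,
and the helper name
.I read_scores
is hypothetical, as is the choice to skip blank lines.
It keeps the first whitespace-delimited field of every line and rejects a
list that is not in decreasing order (an assumption based on the
.B sort \-nr
call in the EXAMPLES section).
.nf
```python
def read_scores(lines):
    """Hypothetical helper: parse a pfscale-style score list.

    The first whitespace-delimited field of each line is the score; all
    other fields are ignored.  Blank lines are skipped (an assumption).
    """
    scores = []
    for line in lines:
        fields = line.split()          # split on any run of whitespace
        if not fields:
            continue
        scores.append(float(fields[0]))
    # Assumed requirement: highest score first, as produced by sort -nr.
    if any(a < b for a, b in zip(scores, scores[1:])):
        raise ValueError("score list is not sorted in decreasing order")
    return scores
```
.fi
.IP
Passing
.I sys.stdin
to
.I read_scores
corresponds to reading the score list from standard input with
.RB ' \- '.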
.\" --- profile ---
.TP
.I profile
Optional profile file.
.br
If a filename is specified, the profile is parsed and
either the normalization mode with the lowest priority or the mode number
specified with option
.B \-M
is scaled. All cut-off levels that use this mode number are
updated as well.
.\" --- h ---
.TP
.B \-h
Display usage help text.
.\" --- l ---
.TP
.B \-l
Remove the output line length limit. Individual lines of the output profile
may then exceed 132 characters instead of being wrapped over several lines.
.\" --- L ---
.TP
.BI \-L\  log_base
Logarithmic base of the parameters of the estimated extreme-value 
distribution. 
The parameters reported by 
.B pfscale
are expressed as logarithms
and thus can be inserted directly into a linear normalization function
defined in a generalized profile.
.br
Default: 10
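.IP
Since the normalized score is a logarithm, changing the base only rescales
both parameters of a linear normalization function by the constant factor
ln(old base)/ln(new base).
The minimal Python sketch below assumes a normalization of the form
R1 + R2 \(mu raw score; the function name
.I convert_log_base
is hypothetical and not part of
.BR pftools .
.nf
```python
import math


def convert_log_base(r1, r2, from_base=10.0, to_base=math.e):
    """Hypothetical helper: rescale linear normalization parameters.

    If r1 + r2 * raw equals -log(E) in from_base, the returned pair gives
    -log(E) in to_base, because log_b(x) = ln(x) / ln(b).
    """
    factor = math.log(from_base) / math.log(to_base)
    return r1 * factor, r2 * factor
```
.fi
.IP
For instance, converting from base 10 to base 2 multiplies both parameters by
about 3.32, giving scores in bits.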
.\" --- M ---
.TP
.BI \-M\  mode_nb
Mode number to scale.
.br
Defines which mode number (and implicitly which cut-off level) of the
input
.SM PROSITE
profile should be scaled. This overrides the default behaviour of scaling
only the normalization mode with the lowest priority (or lowest mode number).
All cut-off levels defined in the profile as using this mode number (via the
.I MODE
keyword) will be updated as well.
.\" --- N ---
.TP
.BI \-N\  db_size
Size of the database (in residues) from which the input score list was derived.
The searched database is typically a shuffled version
of a real protein or nucleotide sequence database.
.br
Default: 14147368 (size of
.SM SWISS-PROT
release 30 and shuffled derivatives of it).
.\" --- P ---
.TP
.BI \-P\  upper_limit
Upper threshold of the probability range to which the extreme-value
distribution will be fitted. 
For instance, if
.IR N =10,000,000
and
.IR P =0.0001,
then profile match scores ranked beyond position 1000 in the sorted input list
(rank > 1000 =
.IR N \(mu P ,
corresponding to occurrence probabilities > 0.0001)
will be ignored.
.br
Default: 0.0001
.\" --- Q ---
.TP
.BI \-Q\  lower_limit
Lower threshold of the probability range to which the extreme-value
distribution will be fitted. 
For instance, if
.IR N =10,000,000
and
.IR Q =0.000001,
then profile match scores ranked above position 10 in the sorted input list
(rank < 10 =
.IR N \(mu Q ,
corresponding to occurrence probabilities < 0.000001)
will be ignored.
.br
Default: 0.000001
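.IP
Both examples rest on the same arithmetic: the score of rank
.I r
(rank 1 being the best score) has occurrence probability
.IR r / N ,
so only ranks between
.IR N \(mu Q
and
.IR N \(mu P
enter the fit.
The minimal Python sketch below computes this window and checks whether a
score list of a given length reaches its upper end; the function names and
the sanity check on the limits are hypothetical, not documented
.B pfscale
behaviour.
.nf
```python
def rank_window(db_size=14147368, p_upper=1e-4, q_lower=1e-6):
    """Hypothetical helper: (lowest, highest) rank used for the fit.

    Rank r (1 = best score) has occurrence probability r / db_size, so
    ranks outside [db_size * q_lower, db_size * p_upper] are ignored.
    """
    # Sanity check added for this sketch only.
    if not 0 < q_lower < p_upper:
        raise ValueError("expected 0 < lower_limit < upper_limit")
    return db_size * q_lower, db_size * p_upper


def covers_window(n_scores, db_size=14147368, p_upper=1e-4, q_lower=1e-6):
    """True if a sorted list of n_scores entries reaches the upper rank bound."""
    _, highest_rank = rank_window(db_size, p_upper, q_lower)
    return n_scores >= highest_rank
```
.fi
.IP
With the default
.IR N ,
the window spans ranks of about 14 to 1415, so the roughly 2000 matches in
the EXAMPLES section are enough to cover it.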
.\" ------------------------------------------------
.\" Parameters section
.\" ------------------------------------------------
.SH PARAMETERS
.TP
Note:
for backwards compatibility, release 2.3 of the
.B pftools
package will parse the version 2.2 style parameters, but these are
.I deprecated
and the corresponding option (see the
.B OPTIONS
section) should be used instead.
.TP
L=#
Logarithmic base.
.br
Use option
.B \-L
instead.
.TP
M=#
Mode number.
.br
Use option
.B \-M
instead.
.TP
N=#
Database size.
.br
Use option
.B \-N
instead.
.TP
P=#
Upper probability threshold.
.br
Use option
.B \-P
instead.
.TP
Q=#
Lower probability threshold.
.br
Use option
.B \-Q
instead.
.\" ------------------------------------------------
.\" Examples section
.\" ------------------------------------------------
.SH EXAMPLES
.TP
(1)
.B pfsearch
\-fr \-C 200 sh3.prf shuffle20.seq |
.B sort
\-nr | 
.B pfscale
\-P 0.0001 \-Q 0.000001 \-
.IP
derives score-normalization parameters for the SH3 domain profile 
in file
.RB ' sh3.prf '. 
The file
.RB ' shuffle20.seq '
contains a window-shuffled derivative of 
.SM SWISS-PROT
release 30 in Pearson/Fasta format (window-size 20). 
Note that the implicit default of
.I N
corresponds to the size of this database and thus
does not need to be specified on the command line.
The cut-off value 200 for the
.BR pfsearch (1)
option
.B \-C
will produce about 2000 matches, completely covering the range defined by
the
.B pfscale
options
.B \-P
and
.B \-Q
(ranks from about 14 to about 1415 with the default
.IR N ).
A suitable cut-off value has to be guessed in advance 
by computing a few optimal alignment scores for random sequences. 
.\" ------------------------------------------------
.\" Exit code section
.\" ------------------------------------------------
.SH EXIT CODE
.LP
On successful completion of its task,
.B pfscale
will return an exit code of 0. If an error occurs, a diagnostic message will be
output on standard error and the exit code will be different from 0. When conflicting
options were passed to the program but the task could nevertheless be completed, warnings
will be issued on standard error.
.\" ------------------------------------------------
.\" Notes section
.\" ------------------------------------------------
.SH NOTES
.TP
(1)
The current version of
.B pfscale
does not yet support the
.BR xpsa (5)
output format produced by
.BR pfscan "(1) or " pfsearch (1).
The score list should therefore be generated without the
.BR pfscan "(1) and " pfsearch (1)
option
.BR \-k .
.\" ------------------------------------------------
.\" References section
.\" ------------------------------------------------
.SH REFERENCES
.LP
Hofmann K & Bucher P. (1995).
.I The FHA-domain: a nuclear signalling domain found in protein kinases and transcription factors. 
Trends Biochem. Sci.
.BR 20 :347-349.
.\" ------------------------------------------------
.\" See also section
.\" ------------------------------------------------
.SH "SEE ALSO"
.BR pfsearch (1),
.BR pfscan (1),
.BR xpsa (5)
.\" ------------------------------------------------
.\" Author section
.\" ------------------------------------------------
.SH AUTHOR
The
.B pftools
package was developed by Philipp Bucher.
.br
Any comments or suggestions should be addressed to <pftools@sib.swiss>.