.\"
.\" $Id: pfscale.1,v 1.2 2003/08/11 12:09:14 vflegel Exp $
.\" Copyright (c) 2003 SIB Swiss Institute of Bioinformatics <pftools@sib.swiss>
.\" Process this file with
.\" groff -man -Tascii <name>
.\" for ascii output or
.\" groff -man -Tps <name>
.\" for postscript output
.\"
.TH PFSCALE 1 "August 2003" "pftools 2.3" "pftools"
.\" ------------------------------------------------
.\" Name section
.\" ------------------------------------------------
.SH NAME
pfscale \- fit parameters of an extreme-value distribution to a profile score list
.\" ------------------------------------------------
.\" Synopsis section
.\" ------------------------------------------------
.SH SYNOPSIS
.TP 10
.B pfscale
[
.B \-hl
] [
.B \-L
.I log_base
] [
.B \-M
.I mode_nb
] [
.B \-N
.I db_size
] [
.B \-P
.I upper_limit
] [
.B \-Q
.I lower_limit
] [
.I score_list
|
.B \-
] [
.I profile
] [
.I parameters
]
.\" ------------------------------------------------
.\" Description section
.\" ------------------------------------------------
.SH DESCRIPTION
.B pfscale
fits the two parameters of an extreme-value distribution to a
sorted score distribution obtained
by searching a sequence database with a profile.
The file
.RI ' score_list '
is a list of profile match scores generated by
.BR pfsearch (1)
and sorted in decreasing order.
If
.RB ' \- '
is specified instead of a filename, the score list is read from the
standard input. The result is written to the standard output.
.PP
If the original profile is given as the second argument,
the normalization function with the lowest mode number or the lowest priority number
specified within the profile is
updated so as to produce \-log10 per-residue E-values.
If the second argument is omitted, the output
consists of a header line containing the normalization parameters,
followed by the score list annotated with four columns:
the
.IR "score rank" ,
the
.IR "original raw score" ,
the
.I log-cumulative frequency
and the corresponding
.IR "normalized score" .
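.PP
A minimal Python sketch of how such normalized scores could be used follows.
It assumes, without confirmation from this page, that the reported parameters
define a linear normalization
.IR "normalized = R1 + R2 \(mu raw" ,
and it takes the normalized score to be the \-log10 per-residue E-value, so that the
expected number of chance matches in a database of
.I M
residues is
.IR "M \(mu 10^(\-normalized)" .
The functions
.I normalize
and
.I expected_matches
are hypothetical and not part of
.BR pftools .
.nf
```python
def normalize(raw_score, r1, r2):
    # Hypothetical linear normalization: normalized = r1 + r2 * raw_score.
    return r1 + r2 * raw_score


def expected_matches(normalized_score, db_residues, log_base=10.0):
    # Assumes normalized_score is -log(per-residue E-value) in log_base,
    # so the database-wide E-value scales with the number of residues.
    return db_residues * log_base ** (-normalized_score)
```
.fi
.PP
For instance, with the illustrative (not fitted) values R1 = \-2 and
R2 = 0.025, a raw score of 400 normalizes to 8, which corresponds to about
0.14 expected chance matches in a database of 14147368 residues.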
.PP
.B pfscale
implements the significance estimation procedure for profile
match scores described in Hofmann & Bucher (1995).
It has been used to calculate the normalization parameters of
all profiles in the
.SM PROSITE
database.
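.PP
The fitting step can be sketched in Python as a least-squares line through
the tail of the ranked score list: the score of rank
.I r
is paired with its \-log cumulative frequency
.IR "\-log(r/N)" ,
and only ranks whose frequency lies between
.I Q
and
.I P
are kept. This is a minimal sketch under that assumption, not the exact
numerical procedure of Hofmann & Bucher (1995) as coded in
.BR pfscale ;
the function
.I fit_tail
is hypothetical.
.nf
```python
import math


def fit_tail(scores, db_size, p_upper=1e-4, q_lower=1e-6, log_base=10.0):
    # Rank 1 is the highest score, as in a list produced by "sort -nr".
    ranked = sorted(scores, reverse=True)
    xs, ys = [], []
    for rank, score in enumerate(ranked, start=1):
        freq = rank / db_size  # cumulative frequency per residue
        if q_lower <= freq <= p_upper:
            xs.append(score)
            ys.append(-math.log(freq, log_base))
    if len(xs) < 2:
        raise ValueError("fewer than two scores inside the [Q, P] window")
    # Ordinary least squares for y = r1 + r2 * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all scores in the window are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r2 = sxy / sxx
    r1 = mean_y - r2 * mean_x
    return r1, r2
```
.fi
.PP
Because the upper tail of an extreme-value distribution decays approximately
exponentially, \-log(r/N) is close to linear in the score within this window,
which is why a straight-line fit is a reasonable approximation there.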
.\" ------------------------------------------------
.\" Options section
.\" ------------------------------------------------
.SH OPTIONS
.\" --- score_list ---
.TP
.I score_list
Input score list.
.br
The file must contain a list of scores sorted in decreasing order. The first field
of each line is read as the score; all other fields on the same line are ignored.
Fields must be separated by whitespace.
If the filename is replaced by a
.RB ' \- ',
.B pfscale
will read the score list from
.BR stdin .
.\" --- profile ---
.TP
.I profile
Optional profile file.
.br
If a filename is specified, the profile is parsed and
either the normalization mode with the lowest priority or the mode number specified with option
.B \-M
is rescaled. All cut-off levels that use this mode number are
updated as well.
.\" --- h ---
.TP
.B \-h
Display usage help text.
.\" --- l ---
.TP
.B \-l
Remove output line length limit. Individual lines of the output profile
can exceed a length of 132 characters, removing the need to wrap them over several lines.
.\" --- L ---
.TP
.BI \-L\ log_base
Logarithmic base of the parameters of the estimated extreme-value
distribution.
The parameters reported by
.B pfscale
are expressed as logarithms
and thus can be inserted directly into a linear normalization function
defined in a generalized profile.
.br
Default: 10
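.IP
To illustrate how the base enters the parameters, the Python sketch below
converts a pair of parameters from one base to another. It assumes they
define a linear function
.IR "\-log_b(x) = r1 + r2 \(mu score" ;
since log_c(x) = log_b(x) \(mu log_c(b), both parameters are multiplied by
the same factor. The function
.I rebase
is hypothetical and not part of
.BR pftools .
.nf
```python
import math


def rebase(r1, r2, from_base, to_base):
    # -log_to(x) = -log_from(x) * log_to(from_base): both terms share one factor.
    factor = math.log(from_base, to_base)
    return r1 * factor, r2 * factor
```
.fi
.IP
For example, parameters obtained with
.B "\-L 2"
could be expressed in the default base 10 with rebase(r1, r2, 2, 10).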
.\" --- M ---
.TP
.BI \-M\ mode_nb
Mode number to scale.
.br
Defines which mode number (and implicitly which cut-off level) of the
input
.SM PROSITE
profile should be scaled. This overrides the default behaviour of scaling
only the normalization mode with the lowest priority (or lowest mode number).
All cut-off levels defined in the profile as using this mode number (via the
.I MODE
keyword) will be updated as well.
.\" --- N ---
.TP
.BI \-N\ db_size
Size of the database from which the input score list was derived.
The searched database is typically a shuffled version
of a real protein or nucleotide sequence database.
.br
Default: 14147368 (size of
.SM SWISS-PROT
release 30 and shuffled derivatives of it).
.\" --- P ---
.TP
.BI \-P\ upper_limit
Upper threshold of the probability range to which the extreme-value
distribution will be fitted.
For instance, if
.IR N =10'000'000
and
.IR P =0.0001,
then profile match scores ranked below position 1000
in the sorted input list (ranks greater than
.IR N \(mu P
= 1000, corresponding to occurrence probabilities > 0.0001)
will be ignored.
.br
Default: 0.0001
.\" --- Q ---
.TP
.BI \-Q\ lower_limit
Lower threshold of the probability range to which the extreme-value
distribution will be fitted.
For instance, if
.IR N =10'000'000
and
.IR Q =0.000001,
then profile match scores ranked above position 10 in the sorted input list
(ranks smaller than
.IR N \(mu Q
= 10, corresponding to occurrence probabilities < 0.000001)
will be ignored.
.br
Default: 0.000001
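.IP
The rank window implied by
.BR \-N ,
.B \-P
and
.B \-Q
can be computed with the Python sketch below, which multiplies the
database size by each probability limit as in the two examples above.
The function
.I rank_window
is hypothetical, and rounding to the nearest integer is an assumption.
With the default
.IR N =14147368,
the fit uses roughly ranks 14 to 1415, which is why a
.BR pfsearch (1)
cut-off yielding about 2000 matches covers the whole range.
.nf
```python
def rank_window(db_size, p_upper=1e-4, q_lower=1e-6):
    # Best-scoring rank kept (from Q) and worst-scoring rank kept (from P).
    first = round(db_size * q_lower)
    last = round(db_size * p_upper)
    return first, last
```
.fi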
.\" ------------------------------------------------
.\" Parameters section
.\" ------------------------------------------------
.SH PARAMETERS
.TP
Note:
for backwards compatibility, release 2.3 of the
.B pftools
package still parses version 2.2 style parameters, but these are
.I deprecated
and the corresponding option (see the
.B OPTIONS
section) should be used instead.
.TP
L=#
Logarithmic base.
.br
Use option
.B \-L
instead.
.TP
M=#
Mode number.
.br
Use option
.B \-M
instead.
.TP
N=#
Database size.
.br
Use option
.B \-N
instead.
.TP
P=#
Upper probability threshold.
.br
Use option
.B \-P
instead.
.TP
Q=#
Lower probability threshold.
.br
Use option
.B \-Q
instead.
.\" ------------------------------------------------
.\" Examples section
.\" ------------------------------------------------
.SH EXAMPLES
.TP
(1)
.B pfsearch
\-fr \-C 200 sh3.prf shuffle20.seq |
.B sort
\-nr |
.B pfscale
\-P 0.0001 \-Q 0.000001 \-
.IP
derives score-normalization parameters for the SH3 domain profile
in file
.RB ' sh3.prf '.
The file
.RB ' shuffle20.seq '
contains a window-shuffled derivative of
.SM SWISS-PROT
release 30 in Pearson/Fasta format (window-size 20).
Note that the implicit default of
.I N
corresponds to the size of this database and thus
need not be specified on the command line.
The cut-off value 200 for the
.BR pfsearch (1)
option
.B \-C
will produce about 2000 matches completely covering the range defined by
the command line parameters
.B \-P
and
.B \-Q
of
.BR pfscale .
A suitable cut-off value has to be guessed in advance
by computing a few optimal alignment scores for random sequences.
.\" ------------------------------------------------
.\" Exit code section
.\" ------------------------------------------------
.SH EXIT CODE
.LP
On successful completion of its task,
.B pfscale
will return an exit code of 0. If an error occurs, a diagnostic message will be
output on standard error and the exit code will be different from 0. When conflicting
options were passed to the program but the task could nevertheless be completed, warnings
will be issued on standard error.
.\" ------------------------------------------------
.\" Notes section
.\" ------------------------------------------------
.SH NOTES
.TP
(1)
The current version of
.B pfscale
does not yet support the
.BR xpsa (5)
output format produced by
.BR pfscan "(1) or " pfsearch (1).
The score list should therefore be generated without the
.BR pfscan "(1) and " pfsearch (1)
option
.BR \-k .
.\" ------------------------------------------------
.\" References section
.\" ------------------------------------------------
.SH REFERENCES
.LP
Hofmann K & Bucher P. (1995).
.I The FHA-domain: a nuclear signalling domain found in protein kinases and transcription factors.
Trends Biochem. Sci.
.BR 20 :347\-349.
.\" ------------------------------------------------
.\" See also section
.\" ------------------------------------------------
.SH "SEE ALSO"
.BR pfsearch (1),
.BR pfscan (1),
.BR xpsa (5)
.\" ------------------------------------------------
.\" Author section
.\" ------------------------------------------------
.SH AUTHOR
The
.B pftools
package was developed by Philipp Bucher.
.br
Any comments or suggestions should be addressed to <pftools@sib.swiss>.