.\"
.\" $Id: pfmake.1,v 1.2 2003/08/11 12:09:14 vflegel Exp $
.\" Copyright (c) 2003 SIB Swiss Institute of Bioinformatics <pftools@sib.swiss>
.\" Process this file with
.\" groff -man -Tascii <name>
.\" for ascii output or
.\" groff -man -Tps <name>
.\" for postscript output
.\"
.TH PFMAKE 1 "August 2003" "pftools 2.3" "pftools"
.\" ------------------------------------------------
.\" Name section
.\" ------------------------------------------------
.SH NAME
pfmake \- generate a profile from a multiple sequence alignment
.\" ------------------------------------------------
.\" Synopsis section
.\" ------------------------------------------------
.SH SYNOPSIS
.TP 10
.B pfmake
[
.B \-0123abcehlms
] [
.B \-E
.I gap_extend
] [
.B \-F
.I score_multiplier
] [
.B \-G
.I gap_open
] [
.B \-H
.I high_init/term
] [
.B \-I
.I gap_increment
] [
.B \-L
.I low_init/term
] [
.B \-M
.I gap_multiplier
] [
.B \-S
.I matrix_multiplier
] [
.B \-T
.I gap_region
] [
.B \-X
.I gap_excision
] [
.I ms_file
|
.B \-
]
.I score_matrix
[
.I profile
] [
.I parameters
]
.\" ------------------------------------------------
.\" Description section
.\" ------------------------------------------------
.SH DESCRIPTION
.B pfmake
generates a
.SM PROSITE
profile from a multiple sequence alignment using methods
described by Gribskov
.IR "et al" .
(1990), Luethy
.IR "et al" .
(1994), and Thompson
.IR "et al" .
(1994), with modifications to exploit the features of the new profile format.
The file containing the multiple sequence alignment
.RI ( ms_file )
must be either in MSF format as generated by
.SM GCG
programs or by
.B readseq
(checksums are ignored) or in MSA format as created by
.BR psa2msa (1).
If
.RB ' \- '
is specified instead of a filename, the multiple sequence alignment is read
from the standard input. The
.I score_matrix
file must also be in
.SM GCG
format.
.PP
If an already existing
.I profile
is given as input via the third optional argument, the parameters of the
DISJOINT, NORMALIZATION and CUT_OFF blocks will be read from it; all other
profile parameters will be recalculated.
Header and footer lines outside the matrix block will also be transferred
from input to output.
.PP
If no input profile is given, the disjointness definition will be set to
PROTECT with borders leaving short unprotected tails (maximum 5 positions)
at the beginning and at the end of the profile. Furthermore, one normalization mode
.RI ( n_score \ =\ raw_score \ /\ F ,
where
.I F
is the output score multiplier,
.I see
below), and two cut-off values (level 0: 8.5, level -1: 6.5)
will be defined.
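.PP
The default normalization mode and cut-off values described above can be
sketched in Python as follows. This is an illustration only, not code from
.BR pfmake :
the function names are hypothetical, and the sketch assumes that the cut-off
values are normalized scores and that a level is reached when the normalized
score is greater than or equal to its cut-off value.
.PP
.nf
```python
# Illustrative sketch only; not taken from the pfmake source code.

DEFAULT_F = 100  # default output score multiplier (option -F)

# Default cut-off values quoted above, assumed to be normalized scores.
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}


def normalized_score(raw_score, f=DEFAULT_F):
    """Apply the default normalization mode: n_score = raw_score / F."""
    return raw_score / f


def cutoff_level(raw_score, f=DEFAULT_F, cutoffs=DEFAULT_CUTOFFS):
    """Return the highest level whose cut-off is reached, or None."""
    n_score = normalized_score(raw_score, f)
    reached = [level for level, value in cutoffs.items() if n_score >= value]
    return max(reached) if reached else None
```
.fi
.PP
With the default multiplier of 100, a raw score of 850 corresponds to a
normalized score of 8.5.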
.\" ------------------------------------------------
.\" Options section
.\" ------------------------------------------------
.SH OPTIONS
.\" --- ms_file ---
.TP
.I ms_file
Input multiple sequence alignment.
.br
The content of the file must be either in MSF or in MSA format.
If the filename is replaced by a
.RB ' \- ',
.B pfmake
will read the input alignment from
.BR stdin .
.\" --- score_matrix ---
.TP
.I score_matrix
Residue score matrix file.
.br
Contains the substitution scores for all pairs of residues
of the sequence alphabet. The file must be in
.SM GCG
format.
.\" --- profile ---
.TP
.I profile
Optional profile file.
.br
If a filename is specified, the profile will be parsed and
those parameters mentioned in the
.I description
section will be kept for the computation of the output profile.
.\" --- 0 ---
.TP
.B \-0
Global alignment mode.
.br
Initiation (termination) at low cost
is possible only if the alignment starts at the beginning (end) of the
profile and at the beginning (end) of the sequence.
.\" --- 1 ---
.TP
.B \-1
Domain global alignment mode.
.br
Initiation (termination) at low cost
is possible only at the beginning (end) of the profile; it may
start and end at any position within the sequence.
.\" --- 2 ---
.TP
.B \-2
Semi-global alignment mode.
.br
Initiation (termination) at low cost
is possible if the alignment starts either at the beginning (end) of the
profile or at the beginning (end) of the sequences.
.br
This is the default alignment mode.
.\" --- 3 ---
.TP
.B \-3
Local alignment mode.
.br
Initiation (termination) at low cost
is possible anywhere. The high-cost initiation/termination score
(parameter
.IR H )
is meaningless.
.\" --- a ---
.TP
.B \-a
Causes
.B pfsearch
to weight gaps asymmetrically, as in Gribskov
.IR "et al" .
(1990).
.\" --- b ---
.TP
.B \-b
Block profile mode.
.br
By imposing additional constraints on the placement of
insertions and deletions, this mode produces profiles that favor alignments
with insertions and deletions positioned symmetrically around a few
positions. For each gap region a gap center is defined which
usually corresponds to the place where gap excision has been applied
(see parameter
.IR X ).
If no gap excision has been applied, the position is chosen so as to
maximize the sum of deletion opening events before, and
deletion closing events after the gap center.
Within a given gap region
reduced deletion opening penalties are offered only before,
reduced deletion closing penalties only after,
and reduced insertion penalties only at the center.
.br
This option is incompatible with options
.B \-a
and
.B \-e
and
automatically disables them.
.\" --- c ---
.TP
.B \-c
Circular profile.
.br
The topology of the profile is declared as
circular. The first and the last insert positions are merged
by retaining the higher value of each parameter type.
.\" --- e ---
.TP
.B \-e
Enables endgap-weighting mode as implemented in the
.SM GCG
program
.BR ProfileMake .
Endgaps in the multiple sequence alignment will be interpreted
as deletions relative to the other sequences and thus be
considered for the delineation of gap regions.
The default is no endgap weighting as introduced by Thompson
.IR "et al" .
(1994) in the program
.BR ProfileWeight .
.\" --- h ---
.TP
.B \-h
Display usage help text.
.\" --- l ---
.TP
.B \-l
Remove output line length limit. Individual lines of the output profile
can exceed a length of 132 characters, removing the need to wrap them over several lines.
.\" --- m ---
.TP
.B \-m
Input multiple sequence alignment is in MSA format.
.\" --- s ---
.TP
.B \-s
Causes
.B pfsearch
to weight gaps symmetrically (default mode). The
initial gap opening scores
.RI ( MD , \ MI )
computed from the
maximal gap length and the command-line parameters
.IR E , \ G , \ I ,
and
.IR M ,
will be divided by two and the resulting value will be assigned to both
gap opening and gap closing scores
.RI ( MI , \ IM , \ MD , \ DM ).
.\" --- E ---
.TP
.BI \-E\ gap_extend
Gap extension penalty.
.I See
Gribskov
.IR "et al" .
(1990).
.br
Default: 0.2 (appropriate for 1/3 bit-scaled blosum45 matrix)
.\" --- F ---
.TP
.BI \-F\ score_multiplier
Output score multiplier.
.br
On output, all profile scores are multiplied by
this factor and rounded to nearest integers.
.br
Default: 100
.\" --- G ---
.TP
.BI \-G\ gap_open
Gap opening penalty.
.I See
Gribskov
.IR "et al" .
(1990).
.br
Default: 2.1 (appropriate for 1/3 bit-scaled blosum45 matrix)
.\" --- H ---
.TP
.BI \-H\ high_init/term
High-cost initiation/termination score.
.br
This score will be applied to
all external and internal initiation and termination scores corresponding
to path matrix positions where initiation or termination at low cost is not
possible according to the alignment mode specified.
.br
Default: * (low-value)
.\" --- I ---
.TP
.BI \-I\ gap_increment
Gap penalty multiplier increment.
.I See
Gribskov
.IR "et al" .
(1990).
.br
Default: 0.1
.\" --- L ---
.TP
.BI \-L\ low_init/term
Low-cost initiation/termination score.
.br
This score will be applied to all external and internal initiation and
termination scores corresponding to path matrix positions where
initiation or termination at low cost is possible according to the alignment
mode specified.
.br
Default: 0
.\" --- M ---
.TP
.BI \-M\ gap_multiplier
Maximum gap penalty multiplier.
.I See
Gribskov
.IR "et al" .
(1990).
.br
Default: 0.333
.\" --- S ---
.TP
.BI \-S\ matrix_multiplier
Score matrix multiplier.
.br
On input, the numbers of the score matrix are multiplied by this factor.
.br
Default: 0.1
.\" --- T ---
.TP
.BI \-T\ gap_region
Gap region threshold.
.br
This is the minimal fraction of gap characters a column of the multiple sequence
alignment must contain in order to be considered part of a gap region.
.br
Default: 0.01
.\" --- X ---
.TP
.BI \-X\ gap_excision
Gap excision threshold.
.br
This is the minimal fraction of non-gap characters a column of the multiple sequence
alignment must contain in order to be converted into a match position. The
.I IM
and
.I MI
transition scores of insert positions corresponding to excised columns
are set to zero; the other parameters remain unchanged.
.br
Default: 0.5
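.PP
The two column thresholds set by options
.B \-T
and
.B \-X
can be pictured with the following Python sketch. It is an illustration
only, not code from
.BR pfmake :
the function name is hypothetical, and the sketch assumes unweighted
sequences, the gap characters '-' and '.', and inclusive threshold
comparisons. The actual program may take sequence weights into account.
.PP
.nf
```python
# Illustrative sketch only; not taken from the pfmake source code.
# Assumptions: unweighted sequences, '-' and '.' as gap characters,
# and inclusive (>=) threshold comparisons.

GAP_CHARS = "-."


def classify_columns(alignment, gap_region=0.01, gap_excision=0.5):
    """Return one (in_gap_region, is_match) pair per alignment column."""
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):
        n_gaps = sum(1 for residue in column if residue in GAP_CHARS)
        gap_fraction = n_gaps / n_seqs
        non_gap_fraction = (n_seqs - n_gaps) / n_seqs
        in_gap_region = gap_fraction >= gap_region      # option -T
        is_match = non_gap_fraction >= gap_excision     # option -X
        result.append((in_gap_region, is_match))
    return result
```
.fi
.PP
Under these assumptions and the default thresholds, a column without gaps
lies outside every gap region and becomes a match position, whereas a column
in which fewer than half of the sequences carry a residue is excised.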
.\" ------------------------------------------------
.\" Parameters section
.\" ------------------------------------------------
.SH PARAMETERS
.TP
Note:
for backwards compatibility, release 2.3 of the
.B pftools
package will parse the version 2.2 style parameters, but these are
.I deprecated
and the corresponding option (refer to the
.I options
section) should be used instead.
.TP
E=#
Gap extension penalty.
.br
Use option
.B \-E
instead.
.TP
F=#
Output score multiplier.
.br
Use option
.B \-F
instead.
.TP
G=#
Gap opening penalty.
.br
Use option
.B \-G
instead.
.TP
H=#
High-cost initiation/termination score.
.br
Use option
.B \-H
instead.
.TP
I=#
Gap penalty multiplier increment.
.br
Use option
.B \-I
instead.
.TP
L=#
Low-cost initiation/termination score.
.br
Use option
.B \-L
instead.
.TP
M=#
Maximum gap penalty multiplier.
.br
Use option
.B \-M
instead.
.TP
S=#
Score matrix multiplier.
.br
Use option
.B \-S
instead.
.TP
T=#
Gap region threshold.
.br
Use option
.B \-T
instead.
.TP
X=#
Gap excision threshold.
.br
Use option
.B \-X
instead.
.\" ------------------------------------------------
.\" Examples section
.\" ------------------------------------------------
.SH EXAMPLES
.TP
(1)
.B pfmake
\-b1 \-H 0.6 sh3.msf blosum45.cmp > sh3_block.prf
.IP
Generates a domain-global block profile from a multiple alignment
of SH3 domains using the blosum45 matrix.
The file
.RI ' sh3.msf '
contains a multiple alignment of 20 SH3 domains from
.SM SWISS-PROT
release 32 including sequence weights.
The file
.RI ' blosum45.cmp '
contains a 1/3 bit-scaled blosum45 matrix in
.SM GCG
format.
.br
Note that fragment matches (alignments to parts of the profile) are not
prohibited but penalized by the option
.B \-H
.IR 0.6 .
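.PP
When
.B pfmake
is driven from a script, the command line of example (1) could be assembled
as in the following Python sketch. The helper names and their arguments are
hypothetical and only follow the argument order given in the synopsis; the
file names are those of example (1).
.PP
.nf
```python
# Illustrative sketch only; the helper names and arguments are hypothetical.
import subprocess


def pfmake_command(ms_file, score_matrix, flags="", options=None, profile=None):
    """Build an argument list following the SYNOPSIS of this page."""
    argv = ["pfmake"]
    if flags:
        argv.append("-" + flags)                 # e.g. "b1" for -b -1
    for letter, value in (options or {}).items():
        argv.extend(["-" + letter, str(value)])  # e.g. -H 0.6
    argv.extend([ms_file, score_matrix])
    if profile is not None:
        argv.append(profile)                     # optional input profile
    return argv


def run_pfmake(argv, output_path):
    """Run pfmake and write the profile from standard output to a file."""
    with open(output_path, "w") as out:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, stdout=out, check=True)
```
.fi
.PP
Calling the first helper with the flags "b1" and the option H set to 0.6
yields the arguments of example (1); the second helper then plays the role
of the shell redirection.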
.\" ------------------------------------------------
.\" Exit code section
.\" ------------------------------------------------
.SH EXIT CODE
.LP
On successful completion of its task,
.B pfmake
will return an exit code of 0. If an error occurs, a diagnostic message will be
output on standard error and the exit code will be different from 0. When conflicting
options were passed to the program but the task could nevertheless be completed, warnings
will be issued on standard error.
.\" ------------------------------------------------
.\" References section
.\" ------------------------------------------------
.SH REFERENCES
.LP
Bucher P, Karplus K, Moeri N & Hofmann K (1996).
.I A flexible motif search
.I technique based on generalized
.I profiles.
Comput. Chem.
.BR 20 :3-24.
.LP
Gribskov M, Luethy R & Eisenberg D (1990).
.I Profile analysis.
Meth. Enzymol.
.BR 183 :146-159.
.LP
Luethy R, Xenarios I & Bucher P (1994).
.I Improving the sensitivity of the
.I sequence profile method.
Prot. Sci.
.BR 3 :139-146.
.LP
Thompson JD, Higgins DG & Gibson TJ (1994).
.I Improved sensitivity of profile
.I searches through the
.I use of sequence weights
.I and gap excision.
Comput. Appl. Biosci.
.BR 10 :19-29.
.\" ------------------------------------------------
.\" See also section
.\" ------------------------------------------------
.SH "SEE ALSO"
.BR pfsearch (1),
.BR pfscan (1),
.BR psa2msa (1),
.BR psa (5),
.BR xpsa (5)
.\" ------------------------------------------------
.\" Author section
.\" ------------------------------------------------
.SH AUTHOR
The
.B pftools
package was developed by Philipp Bucher.
.br
Any comments or suggestions should be addressed to <pftools@sib.swiss>.