.\"
.\" $Id: psa.5,v 1.1 2003/05/12 11:51:45 vflegel Exp $
.\" Copyright (c) 2003 SIB Swiss Institute of Bioinformatics <pftools@sib.swiss>
.\" Process this file with
.\" groff -man -Tascii <name>
.\" for ascii output or
.\" groff -man -Tps <name>
.\" for postscript output
.\"
.TH PSA 5 "April 2003" "pftools 2.3" "File formats"
.\" ------------------------------------------------
.\" Name section
.\" ------------------------------------------------
.SH NAME
psa \- biological sequence alignment file format
.\" ------------------------------------------------
.\" Description section
.\" ------------------------------------------------
.SH DESCRIPTION
.B psa
is an output format used by the
.B pftools
package to describe alignments between biological sequences (DNA or protein) and
.I PROSITE
profiles.
.PP
.B psa
is related to the widely used biological sequence file format
.IR fasta .
In addition to the biological sequence itself, it records how that sequence aligns to a motif descriptor such as a
.I PROSITE
profile. This information is included in the header and reflected
in the structure of the sequence data following the header line.
.\" ------------------------------------------------
.\" Syntax section
.\" ------------------------------------------------
.SH SYNTAX
Each sequence in a
.B psa
alignment file or output must be preceded by a
.I fasta
header line.
.br
The general syntax of such a
.I fasta
header line is as follows:
.sp
.RS
.BI > seq_id
.RI "[ " free_text " ]"
.RE
.sp
The header must start with a
.RB ' > '
character which is directly followed by the
.I seq_id
field. This field is interpreted by most programs as the sequence's
.I identifier
and/or
.I accession
number. It ends at the first encountered whitespace character.
.br
The
.B pftools
programs will use the
.I free_text
to add information about the match score, position and description of the sequence or motif.
Please refer to the man pages of the corresponding programs for further information about
their output formats.
.br
The header can only extend over one line. The following lines, up to the next line starting with a
.RB ' > '
character or the end of the file are interpreted as sequence data.
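.PP
As an illustration of the header and data rules above, the following Python sketch splits a
.B psa
stream into records. It is not part of the
.B pftools
package; the function name
.I parse_psa
and the layout of the returned tuples are assumptions made for this example only.
.PP
.nf
```python
def parse_psa(lines):
    """Yield (seq_id, free_text, alignment) for each record in a psa stream."""
    seq_id, free_text, chunks = None, "", []
    for raw in lines:
        line = raw.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if seq_id is not None:
                yield seq_id, free_text, "".join(chunks)
            # seq_id ends at the first whitespace; the rest is free text.
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            free_text = fields[1] if len(fields) > 1 else ""
            chunks = []
        elif seq_id is not None and line:
            # Data may span several lines of different length.
            chunks.append(line.strip())
    # The end of the input closes the last record.
    if seq_id is not None:
        yield seq_id, free_text, "".join(chunks)


example = [
    ">YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO",
    "PTDPGlnsKIAQLVSMGFDPLEAAQ",
    "ALDAANGDLDVAASFLL--",
]
records = list(parse_psa(example))
```
.fi
.PP
The sketch joins the data lines of a record into one string, on the assumption that line breaks inside a record carry no meaning.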
.sp
The alignment data between a sequence and a
.I PROSITE
profile starts on the line following the header. This data can span several lines of differing length.
.br
The data consists of
.I upper
or
.IR lower -case
characters of the corresponding sequence alphabet (DNA or protein).
The gap characters
.RB ' . "' and '" - '
are also supported.
.br
The alignment is always at least as long as the matching profile. Insertions or deletions
detected during the motif/sequence alignment step cause the length of the reported data to vary,
and can be identified using the following conventions:
.RS
.\" --- upper-case character ---
.TP
.I upper-case character
Any upper-case character of the sequence alphabet identifies a
.I match
position between the sequence and the motif descriptor.
.\" --- lower-case character ---
.TP
.I lower-case character
A lower-case character of the sequence alphabet is used to symbolize an
.I insertion
in the sequence compared to the motif descriptor.
.\" --- dash '-' character ---
.TP
.I '-' (dash) character
A
.RB ' - '
character in the output identifies the presence of a
.I deletion
in the sequence compared to the motif descriptor.
.RE
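.PP
The three conventions above can be sketched as a small classifier. This Python fragment is only an illustration and not part of the
.B pftools
package; the name
.I classify_columns
is hypothetical. It treats the '.' (dot) gap character like the dash, as described in the notes on
.BR psa2msa (1).
.PP
.nf
```python
def classify_columns(alignment):
    """Count match, insertion and deletion positions in psa alignment data."""
    counts = {"match": 0, "insertion": 0, "deletion": 0}
    for ch in alignment:
        if ch in "-.":
            # Gap: a motif position that is absent from the sequence.
            counts["deletion"] += 1
        elif ch.isupper():
            # Upper case: sequence residue aligned to a motif position.
            counts["match"] += 1
        elif ch.islower():
            # Lower case: sequence residue inserted relative to the motif.
            counts["insertion"] += 1
        else:
            raise ValueError("unexpected character: " + repr(ch))
    return counts


counts = classify_columns("PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--")
# Motif positions represented in the alignment: matches plus deletions.
motif_positions = counts["match"] + counts["deletion"]
```
.fi
.PP
Under these conventions, insertions add characters that have no counterpart in the motif, which is why the data can be longer than the profile.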
.\" ------------------------------------------------
.\" Examples section
.\" ------------------------------------------------
.SH EXAMPLES
.TP
(1)
>YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO
.br
PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--
.br
This is an example of the output produced by
.BR pfsearch (1)
using the '\-x' (i.e.
.B psa
output) option. The first line starting with the
.RB ' > '
character is the
.I fasta
header. It also contains the raw score of the alignment (556) as well as its
position in the input sequence (residues 291 to 332).
.br
The next line contains the alignment itself. Starting at position 6, there is an
.I insertion
of the
.RI ' lns '
residues in the sequence compared to the motif. The last two positions of the motif are
not present in the sequence (i.e. they are
.IR deleted ).
This is indicated by the presence of two
.RB ' - '
(dash) characters at the end of the alignment.
.RE
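.PP
The reading of example (1) can be checked with a short Python sketch. It assumes that the free text is laid out exactly as in that example (score, the word "pos.", start, a dash, end, description); other programs or options may produce a different layout, so this is not a general header parser.
.PP
.nf
```python
header = ">YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO"
alignment = "PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--"

# Assumption: fields appear as "<score> pos. <start> - <end> <description>".
tokens = header[1:].split()
seq_id, score = tokens[0], int(tokens[1])
start, end = int(tokens[3]), int(tokens[5])

# Drop gap characters and restore upper case to recover the matched residues.
residues = "".join(ch.upper() for ch in alignment if ch not in "-.")

# 1-based alignment position of the first inserted (lower-case) residue.
first_insertion = next(i for i, ch in enumerate(alignment, 1) if ch.islower())

# Number of residues covered by the reported sequence positions.
span = end - start + 1
```
.fi
.PP
For this example the number of recovered residues equals the span given by the reported positions, and the first insertion is found at alignment position 6, in agreement with the description above.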
.\" ------------------------------------------------
.\" Notes section
.\" ------------------------------------------------
.SH "NOTES"
.TP
(1)
The
.BR xpsa (5)
format defines a stricter syntax for the header line, allowing the exchange of information between
different sequence analysis tools. It uses
.IR keyword = value
pairs to annotate the current match between a sequence and a motif descriptor. This syntax can be
easily parsed and extended, according to the needs of bioinformatics tools.
.RE
.TP
(2)
The current implementation of the
.B pftools
package does not use the
.RB ' . '
(dot) character in the
.B psa
output. Nevertheless
.BR psa2msa (1)
will read it and interpret it in the same manner as the
.RB ' - '
(dash) character.
.RE
.\" ------------------------------------------------
.\" See also section
.\" ------------------------------------------------
.SH "SEE ALSO"
.BR xpsa (5),
.BR pfsearch (1),
.BR pfscan (1),
.BR pfw (1),
.BR pfmake (1),
.BR psa2msa (1)
.\" ------------------------------------------------
.\" Author section
.\" ------------------------------------------------
.SH "AUTHOR"
This manual page was originally written by Volker Flegel.
.br
The
.B pftools
package was developed by Philipp Bucher.
.br
Any comments or suggestions should be addressed to <pftools@sib.swiss>.