'\" t
.\"     Title: EMMA
.\"    Author: Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
.\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>
.\"      Date: 05/11/2012
.\"    Manual: EMBOSS Manual for Debian
.\"    Source: EMBOSS 6.4.0
.\"  Language: English
.\"
.TH "EMMA" "1e" "05/11/2012" "EMBOSS 6.4.0" "EMBOSS Manual for Debian"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
emma \- Multiple sequence alignment (ClustalW wrapper)
.SH "SYNOPSIS"
.HP \w'\fBemma\fR\ 'u
\fBemma\fR \fB\-sequence\ \fR\fB\fIseqall\fR\fR [\fB\-onlydend\ \fR\fB\fItoggle\fR\fR] \fB\-dend\ \fR\fB\fItoggle\fR\fR \fB\-dendfile\ \fR\fB\fIinfile\fR\fR [\fB\-slow\ \fR\fB\fItoggle\fR\fR] \fB\-pwmatrix\ \fR\fB\fIlist\fR\fR \fB\-pwdnamatrix\ \fR\fB\fIlist\fR\fR \fB\-usermatrix\ \fR\fB\fIvariable\fR\fR \fB\-pairwisedatafile\ \fR\fB\fIinfile\fR\fR \fB\-matrix\ \fR\fB\fIlist\fR\fR \fB\-usermamatrix\ \fR\fB\fIvariable\fR\fR \fB\-dnamatrix\ \fR\fB\fIlist\fR\fR \fB\-umamatrix\ \fR\fB\fIvariable\fR\fR \fB\-mamatrixfile\ \fR\fB\fIinfile\fR\fR \fB\-pwgapopen\ \fR\fB\fIfloat\fR\fR \fB\-pwgapextend\ \fR\fB\fIfloat\fR\fR \fB\-ktup\ \fR\fB\fIinteger\fR\fR \fB\-gapw\ \fR\fB\fIinteger\fR\fR \fB\-topdiags\ \fR\fB\fIinteger\fR\fR \fB\-window\ \fR\fB\fIinteger\fR\fR \fB\-nopercent\ \fR\fB\fIboolean\fR\fR [\fB\-gapopen\ \fR\fB\fIfloat\fR\fR] [\fB\-gapextend\ \fR\fB\fIfloat\fR\fR] [\fB\-endgaps\ \fR\fB\fIboolean\fR\fR] [\fB\-gapdist\ \fR\fB\fIinteger\fR\fR] \fB\-norgap\ \fR\fB\fIboolean\fR\fR \fB\-hgapres\ \fR\fB\fIstring\fR\fR \fB\-nohgap\ \fR\fB\fIboolean\fR\fR [\fB\-maxdiv\ \fR\fB\fIinteger\fR\fR] \fB\-outseq\ \fR\fB\fIseqoutset\fR\fR \fB\-dendoutfile\ \fR\fB\fIoutfile\fR\fR
.HP \w'\fBemma\fR\ 'u
\fBemma\fR \fB\-help\fR
.SH "DESCRIPTION"
.PP
\fBemma\fR
is a command line program from EMBOSS (\(lqthe European Molecular Biology Open Software Suite\(rq)\&. It is part of the "Alignment:Multiple" command group(s)\&.
.SH "OPTIONS"
.SS "Input section"
.PP
\fB\-sequence\fR \fIseqall\fR
.RS 4
.RE
.PP
\fB\-onlydend\fR \fItoggle\fR
.RS 4
Default value: N
.RE
.PP
\fB\-dend\fR \fItoggle\fR
.RS 4
Default value: N
.RE
.PP
\fB\-dendfile\fR \fIinfile\fR
.RS 4
.RE
.PP
\fB\-slow\fR \fItoggle\fR
.RS 4
A distance is calculated between every pair of sequences, and these distances are used to construct the dendrogram which guides the final multiple alignment\&. The scores are calculated from separate pairwise alignments\&. These can be calculated using two methods: dynamic programming (slow but accurate) or the method of Wilbur and Lipman (extremely fast but approximate)\&. The slow\-accurate method is fine for short sequences but will be VERY SLOW for many (e\&.g\&. >100) long (e\&.g\&. >1000 residue) sequences\&. Default value: Y
.RE
.SS "Pairwise align options"
.PP
\fB\-pwmatrix\fR \fIlist\fR
.RS 4
The scoring table which describes the similarity of each amino acid to each other\&. There are three \*(Aqin\-built\*(Aq series of weight matrices offered\&. Each consists of several matrices which work differently at different evolutionary distances\&. To see the exact details, read the documentation\&. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones)\&. For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions\&. For more divergent sequences, it is appropriate to use \*(Aqsofter\*(Aq matrices which give a high score to many other frequent substitutions\&.
.sp
1) BLOSUM (Henikoff)\&. These matrices appear to be the best available for carrying out database similarity (homology) searches\&. The matrices used are: Blosum80, 62, 45 and 30\&.
.sp
2) PAM (Dayhoff)\&. These have been extremely widely used since the late \*(Aq70s\&. We use the PAM 120, 160, 250 and 350 matrices\&.
.sp
3) GONNET\&. These matrices were derived using almost the same procedure as the Dayhoff series (above) but are much more up to date and are based on a far larger data set\&. They appear to be more sensitive than the Dayhoff series\&. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices\&.
.sp
We also supply an identity matrix which gives a score of 1\&.0 to two identical amino acids and a score of zero otherwise\&. This matrix is not very useful\&.
.sp
Default value: b
.RE
.PP
\fB\-pwdnamatrix\fR \fIlist\fR
.RS 4
The scoring table which describes the scores assigned to matches and mismatches (including IUB ambiguity codes)\&. Default value: i
.RE
.PP
\fB\-usermatrix\fR \fIvariable\fR
.RS 4
.RE
.PP
\fB\-pairwisedatafile\fR \fIinfile\fR
.RS 4
.RE
.SS "Matrix options"
.PP
\fB\-matrix\fR \fIlist\fR
.RS 4
This gives a menu where you are offered a choice of weight matrices\&. The default for proteins is the PAM series derived by Gonnet and colleagues\&. Note, a series is used! The actual matrix that is used depends on how similar the sequences to be aligned at this alignment step are\&. Different matrices work differently at each evolutionary distance\&. There are three \*(Aqin\-built\*(Aq series of weight matrices offered\&. Each consists of several matrices which work differently at different evolutionary distances\&. To see the exact details, read the documentation\&. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones)\&. For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions\&. For more divergent sequences, it is appropriate to use \*(Aqsofter\*(Aq matrices which give a high score to many other frequent substitutions\&.
.sp
1) BLOSUM (Henikoff)\&. These matrices appear to be the best available for carrying out database similarity (homology) searches\&. The matrices used are: Blosum80, 62, 45 and 30\&.
.sp
2) PAM (Dayhoff)\&. These have been extremely widely used since the late \*(Aq70s\&. We use the PAM 120, 160, 250 and 350 matrices\&.
.sp
3) GONNET\&. These matrices were derived using almost the same procedure as the Dayhoff series (above) but are much more up to date and are based on a far larger data set\&. They appear to be more sensitive than the Dayhoff series\&. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices\&.
.sp
We also supply an identity matrix which gives a score of 1\&.0 to two identical amino acids and a score of zero otherwise\&. This matrix is not very useful\&. Alternatively, you can read in your own (just one matrix, not a series)\&.
.sp
Default value: b
.RE
.PP
\fB\-usermamatrix\fR \fIvariable\fR
.RS 4
.RE
.PP
\fB\-dnamatrix\fR \fIlist\fR
.RS 4
This gives a menu where a single matrix (not a series) can be selected\&. Default value: i
.RE
.PP
\fB\-umamatrix\fR \fIvariable\fR
.RS 4
.RE
.PP
\fB\-mamatrixfile\fR \fIinfile\fR
.RS 4
.RE
.SS "Additional section"
.SS "Slow align options"
.PP
\fB\-pwgapopen\fR \fIfloat\fR
.RS 4
The penalty for opening a gap in the pairwise alignments\&. Default value: 10\&.0
.RE
.PP
\fB\-pwgapextend\fR \fIfloat\fR
.RS 4
The penalty for extending a gap by 1 residue in the pairwise alignments\&. Default value: 0\&.1
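.sp
As a rough illustration of how the opening and extension penalties combine, the sketch below scores a single gap under a simple affine model\&. It assumes the opening penalty is charged once and the extension penalty once per gapped residue; the function name is hypothetical, and the penalties ClustalW actually applies may differ in detail, so treat this as a reading aid rather than the program\*(Aqs implementation\&.
.sp
.nf
```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.1):
    """Penalty for one gap of `length` residues under a simple affine model.

    Assumption of this sketch: the opening penalty is charged once and
    the extension penalty once per gapped residue.
    """
    if length <= 0:
        # No gap, no penalty.
        return 0.0
    return gap_open + gap_extend * length
```
.fi
.sp
With the defaults shown, opening a gap dominates the cost and lengthening it adds comparatively little\&.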
.RE
.SS "Fast align options"
.PP
\fB\-ktup\fR \fIinteger\fR
.RS 4
This is the size of the exactly matching fragment that is used\&. INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity\&. For longer sequences (e\&.g\&. >1000 residues) you may need to increase the default\&. Default value: 1 for protein sequences, 2 otherwise (@($(acdprotein)?1:2))
.RE
.PP
\fB\-gapw\fR \fIinteger\fR
.RS 4
This is a penalty for each gap in the fast alignments\&. It has little effect on the speed or sensitivity except for extreme values\&. Default value: 3 for protein sequences, 5 otherwise (@($(acdprotein)?3:5))
.RE
.PP
\fB\-topdiags\fR \fIinteger\fR
.RS 4
The number of k\-tuple matches on each diagonal (in an imaginary dot\-matrix plot) is calculated\&. Only the best ones (with most matches) are used in the alignment\&. This parameter specifies how many\&. Decrease for speed; increase for sensitivity\&. Default value: 5 for protein sequences, 4 otherwise (@($(acdprotein)?5:4))
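.sp
The diagonal counting described here can be sketched as follows\&. This is a minimal illustration in Python, not the code ClustalW runs: the function name and the (offset, count) return shape are assumptions made for the example, and a diagonal is identified by the offset between the start positions of a matching k\-tuple in the two sequences\&.
.sp
.nf
```python
from collections import Counter, defaultdict


def top_diagonals(seq_a, seq_b, ktup=1, topdiags=5):
    """Return up to `topdiags` (offset, count) pairs, best diagonal first.

    The offset is i - j, where i and j are the start positions of an
    exactly matching k-tuple in seq_a and seq_b.
    """
    # Index every k-tuple of seq_b by its start positions.
    positions = defaultdict(list)
    for j in range(len(seq_b) - ktup + 1):
        positions[seq_b[j:j + ktup]].append(j)

    # Count exact k-tuple matches on each diagonal.
    counts = Counter()
    for i in range(len(seq_a) - ktup + 1):
        for j in positions.get(seq_a[i:i + ktup], []):
            counts[i - j] += 1

    return counts.most_common(topdiags)
```
.fi
.sp
With two identical input sequences the main diagonal (offset 0) collects the most matches, so it is the first one kept\&.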
.RE
.PP
\fB\-window\fR \fIinteger\fR
.RS 4
This is the number of diagonals around each of the \*(Aqbest\*(Aq diagonals that will be used\&. Decrease for speed; increase for sensitivity\&. Default value: 5 for protein sequences, 4 otherwise (@($(acdprotein)?5:4))
.RE
.PP
\fB\-nopercent\fR \fIboolean\fR
.RS 4
Default value: N
.RE
.SS "Gap options"
.PP
\fB\-gapopen\fR \fIfloat\fR
.RS 4
The penalty for opening a gap in the alignment\&. Increasing the gap opening penalty will make gaps less frequent\&. Default value: 10\&.0
.RE
.PP
\fB\-gapextend\fR \fIfloat\fR
.RS 4
The penalty for extending a gap by 1 residue\&. Increasing the gap extension penalty will make gaps shorter\&. Terminal gaps are not penalised\&. Default value: 5\&.0
.RE
.PP
\fB\-endgaps\fR \fIboolean\fR
.RS 4
End gap separation: treats end gaps just like internal gaps for the purposes of avoiding gaps that are too close (set by \*(Aqgap separation distance\*(Aq)\&. If you turn this off, end gaps will be ignored for this purpose\&. This is useful when you wish to align fragments where the end gaps are not biologically meaningful\&. Default value: Y
.RE
.PP
\fB\-gapdist\fR \fIinteger\fR
.RS 4
Gap separation distance: tries to decrease the chances of gaps being too close to each other\&. Gaps that are less than this distance apart are penalised more than other gaps\&. This does not prevent close gaps; it makes them less frequent, promoting a block\-like appearance of the alignment\&. Default value: 8
.RE
.PP
\fB\-norgap\fR \fIboolean\fR
.RS 4
Residue specific penalties: amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence\&. As an example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine\&. Default value: N
.RE
.PP
\fB\-hgapres\fR \fIstring\fR
.RS 4
This is a set of the residues \*(Aqconsidered\*(Aq to be hydrophilic\&. It is used when introducing Hydrophilic gap penalties\&. Default value: GPSNDQEKR
.RE
.PP
\fB\-nohgap\fR \fIboolean\fR
.RS 4
Hydrophilic gap penalties: used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common\&. The residues that are \*(Aqconsidered\*(Aq to be hydrophilic are set by \*(Aq\-hgapres\*(Aq\&. Default value: N
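.sp
To make the idea of a hydrophilic run concrete, the sketch below scans a protein sequence for stretches of 5 or more residues drawn from the \-hgapres set\&. The function name and the half\-open (start, end) return format are assumptions of this example; it shows only where such runs lie, not how ClustalW adjusts the penalties inside them\&.
.sp
.nf
```python
def hydrophilic_runs(sequence, hgapres="GPSNDQEKR", min_run=5):
    """Return (start, end) spans of hydrophilic runs, end exclusive."""
    runs = []
    start = None
    for pos, residue in enumerate(sequence.upper()):
        if residue in hgapres:
            # Open a new run if we are not already inside one.
            if start is None:
                start = pos
        else:
            # A non-hydrophilic residue closes the current run.
            if start is not None and pos - start >= min_run:
                runs.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(sequence) - start >= min_run:
        runs.append((start, len(sequence)))
    return runs
```
.fi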
.RE
.PP
\fB\-maxdiv\fR \fIinteger\fR
.RS 4
This switch delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned\&. The setting shows the percent identity level required to delay the addition of a sequence; sequences that are less identical than this level to any other sequence will be aligned later\&. Default value: 30
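.sp
The selection rule can be sketched as below\&. The sketch reads the rule as \(lqa sequence is delayed when its percent identity to every other sequence is below the level\(rq, which is an interpretation rather than something this page states outright; the function name and the nested\-dictionary input are likewise hypothetical\&.
.sp
.nf
```python
def delayed_sequences(percent_identity, maxdiv=30):
    """Return names whose best identity to any other sequence is below maxdiv.

    percent_identity maps each sequence name to a dict of
    {other_name: percent identity}.
    """
    delayed = []
    for name, others in percent_identity.items():
        # Delay only if even the closest relative is below the level.
        if others and max(others.values()) < maxdiv:
            delayed.append(name)
    return delayed
```
.fi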
.RE
.SS "Output section"
.PP
\fB\-outseq\fR \fIseqoutset\fR
.RS 4
.RE
.PP
\fB\-dendoutfile\fR \fIoutfile\fR
.RS 4
.RE
.SH "BUGS"
.PP
Bugs can be reported to the Debian Bug Tracking system (http://bugs\&.debian\&.org/emboss), or directly to the EMBOSS developers (http://sourceforge\&.net/tracker/?group_id=93650&atid=605031)\&.
.SH "SEE ALSO"
.PP
emma is fully documented via the
\fBtfm\fR(1)
system\&.
.SH "AUTHOR"
.PP
\fBDebian Med Packaging Team\fR <\&debian\-med\-packaging@lists\&.alioth\&.debian\&.org\&>
.RS 4
Wrote the script used to autogenerate this manual page\&.
.RE
.SH "COPYRIGHT"
.br
.PP
This manual page was autogenerated from an AJAX Command Definition (ACD) of the EMBOSS package\&. It can be redistributed under the same terms as EMBOSS itself\&.
.sp