1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275
|
.TH "hmm2search" 1 "Oct 2003" "HMMER 2.3.2" "HMMER Manual"
.SH NAME
.TP
hmm2search - search a sequence database with a profile HMM
.SH SYNOPSIS
.B hmm2search
.I [options]
.I hmmfile
.I seqfile
.SH DESCRIPTION
.B hmm2search
reads an HMM from
.I hmmfile
and searches
.I seqfile
for significantly similar sequence matches.
.PP
.I seqfile
will be looked for first in the current working directory,
then in a directory named by the environment variable
.I BLASTDB.
This lets users use existing BLAST databases, if BLAST
has been configured for the site.
.PP
.B hmm2search
may take minutes or even hours to run, depending
on the size of the sequence database. It is a good
idea to redirect the output to a file.
.PP
The output consists of four sections: a ranked list
of the best scoring sequences, a ranked list of the
best scoring domains, alignments for all the best scoring
domains, and a histogram of the scores.
A sequence score may be higher than a domain score for
the same sequence if there is more than one domain in the sequence;
the sequence score takes into account all the domains.
All sequences scoring above the
.I -E
and
.I -T
cutoffs are shown in the first list, then
.I every
domain found in this list is
shown in the second list of domain hits.
If desired, E-value and bit score thresholds may also be applied
to the domain list using the
.I --domE
and
.I --domT
options.
.SH OPTIONS
.TP
.B -h
Print brief help; includes version number and summary of
all options, including expert options.
.TP
.BI -A " <n>"
Limits the alignment output to the
.I <n>
best scoring domains.
.B -A0
shuts off the alignment output and can be used to reduce
the size of output files.
.TP
.BI -E " <x>"
Set the E-value cutoff for the per-sequence ranked hit list to
.I <x>,
where
.I <x>
is a positive real number. The default is 10.0. Hits with E-values
better than (less than) this threshold will be shown.
.TP
.BI -T " <x>"
Set the bit score cutoff for the per-sequence ranked hit list to
.I <x>,
where
.I <x>
is a real number.
The default is negative infinity; by default, the threshold
is controlled by E-value and not by bit score.
Hits with bit scores better than (greater than) this threshold
will be shown.
.TP
.BI -Z " <n>"
Calculate the E-value scores as if we had seen a sequence database of
.I <n>
sequences. The default is the number of sequences seen in your
database file
.I <seqfile>.
.SH EXPERT OPTIONS
.TP
.B --compat
Use the output format of HMMER 2.1.1, the 1998-2001 public
release; provided so 2.1.1 parsers don't have to be rewritten.
.TP
.BI --cpu " <n>"
Sets the maximum number of CPUs that the program
will run on. The default is to use all CPUs
in the machine. Overrides the HMMER_NCPU
environment variable. Only affects threaded
versions of HMMER (the default on most systems).
.TP
.B --cut_ga
Use Pfam GA (gathering threshold) score cutoffs.
Equivalent
to --globT <GA1> --domT <GA2>, but the GA1 and GA2 cutoffs
are read from the HMM file. hmm2build puts these cutoffs there
if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and
the optional GA annotation line was present. If these
cutoffs are not set in the HMM file,
.B --cut_ga
doesn't work.
.TP
.B --cut_tc
Use Pfam TC (trusted cutoff) score cutoffs. Equivalent
to --globT <TC1> --domT <TC2>, but the TC1 and TC2 cutoffs
are read from the HMM file. hmm2build puts these cutoffs there
if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and
the optional TC annotation line was present. If these
cutoffs are not set in the HMM file,
.B --cut_tc
doesn't work.
.TP
.B --cut_nc
Use Pfam NC (noise cutoff) score cutoffs. Equivalent
to --globT <NC1> --domT <NC2>, but the NC1 and NC2 cutoffs
are read from the HMM file. hmm2build puts these cutoffs there
if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and
the optional NC annotation line was present. If these
cutoffs are not set in the HMM file,
.B --cut_nc
doesn't work.
.TP
.BI --domE " <x>"
Set the E-value cutoff for the per-domain ranked hit list to
.I <x>,
where
.I <x>
is a positive real number.
The default is infinity; by default, all domains in the sequences
that passed the first threshold will be reported in the second list,
so that the number of domains reported in the per-sequence list is
consistent with the number that appear in the per-domain list.
.TP
.BI --domT " <x>"
Set the bit score cutoff for the per-domain ranked hit list to
.I <x>,
where
.I <x>
is a real number. The default is negative infinity;
by default, all domains in the sequences
that passed the first threshold will be reported in the second list,
so that the number of domains reported in the per-sequence list is
consistent with the number that appear in the per-domain list.
.I Important note:
only one domain in a sequence is absolutely controlled by this
parameter, or by
.B --domT.
The second and subsequent domains in a sequence have a de facto
bit score threshold of 0 because of the details of how HMMER
works. HMMER requires at least one pass through the main model
per sequence; to do more than one pass (more than one domain)
the multidomain alignment must have a better score than the
single domain alignment, and hence the extra domains must contribute
positive score. See the Users' Guide for more detail.
.TP
.BI --forward
Use the Forward algorithm instead of the Viterbi algorithm
to determine the per-sequence scores. Per-domain scores are
still determined by the Viterbi algorithm. Some have argued that
Forward is a more sensitive algorithm for detecting remote
sequence homologues; my experiments with HMMER have not
confirmed this, however.
.TP
.BI --informat " <s>"
Assert that the input
.I seqfile
is in format
.I <s>;
do not run Babelfish format autodection. This increases
the reliability of the program somewhat, because
the Babelfish can make mistakes; particularly
recommended for unattended, high-throughput runs
of HMMER. Valid format strings include FASTA,
GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF,
CLUSTAL, and PHYLIP. See the User's Guide for a complete
list.
.TP
.B --null2
Turn off the post hoc second null model. By default, each alignment
is rescored by a postprocessing step that takes into account possible
biased composition in either the HMM or the target sequence.
This is almost essential in database searches, especially with
local alignment models. There is a very small chance that this
postprocessing might remove real matches, and
in these cases
.B --null2
may improve sensitivity at the expense of reducing
specificity by letting biased composition hits through.
.TP
.B --pvm
Run on a Parallel Virtual Machine (PVM). The PVM must
already be running. The client program
.B hmm2search-pvm
must be installed on all the PVM nodes.
Optional PVM support must have been compiled into
HMMER.
.TP
.B --xnu
Turn on XNU filtering of target protein sequences. Has no effect
on nucleic acid sequences. In trial experiments,
.B --xnu
appears to perform less well than the default
post hoc null2 model.
.SH SEE ALSO
Master man page, with full list of and guide to the individual man
pages: see
.B hmmer2(1).
.PP
For complete documentation, see the user guide (ftp://selab.janelia.org/pub/software/hmmer/2.3.2/Userguide.pdf); or see the HMMER web page,
http://hmmer.janelia.org/.
.SH COPYRIGHT
.nf
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
Freely distributed under the GNU General Public License (GPL).
.fi
See the file COPYING in your distribution for details on redistribution
conditions.
.SH AUTHOR
.nf
Sean Eddy
HHMI/Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
http://www.genetics.wustl.edu/eddy/
.fi
|