<HTML>

<HEAD>
  <TITLE>
  EMBOSS: seqmatchall
  </TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" text="#000000">

<table align=center border=0 cellspacing=0 cellpadding=0>
<tr><td valign=top>
<A HREF="/" ONMOUSEOVER="self.status='Go to the EMBOSS home page';return true"><img border=0 src="emboss_icon.jpg" alt="" width=150 height=48></a>
</td>
<td align=left valign=middle>
<b><font size="+6">
seqmatchall
</font></b>
</td></tr>
</table>
<br>&nbsp;
<p>


<H2>
    Function
</H2>

All-against-all comparison of a set of sequences

<H2>
    Description
</H2>


This takes a set of sequences and does an all-against-all pairwise
comparison of words (fragments of the sequences of a specified fixed
size) in the sequences, finding regions of identity between any two
sequences. 

<p>
The larger the specified word size, the faster the comparison will
proceed, but regions whose stretches of identity are shorter than the
word size will be missed.  You should therefore choose the largest word
size that is still small enough to find the regions of similarity you
are interested in, so that the run completes within a reasonable
time-frame. 
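
<p>
The word-based comparison described above can be illustrated with a
minimal Python sketch.  It is not the EMBOSS implementation: the function
names <tt>find_identical_regions</tt> and <tt>all_against_all</tt> are
hypothetical, and the seed-and-extend strategy is only one simple way to
find exact matches of at least the word size between every pair of
sequences.
</p>
<pre>
```python
from itertools import combinations


def find_identical_regions(seq_a, seq_b, wordsize):
    """Return (length, start_a, start_b) for maximal exact matches of at
    least `wordsize` characters; start positions are 1-based."""
    # Index every word of seq_a by its 0-based start position.
    index = {}
    for i in range(len(seq_a) - wordsize + 1):
        index.setdefault(seq_a[i:i + wordsize], []).append(i)

    regions = []
    for j in range(len(seq_b) - wordsize + 1):
        for i in index.get(seq_b[j:j + wordsize], []):
            # Skip seeds that lie inside a match starting one position earlier.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend the seed to the right while the characters agree.
            length = wordsize
            while (len(seq_a) > i + length and len(seq_b) > j + length
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            regions.append((length, i + 1, j + 1))
    return regions


def all_against_all(sequences, wordsize):
    """Compare every unordered pair of named sequences exactly once."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
        for length, start_a, start_b in find_identical_regions(seq_a, seq_b, wordsize):
            hits.append((length, name_a, start_a, name_b, start_b))
    return hits
```
</pre>
<p>
With the toy sequences <tt>AAACGTGCATTT</tt> and <tt>CCCGTGCAGG</tt>, a
word size of 4 reports the shared 6-base region <tt>CGTGCA</tt>, whereas a
word size of 7 reports nothing, because the region is shorter than the
word.
</p>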

<H2>
    Usage
</H2>
<b>Here is a sample session with seqmatchall</b>
<p>
Here is an example using an increased word size to avoid accidental matches: 
<p>

<p>
<table width="90%"><tr><td bgcolor="#CCFFFF"><pre>

% <b>seqmatchall </b>
All-against-all comparison of a set of sequences
Input sequence set: <b>@eclac.list</b>
Word size [4]: <b>15</b>
Output alignment [j01636.seqmatchall]: <b></b>

</pre></td></tr></table><p>
<p>
<a href="#input.1">Go to the input files for this example</a><br><a href="#output.1">Go to the output files for this example</a><p><p>



<H2>
    Command line arguments
</H2>
<table CELLSPACING=0 CELLPADDING=3 BGCOLOR="#f5f5ff" ><tr><td>
<pre>
   Standard (Mandatory) qualifiers:
  [-sequence]          seqset     Sequence set filename and optional format,
                                  or reference (input USA)
   -wordsize           integer    [4] Word size (Integer 2 or more)
  [-outfile]           align      [*.seqmatchall] Output alignment file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -aformat2           string     Alignment format
   -aextension2        string     File name extension
   -adirectory2        string     Output directory
   -aname2             string     Base file name
   -awidth2            integer    Alignment width
   -aaccshow2          boolean    Show accession number in the header
   -adesshow2          boolean    Show description in the header
   -ausashow2          boolean    Show the full USA in the alignment
   -aglobal2           boolean    Show the full sequence in alignment

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

</pre>
</td></tr></table>
<P>

<table border cellspacing=0 cellpadding=3 bgcolor="#ccccff">
<tr bgcolor="#FFFFCC">
<th align="left" colspan=2>Standard (Mandatory) qualifiers</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>

<tr>
<td>[-sequence]<br>(Parameter 1)</td>
<td>Sequence set filename and optional format, or reference (input USA)</td>
<td>Readable set of sequences</td>
<td><b>Required</b></td>
</tr>

<tr>
<td>-wordsize</td>
<td>Word size</td>
<td>Integer 2 or more</td>
<td>4</td>
</tr>

<tr>
<td>[-outfile]<br>(Parameter 2)</td>
<td>Output alignment file name</td>
<td>Alignment output file</td>
<td><i>&lt;*&gt;</i>.seqmatchall</td>
</tr>

<tr bgcolor="#FFFFCC">
<th align="left" colspan=2>Additional (Optional) qualifiers</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>

<tr>
<td colspan=4>(none)</td>
</tr>

<tr bgcolor="#FFFFCC">
<th align="left" colspan=2>Advanced (Unprompted) qualifiers</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>

<tr>
<td colspan=4>(none)</td>
</tr>

</table>
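
<p>
For use from a script, the qualifiers listed above can be assembled into a
non-interactive command line.  The following Python sketch uses only
qualifiers shown in the table (<tt>-sequence</tt>, <tt>-wordsize</tt>,
<tt>-outfile</tt>, <tt>-auto</tt>); the helper names
<tt>build_command</tt> and <tt>run_seqmatchall</tt> are hypothetical, and
it assumes <b>seqmatchall</b> is on the <tt>PATH</tt>.
</p>
<pre>
```python
import subprocess


def build_command(list_file, outfile, wordsize=4):
    """Build an argument list for a prompt-free seqmatchall run."""
    # The documentation gives the allowed range as "Integer 2 or more".
    if 2 > wordsize:
        raise ValueError("wordsize must be an integer of 2 or more")
    return [
        "seqmatchall",
        "-sequence", "@" + list_file,   # "@" marks a list file of USAs
        "-wordsize", str(wordsize),
        "-outfile", outfile,
        "-auto",                        # turn off prompts
    ]


def run_seqmatchall(list_file, outfile, wordsize=4):
    """Run seqmatchall; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(list_file, outfile, wordsize), check=True)
```
</pre>
<p>
Calling <tt>run_seqmatchall("eclac.list", "j01636.seqmatchall", 15)</tt>
would correspond to the sample session above.
</p>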


<H2>
    Input file format
</H2>


<b>seqmatchall</b> reads a set of sequence USAs.  

<p>

The sequences must be either all protein or all nucleic acid.

<p>


<a name="input.1"></a>
<h3>Input files for usage example </h3>
<p><h3>File: eclac.list</h3>
<table width="90%"><tr><td bgcolor="#FFCCFF">
<pre>
#Formerly ECLAC
tembl:J01636

#Formerly ECLACA
tembl:X51872

#Formerly ECLACI
tembl:V00294

#Formerly ECLACY
tembl:V00295

#Formerly ECLACZ
tembl:V00296
</pre>
</td></tr></table><p>
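
<p>
The layout of the list file above (one USA per line, with <tt>#</tt>
comment lines and blank lines between entries) can be read with a few
lines of Python.  This is a sketch based only on the example shown; the
function name <tt>read_list_file</tt> is hypothetical and other list-file
features are not handled.
</p>
<pre>
```python
def read_list_file(text):
    """Return the USAs in list-file text, skipping blank lines and # comments."""
    usas = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        usas.append(line)
    return usas


example = """#Formerly ECLAC
tembl:J01636

#Formerly ECLACA
tembl:X51872
"""
print(read_list_file(example))
```
</pre>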

<H2>
    Output file format
</H2>



<a name="output.1"></a>
<h3>Output files for usage example </h3>
<p><h3>File: j01636.seqmatchall</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
########################################
# Program: seqmatchall
# Rundate: Sun 15 Jul 2007 12:00:00
# Commandline: seqmatchall
#    -sequence @../../data/eclac.list
#    -wordsize 15
# Align_format: match
# Report_file: j01636.seqmatchall
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: J01636
# 2: X51872
#=======================================

  1832 J01636          +     5646..7477     X51872          +        1..1832

#=======================================
#
# Aligned_sequences: 2
# 1: J01636
# 2: V00294
#=======================================

  1113 J01636          +       49..1161     V00294          +        1..1113

#=======================================
#
# Aligned_sequences: 2
# 1: J01636
# 2: V00295
#=======================================

  1500 J01636          +     4305..5804     V00295          +        1..1500

#=======================================
#
# Aligned_sequences: 2
# 1: J01636
# 2: V00296
#=======================================

  3078 J01636          +     1287..4364     V00296          +        1..3078

#=======================================
#
# Aligned_sequences: 2
# 1: X51872
# 2: V00295
#=======================================

   159 X51872          +        1..159      V00295          +     1342..1500

#=======================================
#
# Aligned_sequences: 2
# 1: V00295
# 2: V00296
#=======================================

    60 V00295          +        1..60       V00296          +     3019..3078

#---------------------------------------
#---------------------------------------
</pre>
</td></tr></table><p>


<p>

ECLAC (J01636, the complete E. coli lac operon) matches ECLACI (V00294),
ECLACZ (V00296), ECLACY (V00295) and ECLACA (X51872), the individual
genes, and there are short overlaps between ECLACY and the flanking
genes ECLACZ and ECLACA.

<p>

The output is a list of regions of identity in pairs of sequences.  In
the "match" alignment format shown above, each region is reported on one
line with 7 fields of data separated by TABs or space characters. 

<p>

The fields of data consist of:

<p>

<ul>
<li>The length of the region of identity.
<li>The name of sequence 1.
<li>The strand of sequence 1 (shown as + in the example).
<li>The start and end positions in sequence 1, written as start..end.
<li>The name of sequence 2.
<li>The strand of sequence 2 (shown as + in the example).
<li>The start and end positions in sequence 2, written as start..end.
</ul>
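
<p>
The data lines of the example output above (length, then name, strand and
start..end range for each of the two sequences) can be parsed as in the
following Python sketch.  The name <tt>parse_match_report</tt> is
hypothetical, and accepting <tt>-</tt> as well as <tt>+</tt> in the strand
field is an assumption, since the example shows only <tt>+</tt>.
</p>
<pre>
```python
import re

# One data line of the example report: length, then name, strand and
# start..end range for each of the two sequences.
MATCH_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([+-])\s+(\d+)\.\.(\d+)"
    r"\s+(\S+)\s+([+-])\s+(\d+)\.\.(\d+)\s*$"
)


def parse_match_report(text):
    """Return one dict per data line; header and comment lines are skipped."""
    records = []
    for line in text.splitlines():
        found = MATCH_LINE.match(line)
        if found is None:
            continue
        length, name1, strand1, start1, end1, name2, strand2, start2, end2 = found.groups()
        records.append({
            "length": int(length),
            "seq1": name1, "strand1": strand1, "range1": (int(start1), int(end1)),
            "seq2": name2, "strand2": strand2, "range2": (int(start2), int(end2)),
        })
    return records


report = """# 1: J01636
# 2: X51872
#=======================================

  1832 J01636          +     5646..7477     X51872          +        1..1832
"""
for record in parse_match_report(report):
    print(record["length"], record["seq1"], record["range1"],
          record["seq2"], record["range2"])
```
</pre>
<p>
In each record the length equals end minus start plus one for both
ranges, which gives a simple consistency check on parsed lines.
</p>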

<H2>
    Data files
</H2>


None.

<H2>
    Notes
</H2>


The larger the word size, the faster the comparisons will proceed, but
regions of identity shorter than the word size will not be reported.

<H2>
    References
</H2>


None.

<H2>
    Warnings
</H2>

None.


<H2>
    Diagnostic Error Messages
</H2>

None.

<H2>
    Exit status
</H2>


It exits with a status of 0.

<H2>
    Known bugs
</H2>


None.

<h2><a name="See also">See also</a></h2>
<table border cellpadding=4 bgcolor="#FFFFF0">
<tr><th>Program name</th><th>Description</th></tr>
<tr>
<td><a href="matcher.html">matcher</a></td>
<td>Finds the best local alignments between two sequences</td>
</tr>

<tr>
<td><a href="supermatcher.html">supermatcher</a></td>
<td>Match large sequences against one or more other sequences</td>
</tr>

<tr>
<td><a href="water.html">water</a></td>
<td>Smith-Waterman local alignment</td>
</tr>

<tr>
<td><a href="wordfinder.html">wordfinder</a></td>
<td>Match large sequences against one or more other sequences</td>
</tr>

<tr>
<td><a href="wordmatch.html">wordmatch</a></td>
<td>Finds all exact matches of a given size between 2 sequences</td>
</tr>

</table>

<P>
<a href="polydot.html">polydot</A> will give a graphical view of the
same matches.

<H2>
    Author(s)
</H2>
Ian Longden (il&nbsp;&copy;&nbsp;sanger.ac.uk)
<br>
Sanger Institute, Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SA, UK.                      


<H2>
    History
</H2>

1999 - written - Ian Longden

<H2>
    Target users
</H2>
This program is intended to be used by everyone and everything, from naive users to embedded scripts.


<H2>
    Comments
</H2>
None.

</BODY>
</HTML>