Sigma: Simple greedy multiple alignment
Version 1.1.3, (C) 2009, Rahul Siddharthan <rsidd@imsc.res.in>
Portions written and (C) 2009, Gayathri Jayaraman

This is a port to C of Sigma, which was originally (version 1.0)
written in OCaml.  A paper describing the algorithm was published in
BMC Bioinformatics 7:143 (2006).  A couple of important bugs have
been fixed, and the C port is about 3 to 4 times faster than the
OCaml version.  It does everything that the OCaml version did, but
more correctly.

1.1.1 is a minor update (see ChangeLog) with no new features.  1.1.2
gives a significant speed boost for larger input files but should
not otherwise behave differently.  1.1.3 contains a new local alignment
algorithm that is linear in memory usage, though still quadratic in
time.  A kludge that had been put in 1.1 (pre-fragmentation of sequences
to avoid running out of memory) has therefore been removed.

Changes since version 1.0:

1. Mismatches in previously-aligned sequence fragments are treated
   more intelligently.  Earlier, "N" was always used, which led to
   problems in aligning large numbers of sequences, where mismatches
   became more and more common. Now the majority base, if one exists,
   is used.  There is scope for improvement, which will be dealt with
   in a future version, but the current treatment seems adequate for
   most common situations.

2. The dynamic programming algorithm for finding local alignments
   tended to become slow for large input sequences: it was O(NM), in
   both time and memory usage, for sequence lengths N and M.  The
   current version still takes O(NM) time but needs only linear,
   i.e. O(M+N), memory.  Versions 1.1 through 1.1.2 instead included
   a workaround where sequences were pre-fragmented into pieces of
   average size smaller than L (4000 by default, changeable with the
   -l option); that workaround and the -l option have now been
   removed.  There is also an important bugfix in this function.
   (Optimisation and bugfix by Gayathri Jayaraman)

   As a result, this version of Sigma scales well to much larger
   datasets than earlier versions did (e.g. 10 sequences, each
   10000 bp long), and performs much faster than many existing
   programs (including ClustalW and Dialign).  Sigma 1.0 was almost
   unusable on such datasets.

3. Some bugs relating to the consistency condition enforcement have
   been fixed.  These had the effect of occasionally disallowing
   legitimate alignments (but, I believe, did not cause incorrect 
   alignments).
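
The majority-base rule in item 1 can be sketched roughly as follows.
This is only an illustration and is not taken from the Sigma source:
the function name majority_base, the reading of "majority" as "strictly
more than half of the column", and the fallback to "N" when no such
base exists are all assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Sigma.
 * Returns the base (A, C, G or T) that makes up strictly more than
 * half of the n characters in column[], or 'N' if no base does.
 * Characters other than A, C, G, T (in either case) are counted
 * towards n but never towards any base. */
char majority_base(const char *column, size_t n)
{
    static const char bases[4] = { 'A', 'C', 'G', 'T' };
    size_t counts[4] = { 0, 0, 0, 0 };

    assert(column != NULL || n == 0);

    /* Tally each base in the column, ignoring case. */
    for (size_t i = 0; i < n; i++) {
        char c = (char)toupper((unsigned char)column[i]);
        for (int b = 0; b < 4; b++) {
            if (c == bases[b])
                counts[b]++;
        }
    }

    /* At most one base can exceed half of the column. */
    for (int b = 0; b < 4; b++) {
        if (2 * counts[b] > n)
            return bases[b];
    }
    return 'N';   /* assumed fallback when there is no majority */
}
```

Under these assumptions the column "AAC" yields 'A', while "AACC" and
"ACGT" yield 'N' because no base exceeds half of the column.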

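To illustrate how a local alignment can take O(NM) time while using
only linear memory, as described in item 2, here is a generic
Smith-Waterman-style score computed with two rolling rows.  It is a
sketch of the general technique, not Sigma's own algorithm: the
function name local_align_score and the match, mismatch and gap scores
are assumptions, and it returns only the best score, not the aligned
region.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative scores; assumptions, not Sigma's values. */
enum { MATCH = 2, MISMATCH = -1, GAP = -2 };

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical function, not part of Sigma.
 * Returns the best local alignment score of strings a and b, or -1 if
 * memory allocation fails.  Only two rows of the dynamic programming
 * table are kept, so memory is O(M) while time remains O(NM). */
int local_align_score(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev);   /* row i-1, starts at 0 */
    int *curr = calloc(m + 1, sizeof *curr);   /* row i */
    int best = 0;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1]
                     + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int up   = prev[j] + GAP;       /* gap in b */
            int left = curr[j - 1] + GAP;   /* gap in a */
            /* Local alignment: a score never drops below zero. */
            int s = max2(0, max2(diag, max2(up, left)));
            curr[j] = s;
            best = max2(best, s);
        }
        /* Reuse the old row as the next row instead of allocating. */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}
```

With these scores, two identical 4-base sequences score 8, and two
sequences with no base in common score 0.  Recovering the aligned
region itself in linear memory needs extra bookkeeping that this
sketch leaves out.
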
For help on compiling, see the file COMPILING.  For command-line options, see
the file NOTES or simply run the program without options.  A Unix manual page,
contributed by Charles Plessy, is in "sigma.1" and will be installed with "make
install".  The DocBook source is in "sigma.1.xml".  For discussion of
background models and a sample file, see the Background directory.  For a
detailed description of the algorithm, see BMC Bioinformatics 7:143 (2006).

The program is distributed under the GNU General Public License, version 2.
For copyright and licensing information, see COPYING.

The program's website is 
            http://www.imsc.res.in/~rsidd/sigma/
and source code, as well as pre-compiled binaries for some
platforms, are available there.  To be informed of updates, email
the author, Rahul Siddharthan <rsidd@imsc.res.in>.