Sigma: Simple greedy multiple alignment
Version 1.1.1, (C) 2007, Rahul Siddharthan <email@example.com>
This is a C port of Sigma, which was originally (version 1.0)
written in OCaml. A paper describing the algorithm was published in
BMC Bioinformatics 7:143 (2006). A couple of important bugs have
been fixed, and the C version is about 3 to 4 times faster than the
OCaml version. It does everything the OCaml version did, but more
correctly.
1.1.1 is a minor update (see ChangeLog) with no new features.
Changes since version 1.0:
1. Mismatches in previously aligned sequence fragments are handled
more sensibly. Earlier, "N" was always used at a mismatched
position, which caused problems when aligning large numbers of
sequences, since mismatches became increasingly common as more
sequences were added. Now the majority base, if one exists, is
used. There is scope for improvement, which will be addressed in a
future version, but the current treatment seems adequate for most
common situations.
2. The dynamic programming algorithm for finding local alignments
tends to become slow for long input sequences: it is O(NM) in both
time and memory for sequence lengths N and M. The current version
includes a workaround in which sequences are pre-fragmented into
pieces of average size smaller than L (4000 by default, changeable
with the new -l option). This seems to work well in practice, on
both synthetic and real DNA sequences. As a result, this version of
Sigma scales to much larger datasets than before (e.g., 10
sequences each 10000 bp long), and runs much faster than many
existing programs (including ClustalW and Dialign). Sigma 1.0 was
almost unusable on such datasets.
Nevertheless, this fragmentation is admittedly an inelegant hack;
in future versions, we will try to implement a more efficient
local-alignment algorithm that avoids this fragmenting or, at
least, allows a larger average fragment size.
3. Some bugs relating to enforcement of the consistency condition
have been fixed. These occasionally caused legitimate alignments to
be disallowed (but, I believe, did not cause incorrect alignments).
For help on compiling, see the file COMPILING. For command-line options, see
the file NOTES or simply run the program without options. A Unix manual page,
contributed by Charles Plessy, is in "sigma.1" and will be installed with "make
install". The DocBook source is in "sigma.1.xml". For a discussion of
background models and a sample file, see the Background directory. For a
detailed description of the algorithm, see BMC Bioinformatics 7:143 (2006).
The program is distributed under the GNU General Public License, version 2.
For copyright and licensing information, see COPYING.
The program's website is
and source code, as well as pre-compiled binaries for some
platforms, are available there. To be informed of updates, email
the author, Rahul Siddharthan <firstname.lastname@example.org>.