Sigma: Simple greedy multiple alignment
Version 1.1.3, (C) 2009, Rahul Siddharthan <rsidd@imsc.res.in>
Portions written and (C) 2009, Gayathri Jayaraman
This is a port to C of Sigma, which was originally (version 1.0)
written in OCaml. A paper describing the algorithm was published in
BMC Bioinformatics 7:143 (2006). A couple of important bugs have
been fixed, and the C port is about 3 to 4 times faster than the
OCaml version. It does everything that the OCaml version did, but
more correctly (and faster).
Version 1.1.1 is a minor update (see ChangeLog) with no new features.
Version 1.1.2 gives a significant speed boost for larger input files
but should not otherwise behave differently. Version 1.1.3 contains a
new local alignment algorithm that is linear in memory usage, though
still quadratic in time. A kludge introduced in 1.1 (pre-fragmentation
of sequences to avoid running out of memory) has therefore been
removed.
Changes since version 1.0:
1. Mismatches in previously-aligned sequence fragments are treated
more intelligently. Earlier, "N" was always used, which led to
problems in aligning large numbers of sequences, where mismatches
became more and more common. Now the majority base, if one exists,
is used. There is scope for improvement, which will be dealt with
in a future version, but the current treatment seems adequate for
most common situations.
2. The dynamic programming algorithm for finding local alignments
tended to become slow for large input sequences: it was O(NM) in
both time and memory usage, for sequence lengths N and M. The
current version needs only linear, i.e. O(M+N), memory, though it
is still O(NM) in time. Versions 1.1 through 1.1.2 included a
workaround in which sequences were pre-fragmented into pieces of
average size smaller than L (4000 by default, changeable with the
-l option); both the workaround and the -l option have now been
removed. There is also an important bugfix in this function.
(Optimisation and bugfix by Gayathri Jayaraman.)
As a result, this version of Sigma scales well to much larger
datasets than earlier versions did (e.g., 10 sequences, each
10000 bp long), and runs much faster than many existing programs
(including ClustalW and Dialign). Sigma 1.0 was almost unusable on
such datasets.
3. Some bugs relating to the consistency condition enforcement have
been fixed. These had the effect of occasionally disallowing
legitimate alignments (but, I believe, did not cause incorrect
alignments).
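
As an illustration of item 1 above, the majority-base rule for one
column of previously-aligned sequence might be sketched as follows.
This is not taken from the Sigma source: the function name
majority_base is hypothetical, and the sketch assumes upper-case
A/C/G/T input, takes "majority" to mean strictly more than half of the
column, and assumes a fallback to "N" when no base has a majority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Sigma source): return the base that
 * occurs in a strict majority (more than half) of the n characters in
 * column[], or 'N' if no base has a majority. Characters other than
 * upper-case A, C, G, T are counted towards n but towards no base. */
char majority_base(const char *column, size_t n)
{
    static const char bases[] = "ACGT";
    size_t counts[4] = {0, 0, 0, 0};
    size_t i, k;

    assert(column != NULL || n == 0);

    /* Tally each recognised base in the column. */
    for (i = 0; i < n; i++) {
        for (k = 0; k < 4; k++) {
            if (column[i] == bases[k]) {
                counts[k]++;
                break;
            }
        }
    }

    /* At most one base can exceed half of the column. */
    for (k = 0; k < 4; k++) {
        if (2 * counts[k] > n)
            return bases[k];
    }
    return 'N';   /* assumed fallback when no majority exists */
}
```

Under these assumptions, a column "AAAC" yields 'A', while "AACC"
(a tie) and "ACGT" both yield 'N'.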
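
The memory saving described in item 2 above can be illustrated with a
generic dynamic-programming sketch that keeps a single row of the
score table instead of the full N-by-M table. It is not Sigma's actual
routine or scoring scheme: the function name best_local_score, the
gapless recurrence and the match/mismatch parameters are all
assumptions made for illustration, and the sketch returns only the
best score, not the alignment itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only (not Sigma's code): best gapless local alignment
 * score between strings a and b, using the recurrence
 *     H[i][j] = max(0, H[i-1][j-1] + s(a[i-1], b[j-1]))
 * with s = match for equal characters and s = mismatch otherwise.
 * Time is O(NM), but only one row of M+1 integers is stored.
 * Returns -1 if memory allocation fails. */
int best_local_score(const char *a, const char *b, int match, int mismatch)
{
    size_t n, m, i, j;
    int best = 0;
    int *row;

    assert(a != NULL && b != NULL);
    n = strlen(a);
    m = strlen(b);

    row = calloc(m + 1, sizeof *row);   /* row[0] stays 0 throughout */
    if (row == NULL)
        return -1;

    for (i = 1; i <= n; i++) {
        /* Walk j downwards so that row[j-1] still holds the value from
         * the previous row (i-1) when row[j] is overwritten. */
        for (j = m; j >= 1; j--) {
            int s = row[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            row[j] = s > 0 ? s : 0;
            if (row[j] > best)
                best = row[j];
        }
    }

    free(row);
    return best;
}
```

With match = 1 and mismatch = -1, for example, "GATTACA" and "CATTAG"
score 4, from the shared segment "ATTA". Recovering the aligned
segment as well would need the end position of the best-scoring cell
to be recorded, which adds only constant extra memory in this gapless
setting.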
For help on compiling, see the file COMPILING. For command-line options, see
the file NOTES or simply run the program without options. A unix manual page,
contributed by Charles Plessy, is in "sigma.1" and will be installed with "make
install". The docbook source is in "sigma.1.xml". For discussion of
background models and a sample file, see the Background directory. For a
detailed description of the algorithm, see BMC Bioinformatics 7:143 (2006).
The program is distributed under the GNU General Public License, version 2.
For copyright and licensing information, see COPYING.
The program's website is
http://www.imsc.res.in/~rsidd/sigma/
and source code, as well as pre-compiled binaries for some
platforms, are available there. To be informed of updates, email
the author, Rahul Siddharthan <rsidd@imsc.res.in>.