
Beta Release 0.7.1 (26 September, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Version 0.7.0 does not work with reads >63bp at all. I overlooked two
lines of code that assume reads are 63bp or shorter. I have now fixed
the bug and tested it on simulated long reads, and it seems to work
fine. I am sorry for this obvious bug. Nothing else has changed since
0.7.0.
(0.7.1: 26 September 2008, r672)
Beta Release 0.7.0 (21 September, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since this release, MAQ can accept reads up to 127bp, instead of 63bp
in the previous version. This is achieved at the cost of 18% slower
speed and 16% more peak memory usage. The .map alignment format is
also changed accordingly, which means the new format is NOT compatible
with the old format. I will shortly put a converter on the MAQ website
so that you can convert alignments done by 0.6.x to the new format
without redoing the alignment. In addition, you can choose to revert to
the 63bp version by configuring with "--enable-shortreads", but this is
not recommended.
Furthermore, the NovoCraft developers have implemented a converter that
converts the NovoCraft alignment format to MAQ's .map format. I
incorporated their code into MAQ. NovoCraft can find short indels with
single-ended reads and has most of the major features of MAQ. It is also
fast and well developed. NovoCraft is a good alternative to MAQ.
In addition to the major changes, here are a few other notable changes:
* Improved progress report in maq map. Someone is using MAQ to align
reads to millions of contigs. The resulting stderr output is even
larger than the alignment itself. Now contig names will not be
printed.
* Fixed a segfault in Smith-Waterman alignment. I have not pinpointed
the line that causes the segfault, but I guess this is caused by a
rare out-of-boundary event. Anyway, the segfault seems to go away
after enforcing the boundary check.
MAQ will probably never go to 0.8.0. Although it is cheap to make MAQ
align reads up to 255bp, I am not going to do that. As reads get longer
and longer, MAQ's power will be reduced due to its inability to find
short indels on single-ended reads. I am still experimenting with novel
algorithms for long reads, and BWA, which has been made public, is the
unfinished product. Although BWA has not been fully developed into a
comprehensive package like MAQ, it shows the potential to do ultra-fast
gapped alignment on long reads. The current BWA does global alignment
(w.r.t. reads) for reads of a few hundred bp by taking the first few
tens of bp as the seed. You can find more information on the MAQ
website.
(0.7.0: 21 September 2008, r669)
Beta Release 0.6.8 (27 July, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most of the new features in this release are designed for the 1000
Genome Project. For other users, the most obvious change is a bug fix in
the assemble command. Fixing this bug reduces the error dependency
coefficient from 0.93 to 0.85. The SNP accuracy remains similar to the
previous version.
Other notable changes include:
* Formally changed the license to GPL version 3.
* Added mapvalidate command, which checks whether an alignment file is
corrupted. The mapmerge command also performs some sanity checks when
merging alignments.
* Support generating GLF format (for the 1000 Genome Project). Code
for manipulating GLF files is available in SVN now.
* The mapcheck command can optionally dump additional information for
quality recalibration (for the 1000 Genome Project).
* Fixed a potential bug in indelpe.cc (thanks to Vaughn for reporting
the bug).
* Fixed a potential compiling error in assopt.c (thanks to Jason for the
bug report).
* Only dump unmapped reads with `maq map -u'.
* Added more online documentation about how to call SNPs for SOLiD data
and to call SNPs from pooled samples.
(0.6.8: 27 July 2008, r651)
Beta Release 0.6.7 (23 June, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As promised, MAQ now works with reads from Illumina's long insert-size
libraries and with reads from samples that pool multiple
individuals/strains together. Also in this release, unmapped reads whose
mates are mapped are stored in the .map alignment file. This strategy
helps users who do local assembly to find structural variations.
Other notable changes are:
* In indelpe, fixed a bug about the position of an insertion.
Previously, the output position was 1bp away from the true position.
* In indelpe, output additional information about indels, which also
helps to tell whether the indel is homozygous or heterozygous.
* In SNPfilter, integrated the consensus quality filter. Previously,
this had to be done with an awk command.
* In SNPfilter and easyrun, improved the consensus quality filter. This
is particularly important for SNP calling on pooled samples.
* Added a command to detect corrupted .map files.
* In mapcheck, optionally output additional information for
mapping-based quality calibration.
(0.6.7: 23 June 2008, r631)
Beta Release 0.6.6 (27 April, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Two new features are added to this release. First, cns2fq now gives
regions where maq believes SNPs can be called confidently. Second,
maq can optionally dump all perfect or 1-mismatch hits to a separate
file. Maq cannot make use of information from multiple hits, but I can
see that outputting these hits may help people who do expression
profiling.
No bugs are fixed in this release and therefore people do not need to
update unless they use the new features.
In the next release, maq will support read alignment for Illumina's long
insert-size library, which has a different read orientation for a read
pair. I will also try to implement a SNP caller for pooled samples.
(0.6.6: 27 April 2008, r602)
Beta Release 0.6.5 (28 March, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is mainly a bug fix release.
* Fixed bug: read names longer than 35 characters will not be stored in
the alignment file. Command rmdup for PE data will also be affected
when reads have no names. Short read names will not cause any
problems. This bug is specific to 0.6.4.
* Fixed bug: reference shorter than 3bp will cause malformed consensus
file. This bug is specific to 0.6.4.
* Fixed bug: potential memory violation in indelsoa. This rare bug
affects all 0.6.x series.
* Fixed bug: potential memory violation in simulate when the reference
is short. This bug is rare if reference sequences are all long. All
0.6.x will be affected.
In addition to the bug fixes, I also finished SOLiD support in this
release. A new script solid2fastq.pl is introduced to convert SOLiD read
format to FASTQ format accepted by maq. Furthermore, maq is able to
convert color alignment to nucleotide alignment with inferred nucleotide
base qualities. Nucleotide consensus and SNPs can be generated with
assemble in the standard way. There is still some room left for further
improvement. I will work on it in the future.
Another change is that, in the new release, the insert size of a read
pair is measured between the 1st cycle of both reads in the pair, no
matter whether they are mapped as a proper pair or not. Defining insert
size in this way may be more conceptually consistent. Note that this
change will not affect properly aligned Solexa reads at all, but will
slightly affect SOLiD PE reads.
I am really sorry for these bugs in maq and hope the new version is more
stable.
(0.6.5: 28 March 2008, r578)
Beta Release 0.6.4 (15 March, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maq is now 90% faster on human alignment. This acceleration is partly
achieved by extensive code optimization and partly by using a 28bp seed,
instead of 24bp as in previous versions, in the alignment. As both the
Sanger Inst. and Illumina have greatly improved the data quality, using
a 28bp seed does not affect the final SNP accuracy. Users who still
intend to use a 24bp seed as before should compile maq with
"./configure --enable-slowmap". The online binaries are compiled with
the 28bp seed.
It is important to note that Illumina/Solexa sequencing may produce many
false polyA at the edges of a tile. These polyA artefacts may greatly
increase the running time of maq. Users are advised to remove these
artefacts with their own scripts before alignment. For the moment maq
does not provide a general functionality for filtering polyA.
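A minimal sketch of such a pre-alignment filter is given below. It is
not part of maq. It assumes unwrapped 4-line FASTQ records, and the
function names and the 0.9 threshold are hypothetical choices made only
for illustration.

```python
def is_polya_artefact(seq, min_fraction=0.9):
    """Return True if at least min_fraction of the bases are 'A'.

    The threshold is an illustrative assumption, not a maq parameter.
    """
    seq = seq.upper()
    if not seq:
        return False
    return seq.count("A") / len(seq) >= min_fraction


def filter_fastq(lines, min_fraction=0.9):
    """Yield 4-line FASTQ records whose sequence is not mostly 'A'.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if not is_polya_artefact(record[1], min_fraction):
                yield record
            record = []
```

The generator can be fed an open file handle, and the surviving records
written back out before running the alignment.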
From this version, the names of a pair of reads can differ in the
trailing "/1" or "/2". For example, "read0001/1" and "read0001/2" are
allowed for a read pair. In this way, the two reads in a pair can be
distinguished in the mapview output.
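The naming rule can be illustrated with a short sketch. The helper names
are hypothetical and this is not maq's own code. It only compares the
names after removing a trailing "/1" or "/2".

```python
def pair_base_name(name):
    """Strip a trailing "/1" or "/2" from a read name, if present."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def is_read_pair(name1, name2):
    """Return True if two read names agree once the suffix is removed."""
    return pair_base_name(name1) == pair_base_name(name2)
```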
Another important change is the consensus calling model. I noticed a
theoretical flaw in the underlying statistical model. The sequencing
errors seem to be more independent once the flaw is fixed, and therefore
the error dependency coefficient is increased to 0.93 by default. The
final SNP accuracy is about the same as the previous version.
I also improved the SOLiD support in this version. A script is provided
for converting SOLiD colour reads to the maq fastq. Mate-pair SOLiD
reads can also be correctly aligned. For a SOLiD read pair, the correct
orientation should be F3_reverse-R3_reverse or R3_forward-F3_forward. I
did not know this before.
Other notable changes include:
* Assemble now calculates minimum neighbouring quality in the 7bp
window surrounding the current position. SNPfilter will filter
unreliable SNPs based on the information. This idea is inspired by
NQS (Neighbourhood Quality Standard).
* Optionally store mismatching positions in .map file. The trade-off is
the maximum read length is 55bp when this option is switched on.
* Fixed a long-standing bug in PE alignment. Now about 1% more
properly aligned pairs can be found.
* Added paf_utils.pl script. This script converts soap, eland, rmap and
maq alignment formats to a common format. It also presents an example
of how to read/write maq's binary .map format with Perl.
* Added support for converting Bustard output (_prb.txt and _seq.txt)
in fq_all2std.pl. However, users should avoid using Bustard output
where possible. Gerald output is always better.
* "Alternative mapping quality" in mapview is now the lower SE mapping
quality of the two ends. Previously this did not hold for properly
paired reads.
* Filter polyA in reads, only for data generated by the Sanger
Institute. Note that maq can be several times SLOWER if there are a
lot of polyA artefacts in the reads.
* Pileup can optionally output base position on the read.
* Maq now trims long adapter contamination before alignment and trims
short adapter contamination after the alignment.
* Submap now works as a filter on .map file. Users should always
extract the reads in a region with maqindex in the maqview package.
* Updated mapstat command.
* Fixed a bug in easyrun about relative paths.
* Fixed a bug in fasta2csfasta.
* Fixed a rare bug in calculating the distance of a pair.
* Fixed a rare bug in determining the boundary of Smith-Waterman
alignment.
* Added asub, a generic script for submitting array jobs on LSF/SGE.
Added maq_sanger.pl, a script for running maq at the Sanger Inst.
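The minimum neighbouring quality mentioned in the first item of the
list above can be sketched as follows. This is an illustration only. It
assumes the 7bp window is centred on the current position, includes
that position, and is truncated at the ends of the sequence. None of
these details is confirmed by the notes, and the function name is
hypothetical.

```python
def min_neighbour_quality(quals, pos, flank=3):
    """Minimum quality in the window pos-flank .. pos+flank (inclusive).

    flank=3 gives a 7bp window; the window is clipped at both ends.
    """
    start = max(0, pos - flank)
    end = min(len(quals), pos + flank + 1)
    return min(quals[start:end])
```

A filter could then discard a SNP whose window minimum falls below a
chosen cut-off.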
I will work on variant calling for multiple samples and further improve
SOLiD support in future versions. To find the latest development of maq,
please check out:
svn co https://maq.svn.sourceforge.net/svnroot/maq/branches/lh3/maq
(0.6.4: 15 March 2008, r537)
Beta Release 0.6.3 (3 January, 2008)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Again, most changes happened in auxiliary commands. Simulation and Perl
scripts were improved a lot.
Changes and bug fixes include:
* Added diploid simulation mode. Given a haploid reference sequence,
maq can generate a diploid sequence and add variants to both
haploids.
* In 'easyrun', automatically split the input FASTQ. Users do not need
to split the input themselves.
* Paired end reads can be used with 'easyrun'.
* Added 'snpreg' command, which roughly calculates the size of regions
where SNPs can be called.
* Added 'simucns' command, which evaluates the accuracy of consensus
mapping qualities from simulated read alignment.
* Added 'demo' command to maq.pl. It demonstrates how to simulate reads,
to use easyrun and to evaluate the result with maq_eval.pl.
* In 'maq map', set flag 18 for a read whose mate mapped with the
Smith-Waterman algorithm as paired.
Maq has several companion scripts and consists of many commands, but not
all of them are well documented. I will gradually improve the
documentation, especially the parts useful to end users. In addition,
not all maq functionalities are fully optimized. Advanced users may want
to implement them in their own ways. I would also like to improve Maq if
you have better ideas.
(0.6.3: 3 January 2008, r466)
Beta Release 0.6.2 (23 November, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most changes in this release happened in auxiliary commands instead of
key commands. Except for random factors, the 0.6.2 alignment and
consensus should be almost identical to those from 0.6.1.
Other changes and bug fixes include:
* Added an option to dump unmapped reads to a separate file. Users can
study why these reads cannot be aligned.
* Implemented the `export2maq' command, which converts Illumina's
in-house Export format to maq's ".map" binary format. Genotype calling
is supported because the Export format contains mapping qualities of
reads.
* Implemented the `eland2maq' command, which converts alignments in an
Eland output to Maq's ".map" format. Genotype calling is not supported
due to the lack of qualities.
* Made `indelsoa' command available to end users. This command
implements a state-of-the-art homozygous break point detector for
single-end reads. However, this command mainly aims to facilitate SNP
filtering around break points instead of finding all the indels. The
`indelpe' command always works better.
* Made most commands recognize `-' as the standard input or standard
output. This may help stream-based pipelines.
* Restored the `-m' option in `pileup' and `assemble' commands. Some
users find this useful.
* Added fq_all2std.pl, a script to convert various read formats to the
standard/Sanger FASTQ format.
* Improved the rules for filtering SNPs and allowed filtering out SNPs
beside potential indel sites.
* Fixed a bug again in bisulfite alignment mode. This is to meet a
user's request. I have not tried it on real data.
* Added functionalities to evaluate indels in maq_eval.pl.
* Improved `fastq2bfq' command in both maq and maq.pl to make them
easier to use.
* Fixed a weird compiling error for some powerpc64-linux machines.
(0.6.2: 23 November 2007, r428)
Beta Release 0.6.1 (3 October, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is mainly a bugfix release. All of the bugs are minor or happen
rarely. End users may not observe obvious changes in their results. You
do not really need to re-align the reads with this version unless you
feel uncomfortable with any trivial bugs.
The changes and bug fixes include:
* In this release, a read is mapped to the position where the sum of
quality values of mismatched bases is approximately minimum.
* Zero quality will be changed to one in `fastq2bfq'. This is because
zero-quality bases will be regarded as `N' in alignment.
* Fixed a bug in adapter trimming. The previous version did not work
properly.
* Fixed a very rare bug in `assemble' and `pileup'. It may lead to
false zero depth at some sites (about 1 in 1,000,000 sites).
* Fixed a bug for bisulfite alignment mode. This mode has not been
thoroughly tested, though.
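The placement rule in the first item of the list above can be
illustrated with a brute-force sketch. This is not maq's algorithm,
which finds the position only approximately. The sketch assumes
ungapped, forward-strand comparison over a given list of candidate
positions, and the function names are hypothetical.

```python
def mismatch_quality_sum(read, quals, ref, pos):
    """Sum the qualities of read bases that differ from ref at pos."""
    window = ref[pos:pos + len(read)]
    return sum(q for r, w, q in zip(read, window, quals) if r != w)


def best_position(read, quals, ref, candidates):
    """Pick the candidate position with the smallest mismatch quality sum."""
    return min(
        candidates,
        key=lambda pos: mismatch_quality_sum(read, quals, ref, pos),
    )
```

With this scoring, a single mismatch at a low-quality base is preferred
over a single mismatch at a high-quality base.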
(0.6.1: 3 October 2007, r333)
Beta Release 0.6.0 (5 September, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a release with several bleeding-edge modifications which
jeopardized the stability of maq. Thorough testing has been done to make
sure maq works properly and that is why this release is delayed.
In this new release, maq allows more mismatches after the first 24bp of
a read. Trimming the low-quality 3'-end of reads is usually not
necessary any more. In particular, trimming reads recursively is not
recommended because this will affect the accuracy of mapping qualities.
Furthermore, since this release, maq is able to find short indels with
paired end (PE) reads. As Illumina's PE protocol will become standard in
the near future, this indel detector will play its role.
Other notable changes include:
* Changed ".map" binary format. The number of mismatches of the second
best hit is replaced by the sum of errors of the best hit. The
distance between the pair now equals the outer distance. Reads
with indels are also stored in the ".map" files. The mapview output
is changed accordingly.
* Added the number of 0- and 1-mismatch hits to the mapview output.
* Rewrote rmdup command. This command now keeps all abnormal pairs as
well as reads with indels.
* Made simulate command generate reads on both strands. Previously
read1 always came from the forward strand and read2 from the reverse
strand.
* Allowed changing the average MAF for heterozygous sites. This may
help for pooled samples, but it has not been evaluated.
* Improved 3'-adapter trimming. Fully contaminated reads can be
detected now.
In this release, I am trying to stabilize the alignment part and hope
the alignment file generated by maq-0.6.0 can be compatible with later
releases. Furthermore, maq is moving towards the first formal release
1.0.0. In the near future, I will also try to stabilize the entire code
instead of testing new features frequently.
(0.6.0: 5 September 2007, r249)
Beta Release 0.5.1 (31 July, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All the notable changes include:
* Bugfix in `map': fixed a bug which led to wrong alignment when
two similar regions have identical coordinates but are on different
references. This bug dates back to the first piece of code of
maq.
* Bugfix in `map' paired end alignment: fixed a bug which will lead to
wrong alignment when there are two good hits in a small region.
* New feature `simulate': a sophisticated paired end read simulator has
been implemented. It builds an order-one Markov chain and trains
parameters from real read data. The simulator is able to generate
reads with quality distribution quite similar to real ones. In
addition, three parameter sets will be provided with Maq. End users
can simulate realistic reads even without any real data.
* New feature `rmdup': remove read pairs with identical outer
coordinates. Doing this may improve the SNP accuracy in practice.
* Since this release, the main documentation will be maintained in a
man page. PDF version will also come with Maq distributions.
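The idea behind `rmdup' in the list above can be sketched as follows.
This is an illustration rather than maq's implementation. The record
layout is hypothetical, and keeping the pair with the highest mapping
quality is an assumption, since the notes do not say which duplicate is
retained.

```python
def remove_duplicate_pairs(pairs):
    """Keep one read pair per (reference, outer start, outer end).

    Each pair is a dict with the hypothetical keys "ref", "start",
    "end" and "mapq". Among duplicates, the pair with the highest
    mapping quality is kept (an assumption of this sketch).
    """
    best = {}
    for pair in pairs:
        key = (pair["ref"], pair["start"], pair["end"])
        kept = best.get(key)
        if kept is None or pair["mapq"] > kept["mapq"]:
            best[key] = pair
    return list(best.values())
```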
(0.5.1: 31 July 2007, r213)
Beta Release 0.5.0 (13 July, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From this release, this program is formally renamed as `maq', which
stands for Mapping and Assembly with Qualities. The version number still
follows the previous series.
In addition to the name of the program, another major change in this
release is the format of the maq alignment file. In response to the
request of several users, read names will be stored in the alignment
file. The mapview output is also revised accordingly.
Other notable changes and bug fixes in this release include:
* Bugfix in `maq.pl': follow symbolic links.
* New feature `maq_plot.pl': plot read depth and abnormal read pairs
along the reference.
* New feature in `maq.pl': `SNPfilter' command to rule out unreliable
SNPs.
* New feature in `maq.pl': more analyses added to `easyrun'.
(0.5.0: 13 July 2007, r171)
Beta Release 0.4.3 (4 July, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this new release, several bugs are fixed and a number of minor
features are implemented.
* Bugfix in 'mapass2.pl easyrun': fixed bugs when multiple read files
are provided on the command line.
* New feature: allow using single-end mapping quality in several
commands. By default, mapass2 will use paired end mapping qualities
if reads are paired. However, I found this quality is sometimes
overestimated. It is good to check how the results differ between the
two types of mapping qualities.
* Bugfix in 'mapcheck': fixed an integer overflow and skipped 'N'
regions in calculating the average depth.
* Bugfix in paired end alignment: fixed wrong coordinates for 0.03% of
paired reads. Single end alignment will not be affected.
* New feature in 'match': when '-N' is flagged, more alignment
information will be dumped to the stderr.
(0.4.3: 4 July 2007, r148)
Beta Release 0.4.2 (21 June, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Several people were asking me to output the name of each mapped
read. Now here it is. You can turn this feature on by:
mapass2 match -N out.map ref.bfa reads.bfq 2>out.log
The read name, reference seqname, position, strand, paired mapping
quality, single mapping quality and mismatched bases will be printed on
stderr. This option `-N' should only be used for debugging. It will
cost more memory and disk space as well.
That is all. People who do not need this feature can stick to 0.4.1.
(0.4.2: 21 June 2007, r130)
Beta Release 0.4.1 (17 June, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is mainly a bug fix release. All users are recommended to
update. New features and bug fixes include:
* New command `sol2sanger': convert Solexa FASTQ to Sanger/standard
FASTQ format. The difference between the two formats is how the
qualities are scaled.
* New command `bfq2fastq': convert mapass' binary FASTQ format to
standard FASTQ.
* Bugfix in `cns2win': fixed wrong report when chr is not specified on
the command line.
* Bugfix in `mapcheck': complemented bases on the reverse strand.
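The rescaling behind `sol2sanger' in the list above can be sketched as
follows. Solexa qualities are defined on the odds p/(1-p) and Sanger
(Phred) qualities on p itself, which gives
Q_phred = 10*log10(10^(Q_solexa/10) + 1). The ASCII offsets (64 for
Solexa, 33 for Sanger) are the standard ones. Rounding to the nearest
integer is an assumption of this sketch and may differ from what
mapass2 does. The function names are hypothetical.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled quality to the Phred scale."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


def sol2sanger_quality_string(solexa_quals):
    """Convert a Solexa quality string (offset 64) to Sanger (offset 33)."""
    out = []
    for ch in solexa_quals:
        q = solexa_to_phred(ord(ch) - 64)
        out.append(chr(int(q + 0.5) + 33))  # round to nearest (assumed)
    return "".join(out)
```

The two scales are nearly identical at high quality and diverge only
for low-quality bases.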
(0.4.1: 17 June 2007, r124)
Beta Release 0.4.0 (15 May, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One week after the last release, 0.4.0 comes out. Several improvements
make this release different from the previous ones:
* New consensus base calling model. Although the model has been updated
frequently during the development of mapass2, this one is different. It
is the first model that satisfies me. All previous models made me feel
uncomfortable in one way or another. However, good theory does not
always mean better performance. The new model only improves the
accuracy by less than 1 percent.
* Preliminary functions to process SOLiD data. This is the first
release that is able to process AB SOLiD data. The current strategy
cannot fully make use of all the colour information that is unique to
SOLiD data, but it is good enough to study the strength and weakness
of SOLiD data. I may improve these functions when SOLiD becomes more
stable. It is being improved.
* Use of the GNU build system. Mapass2 is better compiled with 64bit
support. The GNU build system makes this easier. Apple universal
binaries can be compiled, too. Although I still quite like to write
Makefiles by myself, I think using a more sophisticated method is the
right way to go. This is the first time I have tried this.
* Considerable code cleanup and minor improvements in
assembly-related parts.
Beginning with this release, I will probably not release mapass2 as
frequently as in the past three weeks. Although I am not
entirely satisfied with the accuracy of current performance, I am happy
with the whole theory behind it and the practical usefulness of mapass2.
I am sure mapass2 is one of the best programs to map and assemble short
reads.
(0.4.0: 15 May 2007, r114)
Beta Release 0.3.1 (9 May, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a release with minor revisions. The consensus calling model is
modified slightly which, unfortunately, decreases the accuracy by about
1%. However, I will stick to it anyway because it is more concise and
correct in theory. Actually, there was a bug in my implementation of
the old model.
Other changes include:
* Add `subpos' command: extract a required subset from .cns file
* Improve `pileup' output by making it more informative and allowing
extraction of a required subset.
* Fix two bugs: one in k-small and the other in `mapcheck'. These are
trivial, though.
(0.3.1: 9 May 2007, r94)
Beta Release 0.3.0 (3 May, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new release comes out. Consensus calling model has been improved a
little. Read-pair quality can be calculated more precisely. More
commands are added to facilitate subsequent analyses. Detailed
improvements and fixes are:
* Alternative model for consensus base calling. The new model does not
outperform the old one, but it tends to be more correct in theory and
more flexible.
* Improved model for read-pair quality. This fixes possible
overestimation when a pair can be mapped to several places with
correct orientation and distance. Note that the default parameter
should be adjusted in some cases. For chrX data, I suggest applying
"-t 0.8".
* Reference based consensus calling (RBCC). Call the consensus based on
dbSNP information. Adjust the prior at the dbSNP sites. This is kind
of cheating, but it does help to improve the final SNP calls.
* Informative mapcheck. Command 'mapcheck' now outputs more
information. It is also integrated to 'mapass2.pl'.
* Fastq2bfq in batch mode. This function converts or organizes all the
fastq files in a directory. 'farm-run.pl' script can be easily
applied to the resultant directory structure.
* A few bug fixes.
(0.3.0: 3 May 2007, r86)
Beta Release 0.2.1 (23 April, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is mainly a bug fix release. The previous version could not give
correct results when there were two or more reference sequences. I am
sorry for this obvious bug. Further improvements include:
* Add `mapcheck' command. This command counts observed substitutions on
reads with respect to the reference. It helps to check the systematic
bias contained in reads.
* Consensus can be assembled from one sequence. Previously, all the
consensus sequences had to be assembled together.
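The substitution counting that `mapcheck' performs can be sketched as
follows. This is an illustration only. It assumes ungapped,
forward-strand alignments given as (position, read) tuples and skips
'N' bases. The function name and input layout are hypothetical.

```python
from collections import Counter


def count_substitutions(alignments, ref):
    """Count (reference base, read base) mismatches over all alignments.

    alignments is an iterable of (0-based position, read sequence).
    """
    counts = Counter()
    for pos, read in alignments:
        for offset, base in enumerate(read):
            ref_base = ref[pos + offset]
            if base != ref_base and base != "N" and ref_base != "N":
                counts[(ref_base, base)] += 1
    return counts
```

A strongly skewed count for one substitution type would point to a
systematic bias in the reads.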
(0.2.1: 23 April 2007, r67)
Beta Release 0.2.0 (22 April, 2007)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the first release of mapass2, a program that maps
Illumina/Solexa reads to the reference and calls the consensus. This
program has been tested on real large-scale data. It is one of the few
programs able to handle such huge amounts of data efficiently
and accurately.
(0.2.0: 22 April 2007, r60)