File: NEWS.md

Release 2.17-r941 (4 May 2019)
------------------------------

Changes since the last release:

 * Fixed flawed CIGARs like `5I6D7I` (#392).

 * Bugfix: TLEN should be 0 when either end is unmapped (#373 and #365).

 * Bugfix: mappy is unable to write index (#372).

 * Added option `--junc-bed` to load known gene annotations in the BED12
   format. Minimap2 prefers annotated junctions over novel junctions (#197 and
   #348). GTF can be converted to BED12 with `paftools.js gff2bed`.

 * Added option `--sam-hit-only` to suppress unmapped hits in SAM (#377).

 * Added preset `splice:hq` for high-quality CCS or mRNA sequences. It applies
   better scoring and improves the sensitivity to small exons. This preset may
   introduce false small introns, but the overall accuracy should be higher.
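
As an illustration of the BED12 input accepted by `--junc-bed`, the sketch below extracts intron (junction) coordinates from one BED12 line using the standard `blockCount`, `blockSizes` and `blockStarts` fields. The function name `bed12_junctions` and the example record are hypothetical, and how minimap2 uses the junctions during alignment is not shown here.

```python
def bed12_junctions(line):
    """Return (chrom, intron_start, intron_end, strand) for each intron in one BED12 line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("BED12 needs 12 tab-separated fields")
    chrom, chrom_start, strand = fields[0], int(fields[1]), fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may carry a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount disagrees with blockSizes/blockStarts")
    junctions = []
    for i in range(block_count - 1):
        intron_start = chrom_start + starts[i] + sizes[i]  # first base after exon i
        intron_end = chrom_start + starts[i + 1]           # first base of exon i+1
        junctions.append((chrom, intron_start, intron_end, strand))
    return junctions


# Made-up transcript with three exons: [1000,1100), [1400,1600), [1700,2000).
example = "chr1\t1000\t2000\ttx1\t0\t+\t1000\t2000\t0\t3\t100,200,300,\t0,400,700,"
for junction in bed12_junctions(example):
    print(junction)
```

Coordinates are 0-based and half-open, as in BED itself.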

This version produces nearly identical alignments to v2.16, except for CIGARs
affected by the bug mentioned above.
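
To make the flawed-CIGAR case concrete, here is a minimal sketch that collapses a run of adjacent `I` and `D` operators into at most one insertion followed by one deletion, so that `5I6D7I` becomes `12I6D`. This is only an illustration: the helper `merge_indel_runs` is hypothetical, and the insertion-before-deletion ordering is an assumption, not necessarily what minimap2 emits after the fix.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def merge_indel_runs(cigar):
    """Collapse each run of adjacent I/D operators into one I and one D."""
    out = []          # list of (length, op) in output order
    ins = dele = 0    # pending insertion / deletion lengths in the current run

    def flush():
        nonlocal ins, dele
        if ins:
            out.append((ins, "I"))
        if dele:
            out.append((dele, "D"))
        ins = dele = 0

    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            ins += n
        elif op == "D":
            dele += n
        else:
            flush()
            if out and out[-1][1] == op:
                out[-1] = (out[-1][0] + n, op)  # merge equal neighbours
            else:
                out.append((n, op))
    flush()
    return "".join(f"{n}{op}" for n, op in out)


print(merge_indel_runs("10M5I6D7I20M"))
```

The rewrite keeps the number of query and reference bases consumed unchanged, because insertion and deletion lengths are only summed, never dropped.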

(2.17: 5 May 2019, r941)



Release 2.16-r922 (28 February 2019)
------------------------------------

This release is 50% faster for mapping ultra-long nanopore reads at comparable
accuracy. For short-read mapping, long-read overlapping and ordinary long-read
mapping, the performance and accuracy remain similar. This speedup is achieved
with a new heuristic to limit the number of chaining iterations (#324). Users
can disable the heuristic by setting the new option `--max-chain-iter` to a
huge number.
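
The idea of capping chaining iterations can be sketched with a toy chaining routine in which each anchor only looks back at its `max_iter` nearest predecessors. The scoring below (seed length minus a linear gap cost) and the function name `chain_score` are simplifying assumptions for illustration; they are not minimap2's actual chaining function or the exact heuristic from #324.

```python
def chain_score(anchors, seed_len, max_iter):
    """Best colinear chain score for anchors given as (ref_pos, query_pos),
    sorted by ref_pos. Each anchor inspects at most max_iter predecessors."""
    best = [0] * len(anchors)
    for i, (ri, qi) in enumerate(anchors):
        best[i] = seed_len                    # chain consisting of anchor i alone
        lowest = max(0, i - max_iter)         # bounded look-back window
        for j in range(i - 1, lowest - 1, -1):
            rj, qj = anchors[j]
            dr, dq = ri - rj, qi - qj
            if dr <= 0 or dq <= 0:
                continue                      # not colinear with anchor i
            gain = min(dr, dq, seed_len)      # newly covered bases
            gap = abs(dr - dq)                # toy linear gap penalty
            best[i] = max(best[i], best[j] + gain - gap)
    return max(best, default=0)


# Two off-diagonal anchors sit between the true predecessors of the last anchor.
anchors = [(0, 0), (20, 20), (21, 500), (22, 600), (40, 40)]
print(chain_score(anchors, seed_len=15, max_iter=10))
print(chain_score(anchors, seed_len=15, max_iter=2))
```

With a small `max_iter` the last anchor never reaches its true predecessor, which is the kind of trade-off between speed and chain completeness that a very large limit avoids.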

Other changes to minimap2:

 * Implemented option `--paf-no-hit` to output unmapped query sequences in PAF.
   The strand and reference name columns are both `*` at an unmapped line. The
   hidden option is available in earlier minimap2 but had a different 2-column
   output format instead of PAF.

 * Fixed a bug that leads to wrongly calculated `de` tags when ambiguous bases
   are involved (#309). This bug only affects v2.15.

 * Fixed a bug when parsing command-line option `--splice` (#344). This bug was
   introduced in v2.13.

 * Fixed two division-by-zero cases (#326). They don't affect final alignments
   because the results of the divisions are not used in either case.

 * Added an option `-o` to output alignments to a specified file. It is still
   recommended to use UNIX pipes for on-the-fly conversion or compression.

 * Output a new `rl` tag to give the length of query regions harboring
   repetitive seeds.
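
For downstream scripts, the `--paf-no-hit` convention described above can be checked as in the following sketch, which tests whether the strand (column 5) and reference name (column 6) of a PAF line are both `*`. The helper name and the example lines are made up; in particular, the values of the remaining columns on an unmapped line are an assumption.

```python
def is_unmapped_paf(line):
    """True if a PAF line has '*' in both the strand and reference-name columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("PAF line has fewer than 6 columns")
    return fields[4] == "*" and fields[5] == "*"


# Hypothetical records: one mapped read and one unmapped read.
mapped = "read1\t1000\t0\t1000\t+\tchr1\t5000\t100\t1100\t950\t1000\t60"
unmapped = "read2\t800\t0\t0\t*\t*\t0\t0\t0\t0\t0\t0"

for record in (mapped, unmapped):
    name = record.split("\t")[0]
    print(name, "unmapped" if is_unmapped_paf(record) else "mapped")
```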

Changes to paftools.js:

 * Added a new option to convert the MD tag to the long form of the cs tag.

Changes to mappy:

 * Added the `mappy.Aligner.seq_names` method to return sequence names (#312).

For NA12878 ultra-long reads, this release changes the alignments of <0.1% of
reads in comparison to v2.15. All these reads have highly fragmented alignments
and are likely to be problematic anyway. For shorter or well aligned reads,
this release should produce mostly identical alignments to v2.15.

(2.16: 28 February 2019, r922)



Release 2.15-r905 (10 January 2019)
-----------------------------------

Changes to minimap2:

 * Fixed a rare segmentation fault when option -H is in use (#307). This may
   happen when there are very long homopolymers towards the 5'-end of a read.

 * Fixed wrong CIGARs when option --eqx is used (#266).

 * Fixed a typo in the base encoding table (#264). This should have no
   practical effect.

 * Fixed a typo in the example code (#265).

 * Improved the C++ compatibility by removing "register" (#261). However,
   minimap2 still can't be compiled in the pedantic C++ mode (#306).

 * Output a new "de" tag for gap-compressed sequence divergence.
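
A rough way to see what "gap-compressed" means: each gap counts as one event regardless of its length. The sketch below computes such a divergence from a CIGAR that uses the `=`/`X` operators, assuming the definition (mismatches + gap opens) / (matches + mismatches + gap opens). The function name is hypothetical, and details such as the treatment of ambiguous bases in the real `de` tag are not modeled.

```python
import re


def gap_compressed_divergence(cigar):
    """Divergence where every insertion or deletion counts once, whatever its length."""
    matches = mismatches = gap_opens = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "=":
            matches += n
        elif op == "X":
            mismatches += n
        elif op in "ID":
            gap_opens += 1        # one event per gap, not per gapped base
        elif op == "M":
            raise ValueError("need =/X operators; M hides mismatches")
        # N, S, H and P do not contribute to divergence in this sketch
    events = mismatches + gap_opens
    total = matches + events
    return events / total if total else 0.0


# 98 matches, 2 mismatches and one 10-bp insertion: 3 events out of 101 columns.
print(gap_compressed_divergence("90=2X10I8="))
```

Counting the 10-bp insertion base by base would instead give 12 differences out of 110 columns, so long indels dominate the uncompressed figure much more.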

Changes to paftools.js:

 * Added "asmgene" to evaluate the completeness of an assembly by measuring the
   uniquely mapped single-copy genes. This command borrows the idea from BUSCO.

 * Added "vcfpair" to call a phased VCF from phased whole-genome assemblies. An
   earlier version of this script is used to produce the ground truth for the
   syndip benchmark [PMID:30013044].

This release produces identical alignment coordinates and CIGARs in comparison
to v2.14. Users are advised to upgrade due to the several bug fixes.

(2.15: 10 January 2019, r905)



Release 2.14-r883 (5 November 2018)
-----------------------------------

Notable changes:

 * Fixed two minor bugs caused by typos (#254 and #266).

 * Fixed a bug that made minimap2 abort when --eqx was used together with --MD
   or --cs (#257).

 * Added --cap-sw-mem to cap the size of DP matrices (#259). Base alignment may
   take a lot of memory in the splicing mode. This may lead to issues when we
   run minimap2 on a cluster with a hard memory limit. The new option avoids
   unlimited memory usage at the cost of missing a few long introns.

 * Conforming to C99 and C11 when possible (#261).

 * Warn about malformatted FASTA or FASTQ (#252 and #255).

This release occasionally produces base alignments different from v2.13. The
overall alignment accuracy remains similar.

(2.14: 5 November 2018, r883)



Release 2.13-r850 (11 October 2018)
-----------------------------------

Changes to minimap2:

 * Fixed wrongly formatted SAM when -L is in use (#231 and #233).

 * Fixed an integer overflow in rare cases.

 * Added --hard-mask-level for fine control of split alignments (#244).

 * Made --MD work with spliced alignment (#139).

 * Replaced musl's getopt with ketopt for portability.

 * Log peak memory usage on exit.

This release should produce alignments identical to v2.12 and v2.11.

(2.13: 11 October 2018, r850)



Release 2.12-r827 (6 August 2018)
---------------------------------

Changes to minimap2:

 * Added option --split-prefix to write proper alignments (correct mapping
   quality and clustered query sequences) given a multi-part index (#141 and
   #189; mostly by @hasindu2008).

 * Fixed a memory leak when option -y is in use.

Changes to mappy:

 * Support the MD/cs tag (#183 and #203).

 * Allow mappy to index a single sequence, to add extra flags and to change the
   scoring system.

Minimap2 should produce alignments identical to v2.11.

(2.12: 6 August 2018, r827)



Release 2.11-r797 (20 June 2018)
--------------------------------

Changes to minimap2:

 * Improved alignment accuracy in low-complexity regions for SV calling. Thanks
   to @armintoepfer for multiple offline examples.

 * Added option --eqx to encode sequence match/mismatch with the =/X CIGAR
   operators (#156, #157 and #175).

 * When compiled with VC++, minimap2 generated wrong alignments due to a
   comparison between a signed integer and an unsigned integer (#184). Also
   fixed warnings reported by "clang -Wextra".

 * Fixed incorrect anchor filtering due to a missing 64- to 32-bit cast.

 * Fixed incorrect mapping quality for inversions (#148).

 * Fixed incorrect alignment involving ambiguous bases (#155).

 * Fixed incorrect presets: option `-r 2000` is intended to be used with
   ava-ont, not ava-pb. The bug was introduced in 2.10.

 * Fixed a bug when --for-only/--rev-only is used together with --sr or
   --heap-sort=yes (#166).

 * Fixed option -Y that was not working in the previous releases.

 * Added option --lj-min-ratio for fine control of the alignment of long gaps
   found by the "long-join" heuristic (#128).

 * Exposed `mm_idx_is_idx`, `mm_idx_load` and `mm_idx_dump` C APIs (#177).
   Also fixed a bug when indexing without reference names (this feature is not
   exposed to the command line).
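
The effect of `--eqx` on a CIGAR can be sketched as follows: every `M` run is split into `=` (sequence match) and `X` (mismatch) runs by comparing the aligned query and reference bases. The function `to_eqx` is a hypothetical helper; it assumes `query` is the full read sequence including soft-clipped bases and `ref` is the reference substring starting at the alignment start, and it compares bases by plain string equality, which may differ from how minimap2 treats ambiguous bases.

```python
import re


def to_eqx(cigar, query, ref):
    """Rewrite M operators as =/X by comparing query and reference bases."""
    out = []  # list of [length, op]; adjacent equal operators are merged

    def push(n, op):
        if out and out[-1][1] == op:
            out[-1][0] += n
        else:
            out.append([n, op])

    qi = ri = 0  # current offsets into query and ref
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "M":
            for k in range(n):
                same = query[qi + k].upper() == ref[ri + k].upper()
                push(1, "=" if same else "X")
        else:
            push(n, op)
        if op in "MIS=X":   # operators that consume query bases
            qi += n
        if op in "MDN=X":   # operators that consume reference bases
            ri += n
    return "".join(f"{n}{op}" for n, op in out)


print(to_eqx("2S4M1I3M", "NNACGTTACG", "ACCTACG"))
```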

Changes to mappy:

 * Added `__version__` (#165).

 * Exposed the maximum fragment length parameter to mappy (#174).

Changes to paftools:

 * Don't crash when there is no "cg" tag (#153).

 * Fixed wrong coverage report by "paftools.js call" (#145).

This version may produce slightly different base-level alignment. The overall
alignment statistics should remain similar.

(2.11: 20 June 2018, r797)



Release 2.10-r761 (27 March 2018)
---------------------------------

Changes to minimap2:

 * Optionally output the MD tag for compatibility with existing tools (#63,
   #118 and #137).

 * Use SSE compiler flags more precisely to prevent compiling errors on certain
   machines (#127).

 * Added option --min-occ-floor to set a minimum occurrence threshold. Presets
   intended for assembly-to-reference alignment set this option to 100. This
   option alleviates issues with regions having high copy numbers (#107).

 * Exit with non-zero code on file writing errors (e.g. disk full; #103 and
   #132).

 * Added option -y to copy FASTA/FASTQ comments in query sequences to the
   output (#136).

 * Added the asm20 preset for alignments between genomes at 5-10% sequence
   divergence.

 * Changed the band-width in the ava-ont preset from 500 to 2000. Oxford
   Nanopore reads may contain long deletion sequencing errors that break
   chaining.

Changes to mappy, the Python binding:

 * Fixed a typo in Align.seq() (#126).

Changes to paftools.js, the companion script:

 * Command sam2paf now converts the MD tag to cs.

 * Support VCF output for assembly-to-reference variant calling (#109).

This version should produce identical alignment for read overlapping, RNA-seq
read mapping, and genomic read mapping. We have also added a cookbook to show
the variety of uses of minimap2 on real datasets. Please see cookbook.md in the
minimap2 source code directory.

(2.10: 27 March 2018, r761)



Release 2.9-r720 (23 February 2018)
-----------------------------------

This release fixed multiple minor bugs.

* Fixed two bugs that lead to incorrect inversion alignment. Also improved the
  sensitivity to small inversions by using double Z-drop cutoff (#112).

* Fixed an issue that may leave the end of a query sequence unmapped (#104).

* Added a mappy API to retrieve sequences from the index (#126) and to reverse
  complement DNA sequences. Fixed a bug where the `best_n` parameter did not
  work (#117).

* Avoided segmentation fault given incorrect FASTQ input (#111).

* Combined all auxiliary javascripts to paftools.js. Fixed several bugs in
  these scripts at the same time.

(2.9: 24 February 2018, r720)



Release 2.8-r672 (1 February 2018)
----------------------------------

Notable changes in this release include:

 * Speed up short-read alignment by ~10%. The overall mapping accuracy stays
   the same, but the output alignments are not always identical to v2.7 due to
   unstable sorting employed during chaining. Long-read alignment is not
   affected by this change as the speedup is short-read specific.

 * Mappy now supports paired-end short-read alignment (#87). Please see
   python/README.rst for details.

 * Added option --for-only and --rev-only to perform alignment against the
   forward or the reverse strand of the reference genome only (#91).

 * Alleviated the issue with undesired diagonal alignment in the self mapping
   mode (#10). Even if the output is not ideal, it should not interfere with
   other alignments. Fully resolving the issue is intricate and may require
   additional heuristic thresholds.

 * Enhanced error checking against incorrect input (#92 and #96).

For long query sequences, minimap2 should output identical alignments to v2.7.

(2.8: 1 February 2018, r672)



Release 2.7-r654 (9 January 2018)
---------------------------------

This release fixed a bug in the splice mode and added a few minor features:

 * Fixed a bug that occasionally takes an intron as a long deletion in the
   splice mode. This was caused by wrong backtracking at the last CIGAR
   operator. The current fix eliminates the error, but it is not optimal in
   that it often produces a wrong junction when the last operator is an intron.
   A future version of minimap2 may improve upon this.

 * Support high-end ARM CPUs that implement the NEON instruction set (#81).
   This enables minimap2 to work on Raspberry Pi 3 and Odroid XU4.

 * Added a C API to construct a minimizer index from a set of C strings (#80).

 * Check scoring specified on the command line (#79). Due to the 8-bit limit,
   excessively large score penalties cause minimap2 to fail.

For genomic sequences, minimap2 should give identical alignments to v2.6.

(2.7: 9 January 2018, r654)



Release 2.6-r623 (12 December 2017)
-----------------------------------

This release adds several features and fixes two minor bugs:

 * Optionally build an index without sequences. This helps to reduce the
   peak memory for read overlapping and is automatically applied when
   base-level alignment is not requested.

 * Approximately estimate per-base sequence divergence (i.e. 1-identity)
   without performing base-level alignment, using a MashMap-like method. The
   estimate is written to a new dv:f tag.

 * Reduced the number of tiny terminal exons in RNA-seq alignment. The current
   setting is conservative. Increase --end-seed-pen to drop more such exons.

 * Reduced the peak memory when aligning long query sequences.

 * Fixed a bug that is caused by HPC minimizers longer than 256bp. This should
   have no effect in practice, but it is recommended to rebuild HPC indices if
   possible.

 * Fixed a bug when identifying identical hits (#71). This should only affect
   artifactual references consisting of near-identical sequences.

For genomic sequences, minimap2 should give nearly identical alignments to
v2.5, except the new dv:f tag.

(2.6: 12 December 2017, r623)



Release 2.5-r572 (11 November 2017)
-----------------------------------

This release fixes several bugs and brings a couple of minor improvements:

 * Fixed a severe bug that leads to incorrect mapping coordinates in rare
   corner cases.

 * Fixed underestimated mapping quality for chimeric alignments when the whole
   query sequence contains many repetitive minimizers, and for chimeric
   alignments caused by Z-drop.

 * Fixed two bugs in Python binding: incorrect strand field (#57) and incorrect
   sequence names for Python3 (#55).

 * Improved mapping accuracy for highly overlapping paired ends.

 * Added option -Y to use soft clipping for supplementary alignments (#56).

(2.5: 11 November 2017, r572)



Release 2.4-r555 (6 November 2017)
----------------------------------

As planned, this release focuses on fine tuning the base algorithm. Notable
changes include

 * Changed the mapping quality scale to match the scale of BWA-MEM. This makes
   minimap2 and BWA-MEM achieve similar sensitivity-specificity balance on real
   short-read data.

 * Improved the accuracy of splice alignment by modeling one additional base
   close to the GT-AG signal. This model is used by default with `-x splice`.
   For SIRV control data, however, it is recommended to add `--splice-flank=no`
   to disable this feature as the SIRV splice signals are slightly different.

 * Tuned the parameters for Nanopore Direct RNA reads. The recommended command
   line is `-axsplice -k14 -uf` (#46).

 * Fixed a segmentation fault when aligning PacBio reads (#47 and #48). This
   bug is very rare but it affects all versions of minimap2. It is also
   recommended to re-index reference genomes created with `map-pb`. For human,
   two minimizers in an old index are wrong.

 * Changed option `-L` in sync with the final decision of hts-specs: a fake
   CIGAR takes the form of `<readLen>S<refLen>N`. Note that `-L` only enables
   future tools to recognize long CIGARs. It is not possible for older tools to
   work with such alignments in BAM (#43 and #51).

 * Fixed a tiny issue whereby minimap2 may waste 8 bytes per candidate
   alignment.
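
The fake CIGAR used by `-L` can be sketched like this: when an alignment has more CIGAR operators than BAM can store (65535), the CIGAR field becomes `<readLen>S<refLen>N` and the real operators are kept aside for a tag. The helper below is a hypothetical illustration of that rule, not minimap2 code; it returns the real operators separately instead of encoding a tag.

```python
MAX_BAM_CIGAR_OPS = 65535  # BAM stores the operator count in 16 bits


def bam_cigar_fields(ops):
    """ops: list of (length, op). Returns (cigar_string, ops_for_tag_or_None)."""
    if len(ops) <= MAX_BAM_CIGAR_OPS:
        return "".join(f"{n}{op}" for n, op in ops), None
    # Read length: operators consuming query bases; hard clips are not in SEQ.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    # Reference length: operators consuming reference bases.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    return f"{read_len}S{ref_len}N", ops


short_cigar, short_tag = bam_cigar_fields([(10, "M")])
long_cigar, long_tag = bam_cigar_fields([(1, "M"), (1, "I")] * 40000)
print(short_cigar, short_tag is None)
print(long_cigar, len(long_tag))
```

Because the placeholder soft-clips the whole read and skips the whole reference span, older tools still see consistent lengths even though they cannot recover the real alignment.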

The minimap2 technical note hosted at arXiv has also been updated to reflect
recent changes.

(2.4: 6 November 2017, r555)



Release 2.3-r531 (22 October 2017)
----------------------------------

This release comes with many improvements and bug fixes:

 * The **sr** preset now supports paired-end short-read alignment. Minimap2 is
   3-4 times as fast as BWA-MEM, but is slightly less accurate on simulated
   reads.

 * Meticulous improvements to assembly-to-assembly alignment (special thanks to
   Alexey Gurevich from the QUAST team): a) apply a small penalty to matches
   between ambiguous bases; b) reduce missing alignments due to spurious
   overlaps; c) introduce the short form of the `cs` tag, an improvement to the
   SAM MD tag.

 * Make sure gaps are always left-aligned.

 * Recognize `U` bases from Oxford Nanopore Direct RNA-seq (#33).

 * Fixed slightly wrong chaining score. Fixed slightly inaccurate coordinates
   for split alignment.

 * Fixed multiple reported bugs: 1) wrong reference name for inversion
   alignment (#30); 2) redundant SQ lines when multiple query files are
   specified (#39); 3) non-functioning option `-K` (#36).
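
For readers new to the short form of the `cs` tag, the following sketch splits a tag value into events: `:N` for N matching bases, `*xy` for a substitution from reference base x to query base y, `+seq` for an insertion, `-seq` for a deletion and `~xxNyy` for an intron. The parser and its event names are illustrative assumptions, not part of minimap2 or paftools.js.

```python
import re

CS_TOKEN = re.compile(r":(\d+)|\*([a-z])([a-z])|\+([a-z]+)|-([a-z]+)|~([a-z]{2})(\d+)([a-z]{2})")


def parse_cs_short(cs):
    """Parse a short-form cs string (without the 'cs:Z:' prefix) into events."""
    events = []
    pos = 0
    while pos < len(cs):
        m = CS_TOKEN.match(cs, pos)
        if m is None:
            raise ValueError(f"unexpected text at offset {pos}: {cs[pos:pos + 10]!r}")
        if m.group(1) is not None:
            events.append(("match", int(m.group(1))))
        elif m.group(2) is not None:
            events.append(("sub", m.group(2), m.group(3)))   # ref base, query base
        elif m.group(4) is not None:
            events.append(("ins", m.group(4)))
        elif m.group(5) is not None:
            events.append(("del", m.group(5)))
        else:
            events.append(("intron", m.group(6), int(m.group(7)), m.group(8)))
        pos = m.end()
    return events


for event in parse_cs_short(":6-ata:10+gtc:4*at:3"):
    print(event)
```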

This release has implemented all the major features I planned five months ago,
with the addition of spliced long-read alignment. The next couple of releases
will focus on fine tuning of the base algorithms.

(2.3: 22 October 2017, r531)



Release 2.2-r409 (17 September 2017)
------------------------------------

This is a feature release. It improves single-end short-read alignment and
comes with Python bindings. Detailed changes include:

 * Added the **sr** preset for single-end short-read alignment. In this mode,
   minimap2 runs faster than BWA-MEM, but is slightly less accurate on
   simulated data sets. Paired-end alignment is not supported as of now.

 * Improved mapping quality estimate with more accurate identification of
   repetitive hits. This mainly helps short-read alignment.

 * Implemented **mappy**, a Python binding for minimap2, which is available
   from PyPI and can be installed with `pip install --user mappy`. Python users
   can perform read alignment without the minimap2 executable.

 * Restructured the indexing APIs and documented key minimap2 APIs in the
   header file minimap.h. Updated example.c with the new APIs. Old APIs still
   work but may become deprecated in future.

This release may output alignments different from the previous version, though
the overall alignment statistics, such as the number of aligned bases and long
gaps, remain close.

(2.2: 17 September 2017, r409)



Release 2.1.1-r341 (6 September 2017)
-------------------------------------

This is a maintenance release that is expected to output identical alignment to
v2.1. Detailed changes include:

 * Support CPU dispatch. By default, minimap2 is compiled with both SSE2 and
   SSE4 based implementations of alignment and automatically chooses the right
   one at runtime. This avoids unexpected errors on older CPUs (#21).

 * Improved Windows support as requested by Oxford Nanopore (#19). Minimap2
   now avoids variable-length stacked arrays, eliminates alloca(), ships with
   getopt_long() and provides timing functions implemented with Windows APIs.

 * Fixed a potential segmentation fault when specifying -k/-w/-H with
   multi-part index (#23).

 * Fixed two memory leaks in example.c.

(2.1.1: 6 September 2017, r341)



Release 2.1-r311 (25 August 2017)
---------------------------------

This release adds spliced alignment for long noisy RNA-seq reads. On a SMRT
Iso-Seq and an Oxford Nanopore data set, minimap2 appears to outperform
traditional mRNA aligners. For DNA alignment, this release gives almost
identical output to v2.0. Other changes include:

 * Added option `-R` to set the read group header line in SAM.

 * Optionally output the `cs:Z` tag in PAF to encode both the query and the
   reference sequences in the alignment.

 * Fixed an issue where DP alignment uses excessive memory.

The minimap2 technical report has been updated with more details and the
evaluation of spliced alignment:

 * Li, H. (2017). Minimap2: fast pairwise alignment for long nucleotide
   sequences. [arXiv:1708.01492v2](https://arxiv.org/abs/1708.01492v2).

(2.1: 25 August 2017, r311)



Release 2.0-r275 (8 August 2017)
--------------------------------

This release is identical to version 2.0rc1, except the version number. It is
described and evaluated in the following technical report:

 * Li, H. (2017). Minimap2: fast pairwise alignment for long DNA sequences.
   [arXiv:1708.01492v1](https://arxiv.org/abs/1708.01492v1).

(2.0: 8 August 2017, r275)



Release 2.0rc1-r232 (30 July 2017)
----------------------------------

This release improves the accuracy of long-read alignment and adds several
minor features.

 * Improved mapping quality estimate for short alignments containing few seed
   hits.

 * Fixed a minor bug that affects the chaining accuracy towards the ends of a
   chain. Changed the gap cost for chaining to reduce false seeding.

 * Skip potentially wrong seeding and apply dynamic programming more frequently.
   This slightly increases run time, but greatly reduces false long gaps.

 * Perform local alignment at Z-drop break point to recover potential inversion
   alignment. Output the SA tag in the SAM format. Added scripts to evaluate
   mapping accuracy for reads simulated with pbsim.

This release completes features intended for v2.0. No major features will be
added to the master branch before the final v2.0.

(2.0rc1: 30 July 2017, r232)



Release r191 (19 July 2017)
---------------------------

This is the first public release of minimap2, an aligner for long reads and
assemblies. This release has a few issues and is generally not recommended for
production use.

(19 July 2017, r191)