File: CHANGELOG.md

package info (click to toggle)
pbbam 0.7.4+ds-1
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 10,384 kB
  • ctags: 5,236
  • sloc: cpp: 48,068; python: 1,444; xml: 852; ansic: 820; makefile: 175; sh: 52; cs: 12
file content (396 lines) | stat: -rw-r--r-- 14,950 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
# PacBio::BAM - change log

All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](http://semver.org/). 

**NOTE:** The current series (0.y.z) is under initial development. Anything may
change at any time. The public API should not be considered stable yet. Once we
lock down a version 1.0.0, this will define a reference point & compatibility
guarantees will be maintained within each major version series.

## Active

### Added
- Default DataSet 'Version' attribute if none already present (currently 4.0.0)

## [0.7.4] - 2016-11-18

### Changed
- Compatibility for merging BAM files no longer requires exact match of PacBioBAM
version number (header @HD:pb tag). As long as both files meet the minimum 
supported version number, the merge is allowed.

## [0.7.3] - 2016-11-11

### Added
- Support for S/P2-C2 chemistry and forthcoming 4.0 basecaller

## [0.7.2] - 2016-11-10

### Removed
- SAM header version equality check for merging BAM files. PacBioBAM version 
number carries more meaning for PacBio data and thus will be the basis of 
ensuring compatible merging.

## [0.7.1] - 2016-11-09

### Added
- (Unindexed) FASTA reader & FastaSequence data structure.
- Missing unit tests for internal BAM tag access.
- Chemistry data for basecaller v3.3.
- Missing parsers for filtering barcode quality ("bq"), barcode forward ("bcf"), 
and barcode reverse ("bcr") from DataSetXML.
- Integrated htslib into project.

### Fixed
- Reverse complement on padding base.

## [0.7.0] - 2016-09-26 

### Added
- Clipping for CCS records

### Fixed
- Cached position data leaking across records while iterating.
- Rolled back default pulse behavior in internal BAM API, to be backward-
compatible with existing client code (for now at least). v0.6.0 introduced
returning basecalled positions ONLY by default, rather than return ALL 
pulses. 
- Fixed crash when attempting to read from empty BAM/PBI files using the 
PbiFilter-enabled APIs.

## [0.6.0] - 2016-09-13

### Added
- BamWriter writes to a BAM file with the target name plus a ".tmp" suffix. On
successful completion (i.e. normal BamWriter destruction, not triggered by a
thrown exception) the file is renamed to the actual requested filename.
- PBI file creation follows the same temporary naming convention.
- Support for barcode pair (forward, reverse) in DataSetXML filter.
- Validation API & 'auto-validate' compile-time switch. 
- Added support for a batched QNAME whitelist filter in DataSet XML. Uses (new) 
Property name 'qname_file', with the value being the filepath containing the 
whitelist.
- Exposed MD5 hashing to API.
- Ability to remove base features from a ReadGroupInfo object.
- Can construct an aggregate PbiRawData index object from a DataSet: essentially
concatenates all PBI data within the dataset.
- New SamWriter class to create SAM-formatted output of PacBio BAM data.
- Extended APIs for accessing "internal BAM" data, including PulseBehavior
switch for selecting between all pulses & basecalls only. 

### Fixed
- Improper 'clip to reference' product for BamRecord in some cases.
- Improper behavior in tag accessors (e.g. BamRecord::IPD()) on reverse strand-
aligned reads (bug 31339).
- Improper basecaller version parsing in ReadGroupInfo.

### Changed
- RecordType::POLYMERASE renamed to RecordType::ZMW to reflect changes in
PacBio BAM spec v3.0.4
- Refactored the 'virtual' reader classes - to match the new nomenclature,
and to combine the virtual reader & composite readers behind a shared 
interface. The old class names still exist, as typedefs to the new ones, 
and the interfaces are completely source-compatible - so as not to break 
existing code. However, the old classes should be considered deprecated and 
the new ones preferred. Below is the mapping of old -> new:

   VirtualPolymeraseBamRecord        ->  VirtualZmwBamRecord
   VirtualPolymeraseReader           ->  ZmwReadStitcher
   VirtualPolymeraseCompositeReader  ->  ZmwReadStitcher
   ZmwWhitelistVirtualReader         ->  WhitelistedZmwReadStitcher


## [0.5.0] - 2016-02-22

### Added
- Platform model tag added to read group as RG::PM
- New scrap zmw type sz
- pbmerge accepts DataSetXML as input - using top-level resource BAMs as input,
applying filters, and generating a merged BAM. Also added FOFN support, instead
of listing out BAMs as command line args.
- PbiLocalContextFilter to allow filtering on subread local context.
- PbiBuilder: multithreading & zlib compression-level tuning for PBI output

### Fixed
- Fixed mishandling of relative BAM filenames in the filename constructor for
DataSet (e.g. DataSet ds("../data.bam")).

## [0.4.5] - 2016-01-14

### Changed
- PbiFilterQuery (and any other PBI-backed query, e.g. ZmwQuery ) now throws if
PBI file(s) missing insted of returning empty result.
- GenomicIntervalQuery now throws if BAI file(s) missing instead of returning
empty result.
- BamFile will throw if file is truncated (e.g. missing the EOF block). Disable
by defining PBBAM_NO_CHECK_EOF .

## [0.4.4] - 2016-01-07

### Added
- bam2sam command line utility. The primary benefit is removing the dependency
on samtools during tests, but also provides users a functioning BAM -> SAM
converter in the absence of samtools.
- pbmerge command line utility. Allows merging N BAM files into one, optionally
creating the PBI file alongside.
- Added BamRecord::Pkmean2 & Pkmid2, 2D equivalent of Pkmean/Pkmid, for internal
BAMs.

### Removed 
- samtools dependency

## [0.4.3] - 2015-12-22

### Added
- Compile using ccache by default, if available. Can be manually disabled using
-DPacBioBAM_use_ccache=OFF with cmake.
- pbindexdump: command-line utility that converts PBI file data into human-
readable formats. (JSON by default).

### Changed
- CMake option PacBioBAM_build_pbindex is being deprecated. Use
PacBioBAM_build_tools instead.

## [0.4.2] - 2015-12-22

### Changed
- BamFile::PacBioIndexExists & StandardIndexExists no longer check timestamps.
Copying/moving files around can yield timestamps that are not helpful (no longer
guaranteed that the .pbi will be "newer" than the .bam, even though no content
changed). Added methods (e.g. bool BamFile::PacBioIndexIsNewer()) to do that
lookup if needed, but it is no longer done automatically.

## [0.4.1] - 2015-12-18

### Added
- BamRecord::HasNumPasses

### Changed
- VirtualPolymeraseBamRecord::VirtualRegionsTable(type) returns an empty vector
of regions if none are associated with the requested type, instead of throwing.

## [0.4.0] - 2015-12-15

### Changed
- Redesigned PbiFilter interface and backend. Previous implementation did not
scale well as intermediate results were far too unwieldy. This redesign provides
speedups of orders of magnitude in many cases.

## [0.3.2] - 2015-12-10

### Added 
- Support for ReadGroupInfo sequencing chemistry data.
InvalidSequencingChemistryException thrown if an unsupported combination is
encountered.
- VirtualPolymeraseCompositeReader - for re-stitching records, across multiple
resources (e.g. from DataSetXML). Reader respects DataSet filter criteria.

## [0.3.1] - 2015-10-30

### Added
- ZmwWhitelistVirtualReader: similar to VirtualPolymeraseReader but restricts
iteration to a whitelist of ZMW hole numbers, leveraging PBI index data for
random-access.

### Fixed
- Fixed error in PBI construction, in which entire file sections (e.g.
BarcodeData or MappedData) where being dropped when any one record lacked data.
Correct behavior is to allow file section ommission if all records lack that
data type.

## [0.3.0] - 2015-10-29

### Fixed
- Improper reporting of current offset from multi-threaded BamWriter. This had
the effect of creating broken PBIs that were written alongside the BAM. Added a
flush step, which incurs a performance hit, but restores correctness.

## [0.2.4] - 2015-10-26

### Fixed
- Empty PbiFilter now returns all records, instead of filtering away all records.

## [0.2.3] - 2015-10-26

### Added/Fixed
- Syncing DataSetXML across APIs. Primary changes include output of Version
attribute ("3.0.1") on appropriate elements, as well as resolution of namespace
issues.

## [0.2.2] - 2015-10-22

### Added
- Added BAI bin calculation to BamWriter::Write, to ensure maximal compatibility
with downstream tools (e.g. 'samtools index'). A new BinCalculationMode enum
flag in BamWriter constructor cotnrols whether this behavior is enabled[default]
or not.

## [0.2.1] - 2015-10-19

### Added
- Exposed the following classes to public API:
  - BamReader
  - BaiIndexedBamReader
  - PbiIndexedBamReader
  - GenomicIntervalCompositeBamReader
  - PbiFilterCompositeBamReader

## [0.2.0] - 2015-10-09

### Changed
- BAM spec v3.0.1 compliance. Previous (betas) versions of the BAM spec are not
supported and will causean exception to be throw if encountered.
- PBI lookup interface & backend, see PbiIndex.h & PbiLookupData.h for details.

### Added 
- BamFile::PacBioIndexExists() & BamFile::StandardIndexExists() - query the
existence of index files without auto-building them if they are missing, as in
BamFile::Ensure*IndexExists().
- GenomicInterval now accepts an htslib/samtools-style REGION string in the
constructor: GenomicInterval("chr1:1000-2000"). Please note though, that pbbam
uses 0-based coordinates throughout, whereas samtools expects 1-based. The above
string is equivalent to "chr1:1001-2000" in samtools.
- Built-in PBI filters. See PbiFlter.h & PbiFilterTypes.h for built-in filters
and constructing composite filters. These can be used in conjunction with the
new PbiFilterQuery, which takes a generic PbiFilter and applies that to a
DataSet for iteration.
- New built-in queries: BarcodeQuery, ReadAccuracyQuery, SubreadLengthQuery.
These leverage the new filter API to construct a PbiFilter and apply to a
DataSet.
- Built-in BamRecord comparators that are STL-compatible. See Compare.h for full
list. This allows for statements like the following, which sorts records by ZMW
number:
``` c++
    vector<BamRecord> data;
    std::sort(data.begin(), data.end(), Compare::Zmw());
```
- "exciseSoftClips" option to BamRecord::CigarData()

## [0.1.0] - 2015-07-17

### Changed
- BAM spec v3.0b7 compliance
 - Removal of 'M' as allowed CIGAR operation. Attempt to use such a CIGAR op
 will throw an exception.
 - Addition of IPD/PulseWidth codec version info in header
  
### Added
- Auto-generation of UTC timestamp for DataSet objects
- PbiBuilder - allows generation of PBI index data alongside generation or
modification of BAM record data. This obviates the need to wait for a completed
BAM, then go through the zlib decompression, etc.
- Added DataSet::FromXml(string xml) to create DataSets from "raw" XML string,
rather than building up using DataSet API or loading from existing file.
- "pbindex" command line tool to generate ".pbi" files from BAM data. The
executable is built by default, but can be disabled using the cmake option
"-DPacBioBAM_build_pbindex=OFF".
  
### Fixed
- PBI construction failing on CCS reads

## [0.0.8] - 2015-07-02

### Changed
- Build system refactoring.

## [0.0.7] - 2015-07-02

### Added
- PBI index lookup API. Not so much intended for client use directly, but will
enable construction of higher-level semantic queries: grouping by, filtering,
etc.
- DataSet & PBI-aware queries (e.g. ZmwGroupQuery). More PBI-enabled queries to
follow.
- More flexibility in tag access. Samtools has a habit of performing a
"shrink-to-fit" when it handles integer-valued tag data. Thus we cannot
**guarantee** the binary type that our API will have to process. Safe
conversions are allowed on integer-like data only. Under- or overflows in
casting will trigger an exception. All other tag data types must be asked for
explicitly, or else an exception will be raised, as before.
- BamHeader::DeepCopy - allows creation of editable header data, without
overwriting all shared instances

### Fixed
- XSD compliance for DataSet APIs.

### Changed
- The functionality provided by ZmwQuery (group by hole number), is now
available using the ZmwGroupQuery object. The new ZmwQuery returns a single-
record iterator (a la EntireFileQuery), but limited to a whitelist of requested
hole numbers.

### Removed
- XSD non-compliant classes (e.g. ExternalDataReference)

## [0.0.6] - 2015-06-07

### Added

- Accessor methods for pulse bam support:
 - LabelQV()
 - AltLabelQV()
 - LabelTag()
 - AltLabelTag()
 - Pkmean()
 - Pkmid()
 - PrePulseFrames() only RC, no clipping
 - PulseCallWidth() only RC, no clipping
 - PulseCall() case-sensitive RC, no clipping
 - IPDRaw() to avoid up and downscaling for stitching
- BamRecord::ParseTagName and BamRecord::ParseTagString to convert a two 
  character tag string to a TagName enum and back. Allows a switch over tags.
- VirtualPolymeraseReader to create VirtualPolymeraseBamRecord from a 
  subreads|hqregion+scraps.bam
- VirtualRegion represents annotations of the polymerase reads, for adapters, 
  barcodes, lqregions, and hqregions.
- ReadGroupInfo operator== 

### Fixed

- Reimplemented QueryStart(int), QueryEnd(int), UpdateName(void), 
  ReadGroup(ReadGroupInfo&), ReadGroupId(std::string&);

## [0.0.5] - 2015-05-29

### Added

- DataSet support. This includes XML I/O, basic dataset query/manipulation, and
multi-BAM-file queries. New classes are located in <pbbam/dataset/>. DataSet-
capable queries currently reside in the PacBio::BAM::staging namespace. These
will be ported over to the main namespace once the support is stabilized and
works seamlessly with either a single BamFile or DataSet object as input. (bug
25941)
- PBI support. This includes read/write raw data & building from a BamFile. The
lookup API for random-access queries is under development, but the raw data is
available - for creating PBI files & generating summary statistics. (bug 26025)
- C# SWIG bindings, alongside existing Python and R wrappers.
- LocalContextFlags support in BamRecord (bug 26623)

### Fixed

- BamRecord[Impl] map quality now  initialized with 255 (missing) value, instead
of 0. (bug 26228)
- ReadGroupId calculation. (bug 25940)
  
## [0.0.4] - 2015-04-22

### Added

- This changelog. Hope it helps.
- Hook to set verbosity of underlying htslib warnings.
- Grouped queries. (bug 26361)

### Changed

- Now using exceptions instead of return codes, output parameters, etc.
- Removed "messy" shared_ptrs across interface (see especially BamHeader). These
are now taken care of within the API, not exposed to client code.

### Removed

- BamReader 

### Fixed

- ASCII tag output. (bug 26381)