=======================================
Release notes for PyTables 2.2 series
=======================================
:Author: Francesc Alted i Abad
:Contact: faltet@pytables.org
Changes from 2.2.1rc1 to 2.2.1
==============================
- The `Row` accessor implements a new `__contains__` special method that
  allows doing things like::

      for row in table:
          if item in row:
              print "Value found in row", row.nrow
              break

  Closes #309.
- PyTables now plays better with easy_install and pip, as all of its
  Python dependencies should be installed automatically. Closes #298.
Changes from 2.2 to 2.2.1rc1
============================
- When using `ObjectAtom` objects in `VLArray` nodes, the
  ``HIGHEST_PROTOCOL`` is now used for pickling objects. For NumPy
  arrays, this simple change leads to space savings of up to 3x and
  time improvements of up to 30x. Closes #301.
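  As a rough illustration of the kind of usage that benefits (the file
  and node names here are made up)::

      import numpy
      import tables

      f = tables.openFile('objects.h5', mode='w')
      # Each row of the VLArray stores one pickled Python object.
      vla = f.createVLArray(f.root, 'objs', tables.ObjectAtom())
      vla.append(numpy.arange(1000))  # pickled with HIGHEST_PROTOCOL now
      print vla[0]                    # unpickled transparently on read
      f.close()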
- tables.Expr can perform operations on scalars now. Thanks to Gaëtan
  de Menten for providing a patch for this. Closes #287.
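  For instance, something like the following sketch should work now
  (the variable names are illustrative)::

      import numpy as np
      import tables

      a = np.arange(10)
      # ``b`` is a plain scalar operand mixed with an array operand.
      expr = tables.Expr('2 * a + b', uservars={'a': a, 'b': 3.5})
      print expr.eval()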
- Fixed a problem with indexes larger than 32-bit on leaf objects on
32-bit machines. Fixes #283.
- Merged in Blosc 1.1.2 for fixing a problem with large datatypes and
subprocess issues. Closes #288 and #295.
- Due to the adoption of Blosc 1.1.2, the pthreads-win32 library
dependency is dropped on Windows platforms.
- Fixed a problem with tables.Expr and operands with very large
  rowsizes. Closes #300.
- The ``leaf[numpy.array(scalar)]`` idiom now returns a NumPy array
  instead of a scalar. This has been done for compatibility with
  NumPy. Closes #303.
- Optimization for `Table.copy()` so that ``FIELD_*`` attrs are not
  overwritten during the copy. This can lead to speed-ups of up to
  100x for short tables that have hundreds of columns. Closes #304.
- For external links, relative paths are now resolved with respect to
  the directory of the main HDF5 file, rather than with respect to the
  current working directory. Closes #306.
- ``Expr.setInputsRange()`` and ``Expr.setOutputRange()`` now support
  ``numpy.integer`` types. Closes #285.
- Column names in tables can now start with '__'. Closes #291.
- Empty Unicode strings are now supported as attributes. Addresses #307.
- Cython 0.13 and higher is supported now. Fixes #293.
- PyTables should be more 'easy_install'-able now. Addresses #298.
Changes from 2.2rc2 to 2.2 (final)
==================================
- Updated Blosc to 1.0 (final).
- The filter ID of Blosc has changed from the wrong 32010 to the
  reserved 32001. This will prevent PyTables 2.2 (final) from reading
  files created with Blosc and PyTables 2.2 pre-final. `ptrepack` can
  be used to recover those files, if necessary. More info in ticket
  #281.
- Recent benchmarks suggest that a new parametrization is better in
  most scenarios:

  * The default chunksize has been doubled for every dataset size.
    This works better in most scenarios, especially with the new Blosc
    compressor.

  * The HDF5 CHUNK_CACHE_SIZE parameter has been raised to 2 MB in
    order to better adapt to the chunksize increase. This provides a
    better hit ratio (at the cost of consuming more memory).

  Some plots have been added to the User's Manual (chapter 5) showing
  how the new parametrization works.
Changes from 2.2rc1 to 2.2rc2
=============================
- A new version of Blosc (0.9.5) is included. This version is now
  considered to be stable and apt for production. Thanks to all
  PyTables users who have contributed to finding and reporting bugs.
- Added a new `IO_BUFFER_SIZE` parameter to ``tables/parameters.py``
  that allows setting the size of the internal PyTables buffer for
  doing I/O. This replaces `CHUNKTIMES`, but is more general because
  it affects all `Leaf` objects and also the `tables.Expr` module (and
  not only tables, as before).
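  If editing ``tables/parameters.py`` is not desirable, the parameter
  can presumably also be overridden per-file at opening time, as in
  this sketch (the 1 MB figure is arbitrary)::

      import tables

      # Use a 1 MB internal I/O buffer for this file only.
      f = tables.openFile('data.h5', mode='r', IO_BUFFER_SIZE=1024*1024)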
- The `BUFFERTIMES` parameter in ``tables/parameters.py`` has been
  renamed to `BUFFER_TIMES`, which is more consistent with other
  parameter names.
- On Windows platforms, the path to the tables module is now appended
  to sys.path and the PATH environment variable. That way, DLLs and
  PYDs in the tables directory can now be found. Thanks to Christoph
  Gohlke for the hint.
- A replacement for barriers has been implemented for Mac OS X and
  other systems that do not provide them. This allows PyTables to be
  compiled on such platforms. Fixes #278.
- Fixed a couple of warts that raised compatibility warnings with the
  forthcoming Python 2.7.
- HDF5 1.8.5 is used in Windows binaries.
Changes from 2.2b3 to 2.2rc1
============================
- Numexpr is no longer included in PyTables and has become a
  prerequisite instead. This is because Numexpr already has decent
  enough installers and is also available in the PyPI repository, so it
  should be easy for users to fulfill this dependency.
- When using a Numexpr package that is turbo-loaded with Intel's
  VML/MKL, the `MAX_THREADS` parameter will control the number of
  threads that VML can use during computations. For finer control, the
  `numexpr.set_vml_num_threads()` function can always be used.
- Cython is now used instead of Pyrex for compiling the extension
  modules.
- Updated to version 0.9 of the Blosc compressor. This version can
  make use of threads so as to accelerate the compression/decompression
  process. In order to change the maximum number of threads that Blosc
  can use (2 by default), you can modify the `MAX_THREADS` variable in
  ``tables/parameters.py`` or make use of the new
  `setBloscMaxThreads()` global function.
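  For example, a quick sketch of the global function::

      import tables

      tables.setBloscMaxThreads(4)  # let Blosc use up to 4 threads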
- Reopening already opened files is now supported, provided that there
  is no incompatibility between the intended usages (for example, you
  cannot reopen in append mode a file that is already open in read-only
  mode).
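  For example, opening the same file twice in read-only mode should now
  work (assuming ``data.h5`` already exists)::

      import tables

      f1 = tables.openFile('data.h5', mode='r')
      f2 = tables.openFile('data.h5', mode='r')  # second handle, same file
      f1.close()
      f2.close()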
- Option ``--print-versions`` for ``test_all.py`` script is now
preferred over the deprecated ``--show-versions``. This is more
consistent with the existing `print_versions()` function.
- Fixed a bug that, under some circumstances, prevented the use of
  table iterators in `itertools.groupby()`. Now, you can safely do
  things like::

      sel_rows = table.where('(row_id >= 3)')
      for group_id, grouped_rows in itertools.groupby(sel_rows, f_group):
          group_mean = average([row['row_id'] for row in grouped_rows])

  Fixes #264.
- Copies of `Array` objects with multidimensional atoms (coming from
native HDF5 files) work correctly now (i.e. the copy holds the atom
dimensionality). Fixes #275.
- The `tables.openFile()` function no longer tries to open/close the
  file in order to guess whether it is an HDF5 or PyTables file before
  definitely opening it. This allows the `fcntl.flock()` and
  `fcntl.lockf()` Python functions to work correctly now (that's useful
  for arbitrating access to the file by different processes). Thanks
  to Dag Sverre Seljebotn and Ivan Vilata for their suggestions on
  hunting this one! Fixes #185.
- The estimation of the chunksize when using multidimensional atoms in
  EArray/CArray was wrong because it did not take into account the
  shape of the atom. Thanks to Ralf Juengling for reporting. Fixes
  #273.
- Non-contiguous arrays can now safely be saved as attributes. Before,
  if arrays were not contiguous, incorrect data was saved in the
  attribute. Fixes #270.
- The EXTDIM attribute for CArray/EArray now saves the correct
  extendable dimension, instead of rubbish. This did not affect
  functionality, because the extendable dimension was retrieved
  directly from shape information, but it was providing misleading
  information to the user. Fixes #268.
API changes
-----------
- Now, `Table.Cols.__len__()` returns the number of top level columns
  instead of the number of rows in the table. This is more consistent,
  in that `Table.Cols` is an accessor for *columns*. Fixes #276.
Changes from 2.2b2 to 2.2b3
===========================
- The Blosc compressor has been added as an additional filter, joining
  the existing Zlib, LZO and bzip2 ones. This new compressor is meant
  for fast compression and extremely fast decompression. Fixes #265.
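  Selecting the new compressor is just a matter of naming it in a
  `Filters` instance, as in this sketch (dataset name and shape are
  made up)::

      import numpy
      import tables

      f = tables.openFile('compressed.h5', mode='w')
      filters = tables.Filters(complevel=5, complib='blosc')
      ca = f.createCArray(f.root, 'data', tables.Float64Atom(),
                          shape=(1000, 1000), filters=filters)
      ca[:] = numpy.random.rand(1000, 1000)
      f.close()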
- In the `File.copyFile()` method, `copyuserattrs` defaulted to false.
  This was inconsistent with other methods, where the default value for
  `copyuserattrs` is true. The default is now true. Closes #261.
- `tables.copyFile` and `File.copyFile` now recognize the parameters
  present in ``tables/parameters.py``. Fixes #262.
- Backported fix for issue #25 in Numexpr (OP_NEG_LL treats the argument
as an int, not a long long). Thanks to David Cooke for this.
- CHUNK_CACHE_NELMTS in ``tables/parameters.py`` has been set to a
  prime number, as Neil Fortner suggested.
- Workaround for a problem in Python 2.6.4 (and probably other versions
too) for pickling strings like "0" or "0.". Fixes #253.
Changes from 2.2b1 to 2.2b2
===========================
Enhancements
------------
- Support for HDF5 hard links, soft links and external links (when
  PyTables is compiled against the HDF5 1.8.x series). A new tutorial
  about their usage has been added to the 'Tutorials' chapter of the
  User's Manual. Closes #239 and #247.
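  A minimal sketch of the new link API (all names here are
  illustrative)::

      import tables

      f = tables.openFile('links.h5', mode='w')
      g = f.createGroup('/', 'g')
      a = f.createArray(g, 'arr', [1, 2, 3])
      f.createHardLink('/', 'hlink', a)         # hard link to a node
      f.createSoftLink('/', 'slink', '/g/arr')  # soft (symbolic) link
      # External link into another file (``other.h5`` is made up).
      f.createExternalLink('/', 'elink', 'other.h5:/somearray')
      f.close()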
- Added support for setting HDF5 chunk cache parameters at file
  opening/creation time. 'CHUNK_CACHE_NELMTS', 'CHUNK_CACHE_PREEMPT'
  and 'CHUNK_CACHE_SIZE' are the new parameters. See the "PyTables'
  parameter files" appendix in the User's Manual for more info. Closes
  #221.
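  A sketch of overriding them for a single file at opening time (the
  figures are arbitrary)::

      import tables

      f = tables.openFile('big.h5', mode='r',
                          CHUNK_CACHE_SIZE=8*1024*1024,  # 8 MB cache
                          CHUNK_CACHE_NELMTS=521)        # a prime number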
- New `Unknown` class added so that objects that HDF5 identifies as
``H5G_UNKNOWN`` can be mapped to it and continue operations
gracefully.
- Added the flag `--dont-create-sysattrs` to ``ptrepack`` so as to not
  create sys attrs (the default is to create them).
- Support for native compound types in attributes. This allows for
better compatibility with HDF5 files. Closes #208.
- Support for native NumPy dtypes in the description parameter of
  `File.createTable()`. Closes #238.
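  For instance, this sketch creates a table directly from a NumPy dtype
  (the field names are made up)::

      import numpy
      import tables

      dtype = numpy.dtype([('name', 'S16'), ('x', 'f8'), ('y', 'f8')])
      f = tables.openFile('table.h5', mode='w')
      t = f.createTable('/', 'points', dtype)  # dtype as description
      f.close()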
Bugs fixed
----------
- Added the missing `_c_classId` attribute to the `UnImplemented`
  class. ``ptrepack`` no longer chokes while copying `UnImplemented`
  instances.
- The ``FIELD_*`` sys attrs are no longer copied when the
``PYTABLES_SYS_ATTRS`` parameter is set to false.
- `File.createTable()` no longer segfaults if description=None. Closes
#248.
- Workaround for avoiding a Python issue causing a segfault when saving
and then retrieving a string attribute with values "0" or "0.".
Closes #253.
API changes
-----------
- `Row.__contains__()` has been disabled because it makes little sense
  to query for a key in a Row; the correct way is to query for it in
  `Table.colnames` or `Table.colpathnames` instead. Closes #241.
- [Semantic change] To avoid a common pitfall when asking for the
  string representation of a `Row` class, `Row.__str__()` has been
  redefined. Now, it prints something like::

      >>> for row in table:
      ...     print row
      ...
      /newgroup/table.row (Row), pointing to row #0
      /newgroup/table.row (Row), pointing to row #1
      /newgroup/table.row (Row), pointing to row #2

  instead of::

      >>> for row in table:
      ...     print row
      ...
      ('Particle: 0', 0, 10, 0.0, 0.0)
      ('Particle: 1', 1, 9, 1.0, 1.0)
      ('Particle: 2', 2, 8, 4.0, 4.0)

  Use the `print row[:]` idiom if you want to reproduce the old
  behaviour. Closes #252.
Other changes
-------------
- After some improvements in both HDF5 and PyTables, the limit before
emitting a `PerformanceWarning` on the number of children in a group
has been raised from 4096 to 16384.
Changes from 2.1.1 to 2.2b1
===========================
Enhancements
------------
- Added `Expr`, a class for evaluating expressions containing
  array-like objects. It can evaluate expressions (like '3*a+4*b')
  that operate on arbitrarily large arrays while optimizing the
  resources (basically main memory and CPU cache memory) required to
  perform them. It is similar to the Numexpr package, but in addition
  to NumPy objects, it also accepts disk-based homogeneous arrays,
  like the `Array`, `CArray`, `EArray` and `Column` PyTables objects.
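  A minimal sketch mixing an in-memory operand with a disk-based one
  (sizes and names are made up)::

      import numpy as np
      import tables

      N = 1000*1000
      f = tables.openFile('expr.h5', mode='w')
      a = f.createCArray('/', 'a', tables.Float64Atom(), shape=(N,))
      a[:] = np.linspace(0, 1, N)
      b = np.arange(N, dtype='float64')         # in-memory operand
      expr = tables.Expr('3*a + 4*b')           # operands found in locals
      out = f.createCArray('/', 'out', tables.Float64Atom(), shape=(N,))
      expr.setOutput(out)                       # send results to disk
      expr.eval()
      f.close()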
- Added support for NumPy's extended slicing in all `Leaf` objects.
  With that, you can do the following sorts of selections::

      array1 = array[4]                       # simple selection
      array2 = array[4:1000:2]                # slice selection
      array3 = array[1, ..., ::2, 1:4, 4:]    # general slice selection
      array4 = array[1, [1,5,10], ..., -1]    # fancy selection
      array5 = array[np.where(array[:] > 4)]  # point selection
      array6 = array[array[:] > 4]            # boolean selection

  Thanks to Andrew Collette for implementing this for h5py, from which
  it has been backported. Closes #198 and #209.
- Numexpr updated to 1.3.1. This can lead to up to a 25% improvement
  in the time for both in-kernel and indexed queries on unaligned
  tables.
- HDF5 1.8.3 supported.
Bugs fixed
----------
- Fixed problems when modifying multidimensional columns in Table
objects. Closes #228.
- The `Row` attribute is no longer stale after a table move or rename.
  Fixes #224.
- `Array.__getitem__(scalar)` now returns a NumPy scalar, instead of a
  0-dim NumPy array. This should not be noticed by normal users,
  unless they check the type of the returned value. Fixes #222.
API changes
-----------
- Added a `dtype` attribute for all leaves. This is the NumPy
``dtype`` that most closely matches the leaf type. This allows for
a quick-and-dirty check of leaf types. Closes #230.
- Added a `shape` attribute for `Column` objects. This is formed by
  concatenating the length of the column and the shape of its type.
  Also, the representation of columns has changed and now includes the
  length of the column as the leading dimension. Closes #231.
- Added a new `maindim` attribute for `Column`, which has the value 0
  (the leading dimension). This allows for a better similarity with
  other \*Array objects.
- In order to be consistent and allow extended slicing to happen in
  `VLArray` objects too, `VLArray.__setitem__()` is no longer able to
  partially modify rows based on the second dimension passed as key.
  If this is tried, an `IndexError` is now raised. Closes #210.
- The `forceCSI` flag has been replaced by `checkCSI` in the following
  `Table` methods: `copy()`, `readSorted()` and `itersorted()`. The
  change reflects the fact that a re-index operation can no longer be
  triggered from these methods. The rationale for the change is that
  indexing is a potentially very expensive operation that should be
  carried out explicitly instead of being triggered by methods that
  should not be in charge of this task. Closes #216.
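  A sketch of the new flag, assuming a `table` with a completely sorted
  index (CSI) on a column named 'energy'::

      # Raise instead of silently re-indexing if 'energy' lacks a CSI.
      rows = table.readSorted('energy', checkCSI=True)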
Backward incompatible changes
-----------------------------
- After the introduction of the `shape` attribute for `Column`
objects, the shape information for multidimensional columns has been
removed from the `dtype` attribute (it is set to the base type of
the column now). Closes #232.
**Enjoy data!**
-- The PyTables Team
.. Local Variables:
.. mode: rst
.. coding: utf-8
.. fill-column: 72
.. End: