=======================================
Release notes for PyTables 2.2 series
=======================================
:Author: Francesc Alted i Abad
:Contact: faltet@pytables.org
Changes from 2.2.1rc1 to 2.2.1
==============================
- The `Row` accessor implements a new `__contains__` special method that
  allows doing things like::

      for row in table:
          if item in row:
              print "Value found in row", row.nrow
              break

  Closes #309.
- PyTables now plays better with easy_install and pip, as all of its
  Python dependencies should be installed automatically. Closes #298.
Changes from 2.2 to 2.2.1rc1
============================
- When using `ObjectAtom` objects in `VLArray` nodes, the
  ``HIGHEST_PROTOCOL`` is now used for pickling objects. For NumPy
  arrays, this simple change leads to space savings of up to 3x and
  time improvements of up to 30x. Closes #301.
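  As a rough illustration of the kind of usage that benefits (the file
  and node names here are made up)::

      import numpy
      import tables

      f = tables.openFile('objects.h5', mode='w')
      # Each row of the VLArray stores one pickled Python object.
      vla = f.createVLArray(f.root, 'objs', tables.ObjectAtom())
      vla.append(numpy.arange(1000))  # pickled with HIGHEST_PROTOCOL now
      print vla[0]                    # unpickled transparently on read
      f.close()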
- tables.Expr can perform operations on scalars now. Thanks to Gaëtan
  de Menten for providing a patch for this. Closes #287.
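  For instance, something like the following sketch should work now
  (the variable names are illustrative)::

      import numpy as np
      import tables

      a = np.arange(10)
      # ``b`` is a plain scalar operand mixed with an array operand.
      expr = tables.Expr('2 * a + b', uservars={'a': a, 'b': 3.5})
      print expr.eval()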
- Fixed a problem with indexes larger than 32-bit on leaf objects on
32-bit machines. Fixes #283.
- Merged in Blosc 1.1.2 for fixing a problem with large datatypes and
subprocess issues. Closes #288 and #295.
- Due to the adoption of Blosc 1.1.2, the pthreads-win32 library
dependency is dropped on Windows platforms.
- Fixed a problem with tables.Expr and operands with very large
  rowsizes. Closes #300.
- The ``leaf[numpy.array(scalar)]`` idiom now returns a NumPy array
  instead of a scalar. This has been done for compatibility with
  NumPy. Closes #303.
- Optimization for `Table.copy()` so that ``FIELD_*`` attrs are not
  overwritten during the copy. This can lead to speed-ups of up to
  100x for short tables that have hundreds of columns. Closes #304.
- For external links, relative paths are now resolved with respect to
  the directory of the main HDF5 file, rather than with respect to the
  current working directory. Closes #306.
- ``Expr.setInputsRange()`` and ``Expr.setOutputRange()`` now support
  ``numpy.integer`` types. Closes #285.
- Column names in tables can now start with '__'. Closes #291.
- Empty Unicode strings are now supported as attributes. Addresses #307.
- Cython 0.13 and higher is supported now. Fixes #293.
- PyTables should be more 'easy_install'-able now. Addresses #298.
Changes from 2.2rc2 to 2.2 (final)
==================================
- Updated Blosc to 1.0 (final).
- The filter ID of Blosc has changed from the wrong 32010 to the
  reserved 32001. This will prevent PyTables 2.2 (final) from reading
  files created with Blosc and PyTables 2.2 pre-final. `ptrepack` can
  be used to recover those files, if necessary. More info in ticket
  #281.
- Recent benchmarks suggest that a new parametrization is better in
  most scenarios:

  * The default chunksize has been doubled for every dataset size.
    This works better in most scenarios, especially with the new Blosc
    compressor.

  * The HDF5 CHUNK_CACHE_SIZE parameter has been raised to 2 MB in
    order to better adapt to the chunksize increase. This provides a
    better hit ratio (at the cost of consuming more memory).

  Some plots have been added to the User's Manual (chapter 5) showing
  how the new parametrization works.
Changes from 2.2rc1 to 2.2rc2
=============================
- A new version of Blosc (0.9.5) is included. This version is now
  considered to be stable and apt for production. Thanks to all
  PyTables users who have contributed to finding and reporting bugs.
- Added a new `IO_BUFFER_SIZE` parameter to ``tables/parameters.py``
  that allows setting the size of the internal PyTables buffer for
  doing I/O. This replaces `CHUNKTIMES`, but is more general because
  it affects all `Leaf` objects and also the `tables.Expr` module (and
  not only tables, as before).
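  If editing ``tables/parameters.py`` is not desirable, the parameter
  can presumably also be overridden per-file at opening time, as in
  this sketch (the 1 MB figure is arbitrary)::

      import tables

      # Use a 1 MB internal I/O buffer for this file only.
      f = tables.openFile('data.h5', mode='r', IO_BUFFER_SIZE=1024*1024)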
- The `BUFFERTIMES` parameter in ``tables/parameters.py`` has been
  renamed to `BUFFER_TIMES`, which is more consistent with other
  parameter names.
- On Windows platforms, the path to the tables module is now appended
  to sys.path and the PATH environment variable. That way, DLLs and
  PYDs in the tables directory can now be found. Thanks to Christoph
  Gohlke for the hint.
- A replacement for barriers has been implemented for Mac OS X and
  other systems that do not provide them. This allows PyTables to be
  compiled on such platforms. Fixes #278.
- Fixed a couple of warts that raised compatibility warnings with the
  forthcoming Python 2.7.
- HDF5 1.8.5 is used in Windows binaries.
Changes from 2.2b3 to 2.2rc1
============================
- Numexpr is no longer included in PyTables and has become a
  prerequisite instead. This is because Numexpr already has decent
  enough installers and is also available in the PyPI repository, so it
  should be easy for users to fulfill this dependency.
- When using a Numexpr package that is turbo-loaded with Intel's
  VML/MKL, the `MAX_THREADS` parameter will control the number of
  threads that VML can use during computations. For finer control, the
  `numexpr.set_vml_num_threads()` function can always be used.
- Cython is now used instead of Pyrex for compiling the extension
  modules.
- Updated to version 0.9 of the Blosc compressor. This version can
  make use of threads so as to accelerate the compression/decompression
  process. In order to change the maximum number of threads that Blosc
  can use (2 by default), you can modify the `MAX_THREADS` variable in
  ``tables/parameters.py`` or make use of the new
  `setBloscMaxThreads()` global function.
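  For example, a quick sketch of the global function::

      import tables

      tables.setBloscMaxThreads(4)  # let Blosc use up to 4 threads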
- Reopening already opened files is now supported, provided that there
  is no incompatibility between the intended usages (for example, you
  cannot reopen in append mode a file that is already open in read-only
  mode).
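  For example, opening the same file twice in read-only mode should now
  work (assuming ``data.h5`` already exists)::

      import tables

      f1 = tables.openFile('data.h5', mode='r')
      f2 = tables.openFile('data.h5', mode='r')  # second handle, same file
      f1.close()
      f2.close()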
- Option ``--print-versions`` for ``test_all.py`` script is now
preferred over the deprecated ``--show-versions``. This is more
consistent with the existing `print_versions()` function.
- Fixed a bug that, under some circumstances, prevented the use of
  table iterators in `itertools.groupby()`. Now, you can safely do
  things like::

      sel_rows = table.where('(row_id >= 3)')
      for group_id, grouped_rows in itertools.groupby(sel_rows, f_group):
          group_mean = average([row['row_id'] for row in grouped_rows])

  Fixes #264.
- Copies of `Array` objects with multidimensional atoms (coming from
native HDF5 files) work correctly now (i.e. the copy holds the atom
dimensionality). Fixes #275.
- The `tables.openFile()` function no longer tries to open/close the
  file in order to guess whether it is an HDF5 or PyTables file before
  definitely opening it. This allows the `fcntl.flock()` and
  `fcntl.lockf()` Python functions to work correctly now (that's useful
  for arbitrating access to the file by different processes). Thanks
  to Dag Sverre Seljebotn and Ivan Vilata for their suggestions on
  hunting this one! Fixes #185.
- The estimation of the chunksize when using multidimensional atoms in
  EArray/CArray was wrong because it did not take into account the
  shape of the atom. Thanks to Ralf Juengling for reporting. Fixes
  #273.
- Non-contiguous arrays can now safely be saved as attributes. Before,
  if arrays were not contiguous, incorrect data was saved in the
  attribute. Fixes #270.
- The EXTDIM attribute for CArray/EArray now saves the correct
  extendable dimension, instead of rubbish. This did not affect
  functionality, because the extendable dimension was retrieved
  directly from shape information, but it was providing misleading
  information to the user. Fixes #268.
API changes
-----------
- Now, `Table.Cols.__len__()` returns the number of top level columns
  instead of the number of rows in the table. This is more consistent,
  in that `Table.Cols` is an accessor for *columns*. Fixes #276.
Changes from 2.2b2 to 2.2b3
===========================
- The Blosc compressor has been added as an additional filter, joining
  the existing Zlib, LZO and bzip2 ones. This new compressor is meant
  for fast compression and extremely fast decompression. Fixes #265.
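  Selecting the new compressor is just a matter of naming it in a
  `Filters` instance, as in this sketch (dataset name and shape are
  made up)::

      import numpy
      import tables

      f = tables.openFile('compressed.h5', mode='w')
      filters = tables.Filters(complevel=5, complib='blosc')
      ca = f.createCArray(f.root, 'data', tables.Float64Atom(),
                          shape=(1000, 1000), filters=filters)
      ca[:] = numpy.random.rand(1000, 1000)
      f.close()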
- In the `File.copyFile()` method, `copyuserattrs` defaulted to false.
  This was inconsistent with other methods, where the default value for
  `copyuserattrs` is true. The default is now true. Closes #261.
- `tables.copyFile` and `File.copyFile` now recognize the parameters
  present in ``tables/parameters.py``. Fixes #262.
- Backported fix for issue #25 in Numexpr (OP_NEG_LL treats the argument
as an int, not a long long). Thanks to David Cooke for this.
- CHUNK_CACHE_NELMTS in ``tables/parameters.py`` has been set to a
  prime number, as Neil Fortner suggested.
- Workaround for a problem in Python 2.6.4 (and probably other versions
too) for pickling strings like "0" or "0.". Fixes #253.
Changes from 2.2b1 to 2.2b2
===========================
Enhancements
------------
- Support for HDF5 hard links, soft links and external links (when
  PyTables is compiled against the HDF5 1.8.x series). A new tutorial
  about their usage has been added to the 'Tutorials' chapter of the
  User's Manual. Closes #239 and #247.
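  A minimal sketch of the new link API (all names here are
  illustrative)::

      import tables

      f = tables.openFile('links.h5', mode='w')
      g = f.createGroup('/', 'g')
      a = f.createArray(g, 'arr', [1, 2, 3])
      f.createHardLink('/', 'hlink', a)         # hard link to a node
      f.createSoftLink('/', 'slink', '/g/arr')  # soft (symbolic) link
      # External link into another file (``other.h5`` is made up).
      f.createExternalLink('/', 'elink', 'other.h5:/somearray')
      f.close()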
- Added support for setting HDF5 chunk cache parameters at file
  opening/creation time. 'CHUNK_CACHE_NELMTS', 'CHUNK_CACHE_PREEMPT'
  and 'CHUNK_CACHE_SIZE' are the new parameters. See the "PyTables'
  parameter files" appendix in the User's Manual for more info. Closes
  #221.
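  A sketch of overriding them for a single file at opening time (the
  figures are arbitrary)::

      import tables

      f = tables.openFile('big.h5', mode='r',
                          CHUNK_CACHE_SIZE=8*1024*1024,  # 8 MB cache
                          CHUNK_CACHE_NELMTS=521)        # a prime number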
- New `Unknown` class added so that objects that HDF5 identifies as
``H5G_UNKNOWN`` can be mapped to it and continue operations
gracefully.
- Added the flag `--dont-create-sysattrs` to ``ptrepack`` so as to not
  create sys attrs (the default is to create them).
- Support for native compound types in attributes. This allows for
better compatibility with HDF5 files. Closes #208.
- Support for native NumPy dtypes in the description parameter of
  `File.createTable()`. Closes #238.
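  For instance, this sketch creates a table directly from a NumPy dtype
  (the field names are made up)::

      import numpy
      import tables

      dtype = numpy.dtype([('name', 'S16'), ('x', 'f8'), ('y', 'f8')])
      f = tables.openFile('table.h5', mode='w')
      t = f.createTable('/', 'points', dtype)  # dtype as description
      f.close()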
Bugs fixed
----------
- Added the missing `_c_classId` attribute to the `UnImplemented`
  class. ``ptrepack`` no longer chokes while copying `UnImplemented`
  instances.
- The ``FIELD_*`` sys attrs are no longer copied when the
``PYTABLES_SYS_ATTRS`` parameter is set to false.
- `File.createTable()` no longer segfaults if description=None. Closes
#248.
- Workaround for avoiding a Python issue causing a segfault when saving
and then retrieving a string attribute with values "0" or "0.".
Closes #253.
API changes
-----------
- `Row.__contains__()` has been disabled because it makes little sense
  to query for a key in a Row; the correct way is to query for it in
  `Table.colnames` or `Table.colpathnames` instead. Closes #241.
- [Semantic change] To avoid a common pitfall when asking for the
  string representation of a `Row` class, `Row.__str__()` has been
  redefined. Now, it prints something like::

      >>> for row in table:
      ...     print row
      ...
      /newgroup/table.row (Row), pointing to row #0
      /newgroup/table.row (Row), pointing to row #1
      /newgroup/table.row (Row), pointing to row #2

  instead of::

      >>> for row in table:
      ...     print row
      ...
      ('Particle: 0', 0, 10, 0.0, 0.0)
      ('Particle: 1', 1, 9, 1.0, 1.0)
      ('Particle: 2', 2, 8, 4.0, 4.0)

  Use the `print row[:]` idiom if you want to reproduce the old
  behaviour. Closes #252.
Other changes
-------------
- After some improvements in both HDF5 and PyTables, the limit before
emitting a `PerformanceWarning` on the number of children in a group
has been raised from 4096 to 16384.
Changes from 2.1.1 to 2.2b1
===========================
Enhancements
------------
- Added `Expr`, a class for evaluating expressions containing
  array-like objects. It can evaluate expressions (like '3*a+4*b')
  that operate on arbitrarily large arrays while optimizing the
  resources (basically main memory and CPU cache memory) required to
  perform them. It is similar to the Numexpr package, but in addition
  to NumPy objects, it also accepts disk-based homogeneous arrays,
  like the `Array`, `CArray`, `EArray` and `Column` PyTables objects.
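  A minimal sketch mixing an in-memory operand with a disk-based one
  (sizes and names are made up)::

      import numpy as np
      import tables

      N = 1000*1000
      f = tables.openFile('expr.h5', mode='w')
      a = f.createCArray('/', 'a', tables.Float64Atom(), shape=(N,))
      a[:] = np.linspace(0, 1, N)
      b = np.arange(N, dtype='float64')         # in-memory operand
      expr = tables.Expr('3*a + 4*b')           # operands found in locals
      out = f.createCArray('/', 'out', tables.Float64Atom(), shape=(N,))
      expr.setOutput(out)                       # send results to disk
      expr.eval()
      f.close()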
- Added support for NumPy's extended slicing in all `Leaf` objects.
  With that, you can do the following sorts of selections::

      array1 = array[4]                       # simple selection
      array2 = array[4:1000:2]                # slice selection
      array3 = array[1, ..., ::2, 1:4, 4:]    # general slice selection
      array4 = array[1, [1,5,10], ..., -1]    # fancy selection
      array5 = array[np.where(array[:] > 4)]  # point selection
      array6 = array[array[:] > 4]            # boolean selection

  Thanks to Andrew Collette for implementing this for h5py, from which
  it has been backported. Closes #198 and #209.
- Numexpr updated to 1.3.1. This can lead to up to a 25% improvement
  in the time for both in-kernel and indexed queries on unaligned
  tables.
- HDF5 1.8.3 supported.
Bugs fixed
----------
- Fixed problems when modifying multidimensional columns in Table
objects. Closes #228.
- The `Row` attribute is no longer stale after a table move or rename.
  Fixes #224.
- `Array.__getitem__(scalar)` now returns a NumPy scalar, instead of a
  0-dim NumPy array. This should not be noticed by normal users,
  unless they check the type of the returned value. Fixes #222.
API changes
-----------
- Added a `dtype` attribute for all leaves. This is the NumPy
``dtype`` that most closely matches the leaf type. This allows for
a quick-and-dirty check of leaf types. Closes #230.
- Added a `shape` attribute for `Column` objects. This is formed by
  concatenating the length of the column and the shape of its type.
  Also, the representation of columns has changed and now includes the
  length of the column as the leading dimension. Closes #231.
- Added a new `maindim` attribute for `Column`, which has the value 0
  (the leading dimension). This allows for a better similarity with
  other \*Array objects.
- In order to be consistent and allow extended slicing to happen in
  `VLArray` objects too, `VLArray.__setitem__()` is no longer able to
  partially modify rows based on the second dimension passed as key.
  If this is tried, an `IndexError` is now raised. Closes #210.
- The `forceCSI` flag has been replaced by `checkCSI` in the following
  `Table` methods: `copy()`, `readSorted()` and `itersorted()`. The
  change reflects the fact that a re-index operation can no longer be
  triggered from these methods. The rationale for the change is that
  indexing is a potentially very expensive operation that should be
  carried out explicitly instead of being triggered by methods that
  should not be in charge of this task. Closes #216.
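  A sketch of the new flag, assuming a `table` with a completely sorted
  index (CSI) on a column named 'energy'::

      # Raise instead of silently re-indexing if 'energy' lacks a CSI.
      rows = table.readSorted('energy', checkCSI=True)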
Backward incompatible changes
-----------------------------
- After the introduction of the `shape` attribute for `Column`
objects, the shape information for multidimensional columns has been
removed from the `dtype` attribute (it is set to the base type of
the column now). Closes #232.
**Enjoy data!**
-- The PyTables Team
.. Local Variables:
.. mode: rst
.. coding: utf-8
.. fill-column: 72
.. End: