.. _whatsnew_0141:

Version 0.14.1 (July 11, 2014)
------------------------------

{{ header }}


This is a minor release from 0.14.0 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

- Highlights include:

  - New methods :meth:`~pandas.DataFrame.select_dtypes` to select columns
    based on the dtype and :meth:`~pandas.Series.sem` to calculate the
    standard error of the mean.
  - Support for dateutil timezones (see :ref:`docs <timeseries.timezone>`).
  - Support for ignoring full line comments in the :func:`~pandas.read_csv`
    text parser.
  - New documentation section on :ref:`Options and Settings <options>`.
  - Lots of bug fixes.

- :ref:`Enhancements <whatsnew_0141.enhancements>`
- :ref:`API Changes <whatsnew_0141.api>`
- :ref:`Performance Improvements <whatsnew_0141.performance>`
- :ref:`Experimental Changes <whatsnew_0141.experimental>`
- :ref:`Bug Fixes <whatsnew_0141.bug_fixes>`

.. _whatsnew_0141.api:

API changes
~~~~~~~~~~~

- Openpyxl now raises a ValueError on construction of the openpyxl writer
  instead of warning on pandas import (:issue:`7284`).

- For ``StringMethods.extract``, when no match is found, the result
  (containing only ``NaN`` values) now also has ``dtype=object`` instead of
  ``float`` (:issue:`7242`).

- ``Period`` objects no longer raise a ``TypeError`` when compared using ``==``
  with an object that *isn't* a ``Period``. Such a comparison now returns
  ``False`` instead (:issue:`7376`).
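
  A minimal sketch of this comparison behaviour, written against a recent
  pandas release rather than 0.14.1 itself (the values are illustrative):

  .. code-block:: python

     import pandas as pd

     p = pd.Period("2014-07", freq="M")

     # Comparing two equal periods still works as before.
     same = p == pd.Period("2014-07", freq="M")

     # Comparing with a non-Period object evaluates to False instead of raising.
     vs_int = p == 1
     vs_str = p == "2014-07"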

- Previously, whether the time was reset in ``offsets.apply``, ``rollforward``
  and ``rollback`` operations differed between offsets. All offsets now support
  the ``normalize`` keyword (see below) with a default value of ``False``
  (preserve time), which changes the behaviour of certain offsets
  (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd,
  BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, Easter):

  .. code-block:: ipython

    In [6]: from pandas.tseries import offsets

    In [7]: d = pd.Timestamp('2014-01-01 09:00')

    # old behaviour < 0.14.1
    In [8]: d + offsets.MonthEnd()
    Out[8]: pd.Timestamp('2014-01-31 00:00:00')

  Starting from 0.14.1 all offsets preserve time by default. The old
  behaviour can be obtained with ``normalize=True``:

  .. ipython:: python
     :suppress:

     import pandas.tseries.offsets as offsets

     d = pd.Timestamp("2014-01-01 09:00")

  .. ipython:: python

     # new behaviour
     d + offsets.MonthEnd()
     d + offsets.MonthEnd(normalize=True)

  Note that for the other offsets the default behaviour did not change.

- Add back ``#N/A N/A`` as a default NA value in text parsing (regression from 0.12) (:issue:`5521`)
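
  As an illustration, assuming a recent pandas release and a made-up two-row
  CSV, the marker is parsed as missing without passing ``na_values``:

  .. code-block:: python

     from io import StringIO

     import pandas as pd

     # Illustrative data: the second field of the first row uses the marker.
     data = "a,b\n1,#N/A N/A\n2,3\n"

     df = pd.read_csv(StringIO(data))

     # True where the parser treated the field as missing.
     na_mask = df["b"].isna().tolist()
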
- Raise a ``TypeError`` on inplace-setting with a ``.where`` and a non ``np.nan`` value as this is inconsistent
  with a set-item expression like ``df[mask] = None`` (:issue:`7656`)


.. _whatsnew_0141.enhancements:

Enhancements
~~~~~~~~~~~~

- Add ``dropna`` argument to ``value_counts`` and ``nunique`` (:issue:`5569`).
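
  A short sketch of the ``dropna`` argument, using illustrative values and a
  recent pandas release:

  .. code-block:: python

     import numpy as np
     import pandas as pd

     s = pd.Series([1.0, 1.0, 2.0, np.nan])

     # By default missing values are left out of the result.
     counts_default = s.value_counts()
     n_default = s.nunique()

     # With dropna=False, NaN is counted as its own entry.
     counts_with_nan = s.value_counts(dropna=False)
     n_with_nan = s.nunique(dropna=False)
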
- Add :meth:`~pandas.DataFrame.select_dtypes` method to allow selection of
  columns based on dtype (:issue:`7316`). See :ref:`the docs <basics.selectdtypes>`.
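
  For example, with a small illustrative frame (column names are made up for
  this sketch):

  .. code-block:: python

     import pandas as pd

     df = pd.DataFrame(
         {
             "i": [1, 2],
             "f": [1.5, 2.5],
             "flag": [True, False],
             "name": ["x", "y"],
         }
     )

     # Keep only the numeric columns (integers and floats, not booleans).
     numeric = df.select_dtypes(include=["number"])

     # Keep everything except the boolean columns.
     non_bool = df.select_dtypes(exclude=["bool"])
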
- All ``offsets`` support the ``normalize`` keyword to specify whether
  ``offsets.apply``, ``rollforward`` and ``rollback`` reset the time (hour,
  minute, etc.) or not (default ``False``, preserves time) (:issue:`7156`):

  .. code-block:: python

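     import pandas.tseries.offsets as offsets

     # Default (normalize=False): the time of day (09:00) is preserved
     day = offsets.Day()
     day.apply(pd.Timestamp("2014-01-01 09:00"))

     # normalize=True: the result is reset to midnight
     day = offsets.Day(normalize=True)
     day.apply(pd.Timestamp("2014-01-01 09:00"))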

- ``PeriodIndex`` is now represented in the same format as ``DatetimeIndex`` (:issue:`7601`)
- ``StringMethods`` now work on empty Series (:issue:`7242`)
- The file parsers ``read_csv`` and ``read_table`` now ignore line comments provided by
  the parameter ``comment``, which accepts only a single character for the C reader.
  In particular, they allow for comments before file data begins (:issue:`2685`)
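
  A minimal sketch, assuming a recent pandas release and made-up file
  contents, with one comment line before the header and one between rows:

  .. code-block:: python

     from io import StringIO

     import pandas as pd

     data = (
         "# written by a hypothetical export tool\n"
         "a,b\n"
         "1,2\n"
         "# a comment between data rows\n"
         "3,4\n"
     )

     # Lines starting with the comment character are skipped entirely.
     df = pd.read_csv(StringIO(data), comment="#")
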
- Add ``NotImplementedError`` for simultaneous use of ``chunksize`` and ``nrows``
  in ``read_csv()`` (:issue:`6774`).
- Tests for basic reading of public S3 buckets now exist (:issue:`7281`).
- ``read_html`` now sports an ``encoding`` argument that is passed to the
  underlying parser library. You can use this to read non-ascii encoded web
  pages (:issue:`7323`).
- ``read_excel`` now supports reading from URLs in the same way
  that ``read_csv`` does.  (:issue:`6809`)
- Support for dateutil timezones, which can now be used in the same way as
  pytz timezones across pandas. (:issue:`4688`)

  .. ipython:: python

     rng = pd.date_range(
         "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
     )
     rng.tz

  See :ref:`the docs <timeseries.timezone>`.

- Implemented ``sem`` (standard error of the mean) operation for ``Series``,
  ``DataFrame``, ``Panel``, and ``Groupby`` (:issue:`6897`)
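
  The following sketch, using illustrative values, checks ``sem`` against the
  sample standard deviation divided by the square root of the number of
  observations, and shows the grouped form:

  .. code-block:: python

     import math

     import pandas as pd

     s = pd.Series([2.0, 4.0, 4.0, 6.0])

     # sem() is the sample standard deviation divided by sqrt(n).
     sem = s.sem()
     manual = s.std() / math.sqrt(s.count())

     # The same reduction is available per group.
     df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, 3.0, 2.0, 6.0]})
     group_sem = df.groupby("key")["val"].sem()
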
- Add ``nlargest`` and ``nsmallest`` to the ``Series`` ``groupby`` allowlist,
  which means you can now use these methods on a ``SeriesGroupBy`` object
  (:issue:`7053`).
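
  For instance, with an illustrative frame (the column names ``key`` and
  ``val`` are assumptions for this sketch):

  .. code-block:: python

     import pandas as pd

     df = pd.DataFrame(
         {"key": ["a", "a", "a", "b", "b"], "val": [1, 5, 3, 2, 4]}
     )

     # The two largest values of "val" within each group of "key".
     top2 = df.groupby("key")["val"].nlargest(2)

     # The two smallest values within each group.
     bottom2 = df.groupby("key")["val"].nsmallest(2)
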
- All offsets' ``apply``, ``rollforward`` and ``rollback`` methods can now handle ``np.datetime64``; previously this resulted in an ``ApplyTypeError`` (:issue:`7452`)
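
  A brief sketch with ``rollforward`` and ``rollback``, written against a
  recent pandas release (the date is illustrative):

  .. code-block:: python

     import numpy as np
     import pandas as pd

     stamp = np.datetime64("2014-01-15T09:00")
     month_end = pd.offsets.MonthEnd()

     # Both calls accept the NumPy scalar and return a pandas Timestamp.
     rolled_forward = month_end.rollforward(stamp)
     rolled_back = month_end.rollback(stamp)
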
- ``Period`` and ``PeriodIndex`` can now contain ``NaT`` in their values (:issue:`7485`)
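
  As a small illustration (assuming a recent pandas release), a missing entry
  can sit alongside regular periods:

  .. code-block:: python

     import pandas as pd

     idx = pd.PeriodIndex(["2014-01", "NaT", "2014-03"], freq="M")

     # True at the position holding the missing value.
     missing = idx.isna().tolist()
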
- Support pickling ``Series``, ``DataFrame`` and ``Panel`` objects with
  non-unique labels along *item* axis (``index``, ``columns`` and ``items``
  respectively) (:issue:`7370`).
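
  A minimal round-trip sketch for a ``DataFrame`` with duplicated column
  labels (the data is illustrative):

  .. code-block:: python

     import pickle

     import pandas as pd

     # Both columns share the label "a".
     df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

     # Serialise and restore the frame in memory.
     restored = pickle.loads(pickle.dumps(df))
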
- Improved inference of datetime/timedelta with mixed null objects. This fixes a regression from 0.13.1
  in the interpretation of an object Index with all null elements (:issue:`7431`)

.. _whatsnew_0141.performance:

Performance
~~~~~~~~~~~
- Improvements in dtype inference for numeric operations, yielding performance gains for dtypes: ``int64``, ``timedelta64``, ``datetime64`` (:issue:`7223`)
- Improvements in Series.transform for significant performance gains (:issue:`6496`)
- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (:issue:`7383`)
- Fixed a regression in groupby aggregation of datetime64 dtypes (:issue:`7555`)
- Improvements in ``MultiIndex.from_product`` for large iterables (:issue:`7627`)


.. _whatsnew_0141.experimental:

Experimental
~~~~~~~~~~~~

- ``pandas.io.data.Options`` has a new method, ``get_all_data``, and now consistently returns a
  MultiIndexed ``DataFrame`` (:issue:`5602`)
- ``io.gbq.read_gbq`` and ``io.gbq.to_gbq`` were refactored to remove the
  dependency on the Google ``bq.py`` command line client. This submodule
  now uses ``httplib2`` and the Google ``apiclient`` and ``oauth2client`` API client
  libraries which should be more stable and, therefore, more reliable than
  ``bq.py``. See :ref:`the docs <io.bigquery>`. (:issue:`6937`).


.. _whatsnew_0141.bug_fixes:

Bug fixes
~~~~~~~~~
- Bug in ``DataFrame.where`` with a symmetric shaped frame and a passed other of a DataFrame (:issue:`7506`)
- Bug in Panel indexing with a MultiIndex axis (:issue:`7516`)
- Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (:issue:`7523`)
- Bug in setitem with list-of-lists and single vs mixed types (:issue:`7551`)
- Bug in time ops with non-aligned Series (:issue:`7500`)
- Bug in timedelta inference when assigning an incomplete Series (:issue:`7592`)
- Bug in groupby ``.nth`` with a Series and integer-like column name (:issue:`7559`)
- Bug in ``Series.get`` with a boolean accessor (:issue:`7407`)
- Bug in ``value_counts`` where ``NaT`` did not qualify as missing (``NaN``) (:issue:`7423`)
- Bug in ``to_timedelta`` that accepted invalid units and misinterpreted 'm/h' (:issue:`7611`, :issue:`6423`)
- Bug where line plots did not set the correct ``xlim`` if ``secondary_y=True`` (:issue:`7459`)
- Bug where grouped ``hist`` and ``scatter`` plots used the old ``figsize`` default (:issue:`7394`)
- Bug where plotting subplots with ``DataFrame.plot`` or ``hist`` cleared the passed ``ax`` even if the number of subplots was one (:issue:`7391`).
- Bug where plotting subplots with ``DataFrame.boxplot`` and the ``by`` keyword raised a ``ValueError`` if the number of subplots exceeded 1 (:issue:`7391`).
- Bug where subplots displayed ``ticklabels`` and ``labels`` according to different rules (:issue:`5897`)
- Bug in ``Panel.apply`` with a MultiIndex as an axis (:issue:`7469`)
- Bug where ``DatetimeIndex.insert`` did not preserve ``name`` and ``tz`` (:issue:`7299`)
- Bug where ``DatetimeIndex.asobject`` did not preserve ``name`` (:issue:`7299`)
- Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps), (:issue:`7429`)
- Bug where ``Index.min`` and ``max`` did not handle ``nan`` and ``NaT`` properly (:issue:`7261`)
- Bug where ``PeriodIndex.min/max`` returned an ``int`` (:issue:`7609`)
- Bug in ``resample`` where ``fill_method`` was ignored if you passed ``how`` (:issue:`2073`)
- Bug where ``TimeGrouper`` did not exclude the column specified by ``key`` (:issue:`7227`)
- Bug where ``DataFrame`` and ``Series`` bar and barh plots raised a ``TypeError`` when the ``bottom``
  and ``left`` keywords were specified (:issue:`7226`)
- Bug where ``DataFrame.hist`` raised a ``TypeError`` when the frame contained a non-numeric column (:issue:`7277`)
- Bug where ``Index.delete`` did not preserve the ``name`` and ``freq`` attributes (:issue:`7302`)
- Bug in ``DataFrame.query()``/``eval`` where local string variables with the @
  sign were being treated as temporaries attempting to be deleted
  (:issue:`7300`).
- Bug in ``Float64Index`` which didn't allow duplicates (:issue:`7149`).
- Bug in ``DataFrame.replace()`` where truthy values were being replaced
  (:issue:`7140`).
- Bug in ``StringMethods.extract()`` where a single match group Series
  would use the matcher's name instead of the group name (:issue:`7313`).
- Bug in ``isnull()`` when ``mode.use_inf_as_null == True`` where isnull
  wouldn't test ``True`` when it encountered an ``inf``/``-inf``
  (:issue:`7315`).
- Bug where ``inferred_freq`` resulted in ``None`` for eastern hemisphere timezones (:issue:`7310`)
- Bug where ``Easter`` returned an incorrect date when the offset was negative (:issue:`7195`)
- Bug in broadcasting with ``.div``, integer dtypes and divide-by-zero (:issue:`7325`)
- Bug where ``CustomBusinessDay.apply`` raised a ``NameError`` when an ``np.datetime64`` object was passed (:issue:`7196`)
- Bug where ``MultiIndex.append``, ``concat`` and ``pivot_table`` did not preserve timezone (:issue:`6606`)
- Bug in ``.loc`` with a list of indexers on a single-multi index level (that is not nested) (:issue:`7349`)
- Bug in ``Series.map`` when mapping a dict with tuple keys of different lengths (:issue:`7333`)
- Bug where ``StringMethods`` did not work on empty Series (:issue:`7242`)
- Fix delegation of ``read_sql`` to ``read_sql_query`` when query does not contain 'select' (:issue:`7324`).
- Bug where a string column name assignment to a ``DataFrame`` with a
  ``Float64Index`` raised a ``TypeError`` during a call to ``np.isnan``
  (:issue:`7366`).
- Bug where ``NDFrame.replace()`` didn't correctly replace objects with
  ``Period`` values (:issue:`7379`).
- Bug in ``.ix`` getitem, which should always return a Series (:issue:`7150`)
- Bug in MultiIndex slicing with incomplete indexers (:issue:`7399`)
- Bug in MultiIndex slicing with a step in a sliced level (:issue:`7400`)
- Bug where negative indexers in ``DatetimeIndex`` were not correctly sliced
  (:issue:`7408`)
- Bug where ``NaT`` wasn't repr'd correctly in a ``MultiIndex`` (:issue:`7406`,
  :issue:`7409`).
- Bug where bool objects were converted to ``nan`` in ``convert_objects``
  (:issue:`7416`).
- Bug in ``quantile`` ignoring the axis keyword argument (:issue:`7306`)
- Bug where ``nanops._maybe_null_out`` doesn't work with complex numbers
  (:issue:`7353`)
- Bug in several ``nanops`` functions when ``axis==0`` for
  1-dimensional ``nan`` arrays (:issue:`7354`)
- Bug where ``nanops.nanmedian`` doesn't work when ``axis==None``
  (:issue:`7352`)
- Bug where ``nanops._has_infs`` doesn't work with many dtypes
  (:issue:`7357`)
- Bug in ``StataReader.data`` where reading a 0-observation dta failed (:issue:`7369`)
- Bug in ``StataReader`` when reading Stata 13 (117) files containing fixed width strings (:issue:`7360`)
- Bug in ``StataWriter`` where encoding was ignored (:issue:`7286`)
- Bug where ``DatetimeIndex`` comparison did not handle ``NaT`` properly (:issue:`7529`)
- Bug where passing input with ``tzinfo`` to some offsets' ``apply``, ``rollforward`` or ``rollback`` reset ``tzinfo`` or raised a ``ValueError`` (:issue:`7465`)
- Bug where ``DatetimeIndex.to_period``, ``PeriodIndex.asobject`` and ``PeriodIndex.to_timestamp`` did not preserve ``name`` (:issue:`7485`)
- Bug where ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` handled ``NaT`` incorrectly (:issue:`7228`)
- Bug where ``offsets.apply``, ``rollforward`` and ``rollback`` could return a plain ``datetime`` (:issue:`7502`)
- Bug where ``resample`` raised a ``ValueError`` when the target contained ``NaT`` (:issue:`7227`)
- Bug where ``Timestamp.tz_localize`` reset ``nanosecond`` info (:issue:`7534`)
- Bug where ``DatetimeIndex.asobject`` raised a ``ValueError`` when it contained ``NaT`` (:issue:`7539`)
- Bug where ``Timestamp.__new__`` did not preserve nanoseconds properly (:issue:`7610`)
- Bug in ``Index.astype(float)`` where it would return an ``object`` dtype
  ``Index`` (:issue:`7464`).
- Bug where ``DataFrame.reset_index`` lost ``tz`` (:issue:`3950`)
- Bug where ``DatetimeIndex.freqstr`` raised an ``AttributeError`` when ``freq`` was ``None`` (:issue:`7606`)
- Bug where ``GroupBy.size`` created by ``TimeGrouper`` raised an ``AttributeError`` (:issue:`7453`)
- Bug where single column bar plots were misaligned (:issue:`7498`).
- Bug where area plots with tz-aware time series raised a ``ValueError`` (:issue:`7471`)
- Bug where non-monotonic ``Index.union`` could preserve ``name`` incorrectly (:issue:`7458`)
- Bug where ``DatetimeIndex.intersection`` did not preserve timezone (:issue:`4690`)
- Bug in ``rolling_var`` where a window larger than the array would raise an error (:issue:`7297`)
- Bug with last plotted timeseries dictating ``xlim`` (:issue:`2960`)
- Bug with ``secondary_y`` axis not being considered for timeseries ``xlim`` (:issue:`3490`)
- Bug in ``Float64Index`` assignment with a non scalar indexer (:issue:`7586`)
- Bug where ``pandas.core.strings.str_contains`` did not properly match in a case-insensitive fashion when ``regex=False`` and ``case=False`` (:issue:`7505`)
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, and ``rolling_corr`` for two arguments with mismatched indexes (:issue:`7512`)
- Bug where ``to_sql`` treated boolean columns as text columns (:issue:`7678`)
- Bug where grouped ``hist`` did not handle the ``rot`` and ``sharex`` keywords properly (:issue:`7234`)
- Bug in ``.loc`` performing fallback integer indexing with ``object`` dtype indices (:issue:`7496`)
- Bug (regression) in ``PeriodIndex`` constructor when passed ``Series`` objects (:issue:`7701`).


.. _whatsnew_0.14.1.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.14.0..v0.14.1