File: v0.18.1.rst

package info (click to toggle)
pandas 2.2.3%2Bdfsg-9
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 66,784 kB
sloc: python: 422,228; ansic: 9,190; sh: 270; xml: 102; makefile: 83
file content (844 lines) | stat: -rw-r--r-- 33,199 bytes
.. _whatsnew_0181:

Version 0.18.1 (May 3, 2016)
----------------------------

{{ header }}


This is a minor bug-fix release from 0.18.0 and includes a large number of
bug fixes along with several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.

Highlights include:

- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
- Expanded the :ref:`Tutorials section <tutorial-modern>` with a feature on modern pandas, courtesy of `@TomAugsburger <https://twitter.com/TomAugspurger>`__. (:issue:`13045`).


.. contents:: What's new in v0.18.1
    :local:
    :backlinks: none

.. _whatsnew_0181.new_features:

New features
~~~~~~~~~~~~

.. _whatsnew_0181.enhancements.custombusinesshour:

Custom business hour
^^^^^^^^^^^^^^^^^^^^

The ``CustomBusinessHour`` is a mixture of ``BusinessHour`` and ``CustomBusinessDay`` which
allows you to specify arbitrary holidays. For details,
see :ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`)

.. ipython:: python

    from pandas.tseries.offsets import CustomBusinessHour
    from pandas.tseries.holiday import USFederalHolidayCalendar

    bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())

Friday before MLK Day

.. ipython:: python

    import datetime

    dt = datetime.datetime(2014, 1, 17, 15)

    dt + bhour_us

Tuesday after MLK Day (Monday is skipped because it's a holiday)

.. ipython:: python

    dt + bhour_us * 2

.. _whatsnew_0181.deferred_ops:

Method ``.groupby(..)`` syntax with window and resample operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see (:issue:`12486`, :issue:`12738`).

You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another deferred object (similar to what ``.rolling()`` and ``.expanding()`` do on ungrouped pandas objects). You can then operate on these ``RollingGroupby`` objects in a similar manner.

Previously you would have to do this to get a rolling window mean per-group:

.. ipython:: python

   df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
   df

.. code-block:: ipython

   In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
   Out[1]:
   A
   1  0      NaN
      1      NaN
      2      NaN
      3      1.5
      4      2.5
      5      3.5
      6      4.5
      7      5.5
      8      6.5
      9      7.5
      10     8.5
      11     9.5
      12    10.5
      13    11.5
      14    12.5
      15    13.5
      16    14.5
      17    15.5
      18    16.5
      19    17.5
   2  20     NaN
      21     NaN
      22     NaN
      23    21.5
      24    22.5
      25    23.5
      26    24.5
      27    25.5
      28    26.5
      29    27.5
      30    28.5
      31    29.5
   3  32     NaN
      33     NaN
      34     NaN
      35    33.5
      36    34.5
      37    35.5
      38    36.5
      39    37.5
   Name: B, dtype: float64

Now you can do:

.. ipython:: python

   df.groupby("A").rolling(4).B.mean()

For ``.resample(..)`` type of operations, previously you would have to:

.. ipython:: python

   df = pd.DataFrame(
       {
           "date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
           "group": [1, 1, 2, 2],
           "val": [5, 6, 7, 8],
       }
   ).set_index("date")

   df

.. code-block:: ipython

   In[1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
   Out[1]:
                     group  val
   group date
   1     2016-01-03      1    5
         2016-01-04      1    5
         2016-01-05      1    5
         2016-01-06      1    5
         2016-01-07      1    5
         2016-01-08      1    5
         2016-01-09      1    5
         2016-01-10      1    6
   2     2016-01-17      2    7
         2016-01-18      2    7
         2016-01-19      2    7
         2016-01-20      2    7
         2016-01-21      2    7
         2016-01-22      2    7
         2016-01-23      2    7
         2016-01-24      2    8

Now you can do:

.. code-block:: ipython

   In[1]: df.groupby("group").resample("1D").ffill()
   Out[1]:
                     group  val
   group date
   1     2016-01-03      1    5
         2016-01-04      1    5
         2016-01-05      1    5
         2016-01-06      1    5
         2016-01-07      1    5
         2016-01-08      1    5
         2016-01-09      1    5
         2016-01-10      1    6
   2     2016-01-17      2    7
         2016-01-18      2    7
         2016-01-19      2    7
         2016-01-20      2    7
         2016-01-21      2    7
         2016-01-22      2    7
         2016-01-23      2    7
         2016-01-24      2    8

.. _whatsnew_0181.enhancements.method_chain:

Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following methods / indexers now accept a ``callable``. It is intended to make
these more useful in method chains, see the :ref:`documentation <indexing.callable>`.
(:issue:`11485`, :issue:`12533`)

- ``.where()`` and ``.mask()``
- ``.loc[]``, ``iloc[]`` and ``.ix[]``
- ``[]`` indexing

Methods ``.where()`` and ``.mask()``
""""""""""""""""""""""""""""""""""""

These can accept a callable for the condition and ``other``
arguments.

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
   df.where(lambda x: x > 4, lambda x: x + 10)

Methods ``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""""""""""

These can accept a callable, and a tuple of callable as a slicer. The callable
can return a valid boolean indexer or anything which is valid for these indexer's input.

.. ipython:: python

   # callable returns bool indexer
   df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]

   # callable returns list of labels
   df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]

Indexing with ``[]``
""""""""""""""""""""

Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return a valid input for ``[]`` indexing depending on its
class and index type.

.. ipython:: python

   df[lambda x: "A"]

Using these methods / indexers, you can chain data selection operations
without using temporary variable.

.. ipython:: python

   bb = pd.read_csv("data/baseball.csv", index_col="id")
   (bb.groupby(["year", "team"]).sum(numeric_only=True).loc[lambda df: df.r > 100])

.. _whatsnew_0181.partial_string_indexing:

Partial string indexing on ``DatetimeIndex`` when part of a ``MultiIndex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiIndex`` (:issue:`10331`)

.. code-block:: ipython

   In [20]: dft2 = pd.DataFrame(
      ....:     np.random.randn(20, 1),
      ....:     columns=["A"],
      ....:     index=pd.MultiIndex.from_product(
      ....:         [pd.date_range("20130101", periods=10, freq="12H"), ["a", "b"]]
      ....:     ),
      ....: )
      ....:

   In [21]: dft2
   Out[21]:
                                 A
   2013-01-01 00:00:00 a  0.469112
                       b -0.282863
   2013-01-01 12:00:00 a -1.509059
                       b -1.135632
   2013-01-02 00:00:00 a  1.212112
   ...                         ...
   2013-01-04 12:00:00 b  0.271860
   2013-01-05 00:00:00 a -0.424972
                       b  0.567020
   2013-01-05 12:00:00 a  0.276232
                       b -1.087401

   [20 rows x 1 columns]

   In [22]: dft2.loc["2013-01-05"]
   Out[22]:
                                 A
   2013-01-05 00:00:00 a -0.424972
                       b  0.567020
   2013-01-05 12:00:00 a  0.276232
                       b -1.087401

   [4 rows x 1 columns]

On other levels

.. code-block:: ipython

   In [26]: idx = pd.IndexSlice

   In [27]: dft2 = dft2.swaplevel(0, 1).sort_index()

   In [28]: dft2
   Out[28]:
                                 A
   a 2013-01-01 00:00:00  0.469112
     2013-01-01 12:00:00 -1.509059
     2013-01-02 00:00:00  1.212112
     2013-01-02 12:00:00  0.119209
     2013-01-03 00:00:00 -0.861849
   ...                         ...
   b 2013-01-03 12:00:00  1.071804
     2013-01-04 00:00:00 -0.706771
     2013-01-04 12:00:00  0.271860
     2013-01-05 00:00:00  0.567020
     2013-01-05 12:00:00 -1.087401

   [20 rows x 1 columns]

   In [29]: dft2.loc[idx[:, "2013-01-05"], :]
   Out[29]:
                                 A
   a 2013-01-05 00:00:00 -0.424972
     2013-01-05 12:00:00  0.276232
   b 2013-01-05 00:00:00  0.567020
     2013-01-05 12:00:00 -1.087401

   [4 rows x 1 columns]

.. _whatsnew_0181.enhancements.assembling:

Assembling datetimes
^^^^^^^^^^^^^^^^^^^^

``pd.to_datetime()`` has gained the ability to assemble datetimes from a passed in ``DataFrame`` or a dict. (:issue:`8158`).

.. ipython:: python

   df = pd.DataFrame(
       {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]}
   )
   df

Assembling using the passed frame.

.. ipython:: python

   pd.to_datetime(df)

You can pass only the columns that you need to assemble.

.. ipython:: python

   pd.to_datetime(df[["year", "month", "day"]])

.. _whatsnew_0181.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- ``pd.read_csv()`` now supports ``delim_whitespace=True`` for the Python engine (:issue:`12958`)
- ``pd.read_csv()`` now supports opening ZIP files that contains a single CSV, via extension inference or explicit ``compression='zip'`` (:issue:`12175`)
- ``pd.read_csv()`` now supports opening files using xz compression, via extension inference or explicit ``compression='xz'`` is specified; ``xz`` compressions is also supported by ``DataFrame.to_csv`` in the same way (:issue:`11852`)
- ``pd.read_msgpack()`` now always gives writeable ndarrays even when compression is used (:issue:`12359`).
- ``pd.read_msgpack()`` now supports serializing and de-serializing categoricals with msgpack (:issue:`12573`)
- ``.to_json()`` now supports ``NDFrames`` that contain categorical and sparse data (:issue:`10778`)
- ``interpolate()`` now supports ``method='akima'`` (:issue:`7588`).
- ``pd.read_excel()`` now accepts path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path, in line with other ``read_*`` functions (:issue:`12655`)
- Added ``.weekday_name`` property as a component to ``DatetimeIndex`` and the ``.dt`` accessor. (:issue:`11128`)

- ``Index.take`` now handles ``allow_fill`` and ``fill_value`` consistently (:issue:`12631`)

  .. ipython:: python

     idx = pd.Index([1.0, 2.0, 3.0, 4.0], dtype="float")

     # default, allow_fill=True, fill_value=None
     idx.take([2, -1])
     idx.take([2, -1], fill_value=True)

- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables <text.indicator>` (:issue:`10008`, :issue:`10103`)

  .. ipython:: python

     idx = pd.Index(["a|b", "a|c", "b|c"])
     idx.str.get_dummies("|")


- ``pd.crosstab()`` has gained a ``normalize`` argument for normalizing frequency tables (:issue:`12569`). Examples in the updated docs :ref:`here <reshaping.crosstabulations>`.
- ``.resample(..).interpolate()`` is now supported (:issue:`12925`)
- ``.isin()`` now accepts passed ``sets`` (:issue:`12988`)

.. _whatsnew_0181.sparse:

Sparse changes
~~~~~~~~~~~~~~

These changes conform sparse handling to return the correct types and work to make a smoother experience with indexing.

``SparseArray.take`` now returns a scalar for scalar input, ``SparseArray`` for others. Furthermore, it handles a negative indexer with the same rule as ``Index`` (:issue:`10560`, :issue:`12796`)

.. code-block:: python

   s = pd.SparseArray([np.nan, np.nan, 1, 2, 3, np.nan, 4, 5, np.nan, 6])
   s.take(0)
   s.take([1, 2, 3])

- Bug in ``SparseSeries[]`` indexing with ``Ellipsis`` raises ``KeyError`` (:issue:`9467`)
- Bug in ``SparseArray[]`` indexing with tuples are not handled properly (:issue:`12966`)
- Bug in ``SparseSeries.loc[]`` with list-like input raises ``TypeError`` (:issue:`10560`)
- Bug in ``SparseSeries.iloc[]`` with scalar input may raise ``IndexError`` (:issue:`10560`)
- Bug in ``SparseSeries.loc[]``, ``.iloc[]`` with ``slice`` returns ``SparseArray``, rather than ``SparseSeries`` (:issue:`10560`)
- Bug in ``SparseDataFrame.loc[]``, ``.iloc[]`` may results in dense ``Series``, rather than ``SparseSeries`` (:issue:`12787`)
- Bug in ``SparseArray`` addition ignores ``fill_value`` of right hand side (:issue:`12910`)
- Bug in ``SparseArray`` mod raises ``AttributeError`` (:issue:`12910`)
- Bug in ``SparseArray`` pow calculates ``1 ** np.nan`` as ``np.nan`` which must be 1 (:issue:`12910`)
- Bug in ``SparseArray`` comparison output may incorrect result or raise ``ValueError`` (:issue:`12971`)
- Bug in ``SparseSeries.__repr__`` raises ``TypeError`` when it is longer than ``max_rows`` (:issue:`10560`)
- Bug in ``SparseSeries.shape`` ignores ``fill_value`` (:issue:`10452`)
- Bug in ``SparseSeries`` and ``SparseArray`` may have different ``dtype`` from its dense values (:issue:`12908`)
- Bug in ``SparseSeries.reindex`` incorrectly handle ``fill_value`` (:issue:`12797`)
- Bug in ``SparseArray.to_frame()`` results in ``DataFrame``, rather than ``SparseDataFrame`` (:issue:`9850`)
- Bug in ``SparseSeries.value_counts()`` does not count ``fill_value`` (:issue:`6749`)
- Bug in ``SparseArray.to_dense()`` does not preserve ``dtype`` (:issue:`10648`)
- Bug in ``SparseArray.to_dense()`` incorrectly handle ``fill_value`` (:issue:`12797`)
- Bug in ``pd.concat()`` of ``SparseSeries`` results in dense (:issue:`10536`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` incorrectly handle ``fill_value`` (:issue:`9765`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` may raise ``AttributeError`` (:issue:`12174`)
- Bug in ``SparseArray.shift()`` may raise ``NameError`` or ``TypeError`` (:issue:`12908`)

.. _whatsnew_0181.api:

API changes
~~~~~~~~~~~

.. _whatsnew_0181.api.groubynth:

Method ``.groupby(..).nth()`` changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The index in ``.groupby(..).nth()`` output is now more consistent when the ``as_index`` argument is passed (:issue:`11039`):

.. ipython:: python

   df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1, 2, 3]})
   df

Previous behavior:

.. code-block:: ipython

   In [3]: df.groupby('A', as_index=True)['B'].nth(0)
   Out[3]:
   0    1
   1    2
   Name: B, dtype: int64

   In [4]: df.groupby('A', as_index=False)['B'].nth(0)
   Out[4]:
   0    1
   1    2
   Name: B, dtype: int64

New behavior:

.. ipython:: python

    df.groupby("A", as_index=True)["B"].nth(0)
    df.groupby("A", as_index=False)["B"].nth(0)

Furthermore, previously, a ``.groupby`` would always sort, regardless if ``sort=False`` was passed with ``.nth()``.

.. ipython:: python

   np.random.seed(1234)
   df = pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"])
   df["c"] = np.random.randint(0, 4, 100)

Previous behavior:

.. code-block:: ipython

   In [4]: df.groupby('c', sort=True).nth(1)
   Out[4]:
             a         b
   c
   0 -0.334077  0.002118
   1  0.036142 -2.074978
   2 -0.720589  0.887163
   3  0.859588 -0.636524

   In [5]: df.groupby('c', sort=False).nth(1)
   Out[5]:
             a         b
   c
   0 -0.334077  0.002118
   1  0.036142 -2.074978
   2 -0.720589  0.887163
   3  0.859588 -0.636524

New behavior:

.. ipython:: python

   df.groupby("c", sort=True).nth(1)
   df.groupby("c", sort=False).nth(1)


.. _whatsnew_0181.numpy_compatibility:

NumPy function compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`)

- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)

An example of this signature augmentation is illustrated below:

.. code-block:: python

   sp = pd.SparseDataFrame([1, 2, 3])
   sp

Previous behaviour:

.. code-block:: ipython

   In [2]: np.cumsum(sp, axis=0)
   ...
   TypeError: cumsum() takes at most 2 arguments (4 given)

New behaviour:

.. code-block:: python

   np.cumsum(sp, axis=0)

.. _whatsnew_0181.apply_resample:

Using ``.apply`` on GroupBy resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using ``apply`` on resampling groupby operations (using a ``pd.TimeGrouper``) now has the same output types as similar ``apply`` calls on other groupby operations. (:issue:`11742`).

.. ipython:: python

    df = pd.DataFrame(
        {"date": pd.to_datetime(["10/10/2000", "11/10/2000"]), "value": [10, 13]}
    )
    df

Previous behavior:

.. code-block:: ipython

    In [1]: df.groupby(pd.TimeGrouper(key='date',
       ...:                           freq='M')).apply(lambda x: x.value.sum())
    Out[1]:
    ...
    TypeError: cannot concatenate a non-NDFrame object

    # Output is a Series
    In [2]: df.groupby(pd.TimeGrouper(key='date',
       ...:                           freq='M')).apply(lambda x: x[['value']].sum())
    Out[2]:
    date
    2000-10-31  value    10
    2000-11-30  value    13
    dtype: int64

New behavior:

.. code-block:: ipython

    # Output is a Series
    In [55]: df.groupby(pd.TimeGrouper(key='date',
        ...:                           freq='M')).apply(lambda x: x.value.sum())
    Out[55]:
    date
    2000-10-31    10
    2000-11-30    13
    Freq: M, dtype: int64

    # Output is a DataFrame
    In [56]: df.groupby(pd.TimeGrouper(key='date',
        ...:                           freq='M')).apply(lambda x: x[['value']].sum())
    Out[56]:
                value
    date
    2000-10-31     10
    2000-11-30     13

.. _whatsnew_0181.read_csv_exceptions:

Changes in ``read_csv`` exceptions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or header (:issue:`12493`, :issue:`12506`)

Previous behaviour:

.. code-block:: ipython

   In [1]: import io

   In [2]: df = pd.read_csv(io.StringIO(''), engine='c')
   ...
   ValueError: No columns to parse from file

   In [3]: df = pd.read_csv(io.StringIO(''), engine='python')
   ...
   StopIteration

New behaviour:

.. code-block:: ipython

   In [1]: df = pd.read_csv(io.StringIO(''), engine='c')
   ...
   pandas.io.common.EmptyDataError: No columns to parse from file

   In [2]: df = pd.read_csv(io.StringIO(''), engine='python')
   ...
   pandas.io.common.EmptyDataError: No columns to parse from file

In addition to this error change, several others have been made as well:

- ``CParserError`` now sub-classes ``ValueError`` instead of just a ``Exception`` (:issue:`12551`)
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
- ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)


.. _whatsnew_0181.api.to_datetime:

Method ``to_datetime`` error changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'``. Furthermore, an ``OutOfBoundsDateime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'``. (:issue:`11758`, :issue:`13052`, :issue:`13059`)

Previous behaviour:

.. code-block:: ipython

   In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[27]: NaT

   In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
   OverflowError: Python int too large to convert to C long

   In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
   OverflowError: Python int too large to convert to C long

New behaviour:

.. code-block:: ipython

   In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[2]: Timestamp('2014-12-31 16:31:00')

   In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
   Out[3]: 11111111

   In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
   OutOfBoundsDatetime: cannot convert input with unit 'D'

.. _whatsnew_0181.api.other:

Other API changes
^^^^^^^^^^^^^^^^^

- ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index. (:issue:`12934`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- ``Period`` and ``PeriodIndex`` now raises ``IncompatibleFrequency`` error which inherits ``ValueError`` rather than raw ``ValueError`` (:issue:`12615`)
- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (matches the doc-string) (:issue:`5636`)
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fallback to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (and which, previously, if numexpr was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
- Provide a proper ``__name__`` and ``__qualname__`` attributes for generic functions (:issue:`12021`)
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)
- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)

.. _whatsnew_0181.deprecations:

Deprecations
^^^^^^^^^^^^

- The method name ``Index.sym_diff()`` is deprecated and can be replaced by ``Index.symmetric_difference()`` (:issue:`12591`)
- The method name ``Categorical.sort()`` is deprecated in favor of ``Categorical.sort_values()`` (:issue:`12882`)








.. _whatsnew_0181.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved speed of SAS reader (:issue:`12656`, :issue:`12961`)
- Performance improvements in ``.groupby(..).cumcount()`` (:issue:`11039`)
- Improved memory usage in ``pd.read_csv()`` when using ``skiprows=an_integer`` (:issue:`13005`)
- Improved performance of ``DataFrame.to_sql`` when checking case sensitivity for tables. Now only checks if table has been created correctly when table name is not lower case. (:issue:`12876`)
- Improved performance of ``Period`` construction and time series plotting (:issue:`12903`, :issue:`11831`).
- Improved performance of ``.str.encode()`` and ``.str.decode()`` methods (:issue:`13008`)
- Improved performance of ``to_numeric`` if input is numeric dtype (:issue:`12777`)
- Improved performance of sparse arithmetic with ``IntIndex`` (:issue:`13036`)








.. _whatsnew_0181.bug_fixes:

Bug fixes
~~~~~~~~~
- ``usecols`` parameter in ``pd.read_csv`` is now respected even when the lines of a CSV file are not even (:issue:`12203`)
- Bug in ``groupby.transform(..)`` when ``axis=1`` is specified with a non-monotonic ordered index (:issue:`12713`)
- Bug in ``Period`` and ``PeriodIndex`` creation raises ``KeyError`` if ``freq="Minute"`` is specified. Note that "Minute" freq is deprecated in v0.17.0, and recommended to use ``freq="T"`` instead (:issue:`11854`)
- Bug in ``.resample(...).count()`` with a ``PeriodIndex`` always raising a ``TypeError`` (:issue:`12774`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` casting to a ``DatetimeIndex`` when empty (:issue:`12868`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` when resampling to an existing frequency (:issue:`12770`)
- Bug in printing data which contains ``Period`` with different ``freq`` raises ``ValueError`` (:issue:`12615`)
- Bug in ``Series`` construction with ``Categorical`` and ``dtype='category'`` is specified (:issue:`12574`)
- Bugs in concatenation with a coercible dtype was too aggressive, resulting in different dtypes in output formatting when an object was longer than ``display.max_rows`` (:issue:`12411`, :issue:`12045`, :issue:`11594`, :issue:`10571`, :issue:`12211`)
- Bug in ``float_format`` option with option not being validated as a callable. (:issue:`12706`)
- Bug in ``GroupBy.filter`` when ``dropna=False`` and no groups fulfilled the criteria (:issue:`12768`)
- Bug in ``__name__`` of ``.cum*`` functions (:issue:`12021`)
- Bug in ``.astype()`` of a ``Float64Inde/Int64Index`` to an ``Int64Index`` (:issue:`12881`)
- Bug in round tripping an integer based index in ``.to_json()/.read_json()`` when ``orient='index'`` (the default) (:issue:`12866`)
- Bug in plotting ``Categorical`` dtypes cause error when attempting stacked bar plot (:issue:`13019`)
- Compat with >= ``numpy`` 1.11 for ``NaT`` comparisons (:issue:`12969`)
- Bug in ``.drop()`` with a non-unique ``MultiIndex``. (:issue:`12701`)
- Bug in ``.concat`` of datetime tz-aware and naive DataFrames (:issue:`12467`)
- Bug in correctly raising a ``ValueError`` in ``.resample(..).fillna(..)`` when passing a non-string (:issue:`12952`)
- Bug fixes in various encoding and header processing issues in ``pd.read_sas()`` (:issue:`12659`, :issue:`12654`, :issue:`12647`, :issue:`12809`)
- Bug in ``pd.crosstab()`` where would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
- Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
- Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)


- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)

- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
- Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
- Bug in ``.quantile()`` with empty ``Series`` may return scalar rather than empty ``Series`` (:issue:`12772`)


- Bug in ``.loc`` with out-of-bounds in a large indexer would raise ``IndexError`` rather than ``KeyError`` (:issue:`12527`)
- Bug in resampling when using a ``TimedeltaIndex`` and ``.asfreq()``, would previously not include the final fencepost (:issue:`12926`)

- Bug in equality testing with a ``Categorical`` in a ``DataFrame`` (:issue:`12564`)
- Bug in ``GroupBy.first()``, ``.last()`` returns incorrect row when ``TimeGrouper`` is used (:issue:`7453`)



- Bug in ``pd.read_csv()`` with the ``c`` engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
- Bug in ``DataFrame`` timezone lost when assigning tz-aware datetime ``Series`` with alignment (:issue:`12981`)




- Bug in ``.value_counts()`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
- Bug in ``Series.value_counts()`` loses name if its dtype is ``category`` (:issue:`12835`)
- Bug in ``Series.value_counts()`` loses timezone info (:issue:`12835`)
- Bug in ``Series.value_counts(normalize=True)`` with ``Categorical`` raises ``UnboundLocalError`` (:issue:`12835`)
- Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
- Bug in ``pd.read_csv()`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the ``c`` engine (:issue:`9755`)
- Bug in ``pd.read_csv()`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the ``c`` engine (:issue:`12912`)
- Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
- Clean in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
- Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)
- Bug in ``Series.map`` raises ``TypeError`` if its dtype is ``category`` or tz-aware ``datetime`` (:issue:`12473`)

- Bugs on 32bit platforms for some test comparisons (:issue:`12972`)
- Bug in index coercion when falling back from ``RangeIndex`` construction (:issue:`12893`)
- Better error message in window functions when invalid argument (e.g. a float window) is passed (:issue:`12669`)

- Bug in slicing subclassed ``DataFrame`` defined to return subclassed ``Series`` may return normal ``Series`` (:issue:`11559`)


- Bug in ``.str`` accessor methods may raise ``ValueError`` if input has ``name`` and the result is ``DataFrame`` or ``MultiIndex`` (:issue:`12617`)
- Bug in ``DataFrame.last_valid_index()`` and ``DataFrame.first_valid_index()`` on empty frames (:issue:`12800`)


- Bug in ``CategoricalIndex.get_loc`` returns different result from regular ``Index`` (:issue:`12531`)
- Bug in ``PeriodIndex.resample`` where name not propagated (:issue:`12769`)

- Bug in ``date_range`` ``closed`` keyword and timezones (:issue:`12684`).

- Bug in ``pd.concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
- Bug in ``pd.concat`` did not handle empty ``Series`` properly (:issue:`11082`)

- Bug in ``.plot.bar`` alignment when ``width`` is specified with ``int`` (:issue:`12979`)


- Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue:`12723`)

- Bug in ``pd.read_html()`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)

- Bug in ``.pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
- Bug in ``.pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
- Bug in ``pd.crosstab()`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)

- Bug in ``Series.name`` when ``name`` attribute can be a hashable type (:issue:`12610`)

- Bug in ``.describe()`` resets categorical columns information (:issue:`11558`)
- Bug where ``loffset`` argument was not applied when calling ``resample().count()`` on a timeseries (:issue:`12725`)
- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names`` (:issue:`12870`)
- Bug in ``pd.to_numeric()`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with scalar raises ``ValueError`` (:issue:`12777`)


.. _whatsnew_0.18.1.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.18.0..v0.18.1