File: v0.9.1.rst

package info (click to toggle)
pandas 2.2.3%2Bdfsg-9
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 66,784 kB
sloc: python: 422,228; ansic: 9,190; sh: 270; xml: 102; makefile: 83
file content (171 lines) | stat: -rw-r--r-- 5,040 bytes
parent folder | download | duplicates (2)
.. _whatsnew_0901:

Version 0.9.1 (November 14, 2012)
---------------------------------

{{ header }}


This is a bug fix release from 0.9.0 and includes several new features and
enhancements along with a large number of bug fixes. The new features include
by-column sort order for DataFrame and Series, improved NA handling for the rank
method, masking functions for DataFrame, and intraday time-series filtering for
DataFrame.

New features
~~~~~~~~~~~~

  - ``Series.sort``, ``DataFrame.sort``, and ``DataFrame.sort_index`` can now be
    specified in a per-column manner to support multiple sort orders (:issue:`928`)

    .. code-block:: ipython

       In [2]: df = pd.DataFrame(np.random.randint(0, 2, (6, 3)),
          ...:                   columns=['A', 'B', 'C'])

       In [3]: df.sort(['A', 'B'], ascending=[1, 0])

       Out[3]:
          A  B  C
       3  0  1  1
       4  0  1  1
       2  0  0  1
       0  1  0  0
       1  1  0  0
       5  1  0  0

  - ``DataFrame.rank`` now supports additional argument values for the
    ``na_option`` parameter so missing values can be assigned either the largest
    or the smallest rank (:issue:`1508`, :issue:`2159`)

    .. ipython:: python

        df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])

        df.loc[2:4] = np.nan

        df.rank()

        df.rank(na_option='top')

        df.rank(na_option='bottom')


  - DataFrame has new ``where`` and ``mask`` methods to select values according to a
    given boolean mask (:issue:`2109`, :issue:`2151`)

        DataFrame currently supports slicing via a boolean vector the same length as the DataFrame (inside the ``[]``).
        The returned DataFrame has the same number of columns as the original, but is sliced on its index.

        .. ipython:: python

            df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])

            df

            df[df['A'] > 0]

        If a DataFrame is sliced with a DataFrame based boolean condition (with the same size as the original DataFrame),
        then a DataFrame the same size (index and columns) as the original is returned, with
        elements that do not meet the boolean condition as ``NaN``. This is accomplished via
        the new method ``DataFrame.where``. In addition, ``where`` takes an optional ``other`` argument for replacement.

        .. ipython:: python

           df[df > 0]

           df.where(df > 0)

           df.where(df > 0, -df)

        Furthermore, ``where`` now aligns the input boolean condition (ndarray or DataFrame), such that partial selection
        with setting is possible. This is analogous to partial setting via ``.ix`` (but on the contents rather than the axis labels)

        .. ipython:: python

           df2 = df.copy()
           df2[df2[1:4] > 0] = 3
           df2

        ``DataFrame.mask`` is the inverse boolean operation of ``where``.

        .. ipython:: python

           df.mask(df <= 0)

  - Enable referencing of Excel columns by their column names (:issue:`1936`)

    .. code-block:: ipython

       In [1]: xl = pd.ExcelFile('data/test.xls')

       In [2]: xl.parse('Sheet1', index_col=0, parse_dates=True,
                        parse_cols='A:D')


  - Added option to disable pandas-style tick locators and formatters
    using ``series.plot(x_compat=True)`` or ``pandas.plot_params['x_compat'] =
    True`` (:issue:`2205`)
  - Existing TimeSeries methods ``at_time`` and ``between_time`` were added to
    DataFrame (:issue:`2149`)
  - DataFrame.dot can now accept ndarrays (:issue:`2042`)
  - DataFrame.drop now supports non-unique indexes (:issue:`2101`)
  - Panel.shift now supports negative periods (:issue:`2164`)
  - DataFrame now support unary ~ operator (:issue:`2110`)

API changes
~~~~~~~~~~~

  - Upsampling data with a PeriodIndex will result in a higher frequency
    TimeSeries that spans the original time window

    .. code-block:: ipython

       In [1]: prng = pd.period_range('2012Q1', periods=2, freq='Q')

       In [2]: s = pd.Series(np.random.randn(len(prng)), prng)

       In [4]: s.resample('M')
       Out[4]:
       2012-01   -1.471992
       2012-02         NaN
       2012-03         NaN
       2012-04   -0.493593
       2012-05         NaN
       2012-06         NaN
       Freq: M, dtype: float64

  - Period.end_time now returns the last nanosecond in the time interval
    (:issue:`2124`, :issue:`2125`, :issue:`1764`)

    .. ipython:: python

        p = pd.Period('2012')

        p.end_time


  - File parsers no longer coerce to float or bool for columns that have custom
    converters specified (:issue:`2184`)

    .. ipython:: python

        import io

        data = ('A,B,C\n'
                '00001,001,5\n'
                '00002,002,6')
        pd.read_csv(io.StringIO(data), converters={'A': lambda x: x.strip()})


See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.


.. _whatsnew_0.9.1.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.9.0..v0.9.1