1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
|
.. _whatsnew_0901:
Version 0.9.1 (November 14, 2012)
---------------------------------
{{ header }}
This is a bug fix release from 0.9.0 and includes several new features and
enhancements along with a large number of bug fixes. The new features include
by-column sort order for DataFrame and Series, improved NA handling for the rank
method, masking functions for DataFrame, and intraday time-series filtering for
DataFrame.
New features
~~~~~~~~~~~~
- ``Series.sort``, ``DataFrame.sort``, and ``DataFrame.sort_index`` can now be
specified in a per-column manner to support multiple sort orders (:issue:`928`)
.. code-block:: ipython
In [2]: df = pd.DataFrame(np.random.randint(0, 2, (6, 3)),
...: columns=['A', 'B', 'C'])
In [3]: df.sort(['A', 'B'], ascending=[1, 0])
Out[3]:
A B C
3 0 1 1
4 0 1 1
2 0 0 1
0 1 0 0
1 1 0 0
5 1 0 0
- ``DataFrame.rank`` now supports additional argument values for the
``na_option`` parameter so missing values can be assigned either the largest
or the smallest rank (:issue:`1508`, :issue:`2159`)
.. ipython:: python
df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])
df.loc[2:4] = np.nan
df.rank()
df.rank(na_option='top')
df.rank(na_option='bottom')
- DataFrame has new ``where`` and ``mask`` methods to select values according to a
given boolean mask (:issue:`2109`, :issue:`2151`)
DataFrame currently supports slicing via a boolean vector the same length as the DataFrame (inside the ``[]``).
The returned DataFrame has the same number of columns as the original, but is sliced on its index.
.. ipython:: python
df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
df
df[df['A'] > 0]
If a DataFrame is sliced with a DataFrame based boolean condition (with the same size as the original DataFrame),
then a DataFrame the same size (index and columns) as the original is returned, with
elements that do not meet the boolean condition as ``NaN``. This is accomplished via
the new method ``DataFrame.where``. In addition, ``where`` takes an optional ``other`` argument for replacement.
.. ipython:: python
df[df > 0]
df.where(df > 0)
df.where(df > 0, -df)
Furthermore, ``where`` now aligns the input boolean condition (ndarray or DataFrame), such that partial selection
with setting is possible. This is analogous to partial setting via ``.ix`` (but on the contents rather than the axis labels)
.. ipython:: python
df2 = df.copy()
df2[df2[1:4] > 0] = 3
df2
``DataFrame.mask`` is the inverse boolean operation of ``where``.
.. ipython:: python
df.mask(df <= 0)
- Enable referencing of Excel columns by their column names (:issue:`1936`)
.. code-block:: ipython
In [1]: xl = pd.ExcelFile('data/test.xls')
In [2]: xl.parse('Sheet1', index_col=0, parse_dates=True,
parse_cols='A:D')
- Added option to disable pandas-style tick locators and formatters
using ``series.plot(x_compat=True)`` or ``pandas.plot_params['x_compat'] =
True`` (:issue:`2205`)
- Existing TimeSeries methods ``at_time`` and ``between_time`` were added to
DataFrame (:issue:`2149`)
- DataFrame.dot can now accept ndarrays (:issue:`2042`)
- DataFrame.drop now supports non-unique indexes (:issue:`2101`)
- Panel.shift now supports negative periods (:issue:`2164`)
- DataFrame now support unary ~ operator (:issue:`2110`)
API changes
~~~~~~~~~~~
- Upsampling data with a PeriodIndex will result in a higher frequency
TimeSeries that spans the original time window
.. code-block:: ipython
In [1]: prng = pd.period_range('2012Q1', periods=2, freq='Q')
In [2]: s = pd.Series(np.random.randn(len(prng)), prng)
In [4]: s.resample('M')
Out[4]:
2012-01 -1.471992
2012-02 NaN
2012-03 NaN
2012-04 -0.493593
2012-05 NaN
2012-06 NaN
Freq: M, dtype: float64
- Period.end_time now returns the last nanosecond in the time interval
(:issue:`2124`, :issue:`2125`, :issue:`1764`)
.. ipython:: python
p = pd.Period('2012')
p.end_time
- File parsers no longer coerce to float or bool for columns that have custom
converters specified (:issue:`2184`)
.. ipython:: python
import io
data = ('A,B,C\n'
'00001,001,5\n'
'00002,002,6')
pd.read_csv(io.StringIO(data), converters={'A': lambda x: x.strip()})
See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.
.. _whatsnew_0.9.1.contributors:
Contributors
~~~~~~~~~~~~
.. contributors:: v0.9.0..v0.9.1
|