.. _whatsnew_0901:

Version 0.9.1 (November 14, 2012)
---------------------------------

{{ header }}


This is a bug-fix release from 0.9.0. It includes several new features and
enhancements along with a large number of bug fixes. The new features include
by-column sort order for DataFrame and Series, improved NA handling for the rank
method, masking functions for DataFrame, and intraday time-series filtering for
DataFrame.

New features
~~~~~~~~~~~~

  - ``Series.sort``, ``DataFrame.sort``, and ``DataFrame.sort_index`` now accept
    a per-column sort order, so each sort key can be ascending or descending
    independently (:issue:`928`)

    .. code-block:: ipython

       In [2]: df = pd.DataFrame(np.random.randint(0, 2, (6, 3)),
          ...:                   columns=['A', 'B', 'C'])

       In [3]: df.sort(['A', 'B'], ascending=[1, 0])

       Out[3]:
          A  B  C
       3  0  1  1
       4  0  1  1
       2  0  0  1
       0  1  0  0
       1  1  0  0
       5  1  0  0

  - ``DataFrame.rank`` now supports additional argument values for the
    ``na_option`` parameter so missing values can be assigned either the largest
    or the smallest rank (:issue:`1508`, :issue:`2159`)

    .. ipython:: python

        df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])

        df.loc[2:4] = np.nan

        df.rank()

        df.rank(na_option='top')

        df.rank(na_option='bottom')


  - DataFrame has new ``where`` and ``mask`` methods to select values according to a
    given boolean mask (:issue:`2109`, :issue:`2151`)

        DataFrame currently supports slicing via a boolean vector the same length as the DataFrame (inside the ``[]``).
        The returned DataFrame has the same number of columns as the original, but is sliced on its index.

        .. ipython:: python

            df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])

            df

            df[df['A'] > 0]

        If a DataFrame is sliced with a DataFrame-based boolean condition (with the same size as the original DataFrame),
        then a DataFrame the same size (index and columns) as the original is returned, with
        elements that do not meet the boolean condition set to ``NaN``. This is accomplished via
        the new method ``DataFrame.where``. In addition, ``where`` takes an optional ``other`` argument for replacement.

        .. ipython:: python

           df[df > 0]

           df.where(df > 0)

           df.where(df > 0, -df)

        Furthermore, ``where`` now aligns the input boolean condition (ndarray or DataFrame), so that partial selection
        with setting is possible. This is analogous to partial setting via ``.ix`` (but on the contents rather than the axis labels).

        .. ipython:: python

           df2 = df.copy()
           df2[df2[1:4] > 0] = 3
           df2

        ``DataFrame.mask`` is the inverse boolean operation of ``where``.

        .. ipython:: python

           df.mask(df <= 0)

  - Enable referencing of Excel columns by their column names (:issue:`1936`)

    .. code-block:: ipython

       In [1]: xl = pd.ExcelFile('data/test.xls')

       In [2]: xl.parse('Sheet1', index_col=0, parse_dates=True,
          ...:          parse_cols='A:D')


  - Added option to disable pandas-style tick locators and formatters
    using ``series.plot(x_compat=True)`` or ``pandas.plot_params['x_compat'] =
    True`` (:issue:`2205`)
  - Existing TimeSeries methods ``at_time`` and ``between_time`` were added to
    DataFrame (:issue:`2149`)
  - ``DataFrame.dot`` can now accept ndarrays (:issue:`2042`)
  - ``DataFrame.drop`` now supports non-unique indexes (:issue:`2101`)
  - ``Panel.shift`` now supports negative periods (:issue:`2164`)
  - DataFrame now supports the unary ``~`` operator (:issue:`2110`)
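
As a rough illustration of the sorting, ranking, and masking features listed
above, the following sketch uses the method names from recent pandas releases.
It assumes ``DataFrame.sort_values`` as the current replacement for
``DataFrame.sort``, and the small frames are made-up data chosen so the results
are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Per-column sort order: A ascending, B descending.
# Assumption: sort_values is used here in place of the 0.9.1-era DataFrame.sort.
sort_df = pd.DataFrame({"A": [1, 0, 0, 1], "B": [0, 1, 0, 1]})
sorted_df = sort_df.sort_values(["A", "B"], ascending=[True, False])

# na_option controls whether missing values get the smallest or largest rank.
rank_df = pd.DataFrame({"A": [3.0, np.nan, 1.0]})
rank_top = rank_df.rank(na_option="top")        # NaN ranked first
rank_bottom = rank_df.rank(na_option="bottom")  # NaN ranked last

# where keeps values where the condition holds; mask hides them instead.
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})
kept = df.where(df > 0)          # non-positive entries become NaN
flipped = df.where(df > 0, -df)  # non-positive entries replaced by their negation
masked = df.mask(df <= 0)        # inverse condition, same result as `kept`
```

Here ``flipped`` ends up equal to ``df.abs()``, and ``masked`` matches ``kept``
because ``mask`` replaces values where the condition is true while ``where``
replaces them where it is false.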

API changes
~~~~~~~~~~~

  - Upsampling data with a PeriodIndex will result in a higher-frequency
    TimeSeries that spans the original time window

    .. code-block:: ipython

       In [1]: prng = pd.period_range('2012Q1', periods=2, freq='Q')

       In [2]: s = pd.Series(np.random.randn(len(prng)), prng)

       In [4]: s.resample('M')
       Out[4]:
       2012-01   -1.471992
       2012-02         NaN
       2012-03         NaN
       2012-04   -0.493593
       2012-05         NaN
       2012-06         NaN
       Freq: M, dtype: float64

  - Period.end_time now returns the last nanosecond in the time interval
    (:issue:`2124`, :issue:`2125`, :issue:`1764`)

    .. ipython:: python

        p = pd.Period('2012')

        p.end_time


  - File parsers no longer coerce to float or bool for columns that have custom
    converters specified (:issue:`2184`)

    .. ipython:: python

        import io

        data = ('A,B,C\n'
                '00001,001,5\n'
                '00002,002,6')
        pd.read_csv(io.StringIO(data), converters={'A': lambda x: x.strip()})
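
The ``end_time`` and converter changes above can be checked with a short
sketch like the following. It assumes a recent pandas release, and the CSV text
is the sample from the example above; the variable names are illustrative.

```python
import io

import pandas as pd

# Period.end_time is the last nanosecond inside the period.
p = pd.Period("2012")
end = p.end_time
next_start = pd.Period("2013").start_time

# A column with a custom converter keeps the converter's output
# (here: strings with their leading zeros) instead of being coerced.
data = ("A,B,C\n"
        "00001,001,5\n"
        "00002,002,6")
parsed = pd.read_csv(io.StringIO(data), converters={"A": lambda x: x.strip()})
```

Column ``A`` keeps its leading zeros because the converter returns strings,
while ``B`` has no converter and is still parsed as integers.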


See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.


.. _whatsnew_0.9.1.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.9.0..v0.9.1