.. _whatsnew_0703:

Version 0.7.3 (April 12, 2012)
------------------------------

{{ header }}


This is a minor release from 0.7.2 that fixes many minor bugs and adds several
new features. There are also a couple of API changes to note; these should not
affect many users, and we are inclined to call them "bug fixes" even though
they do change behavior. See the :ref:`full release notes <release>` or the
issue tracker on GitHub for a complete list.

New features
~~~~~~~~~~~~

- New :ref:`fixed width file reader <io.fwf>`, ``read_fwf``
- New :ref:`scatter_matrix <visualization.scatter_matrix>` function for making
  a scatter plot matrix

.. code-block:: python

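   # Import path as of pandas 0.7.3; later releases (0.20 and up) expose
   # this function as pandas.plotting.scatter_matrix
   from pandas.tools.plotting import scatter_matrix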

   scatter_matrix(df, alpha=0.2)  # noqa F821
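
As a rough illustration of the fixed-width reader listed above, the sketch
below parses a small in-memory table with ``read_fwf``. The sample data, the
column names, and the ``colspecs`` boundaries are assumptions made up for this
example; they do not come from the release itself.

.. code-block:: python

   from io import StringIO

   import pandas as pd

   # Hypothetical fixed-width data: name in characters 0-5,
   # age in characters 6-9, score in characters 10-15.
   raw = (
       "Steve   34  88.5\n"
       "Joe     27  91.0\n"
   )

   # colspecs gives half-open (start, end) character ranges per column
   frame = pd.read_fwf(
       StringIO(raw),
       colspecs=[(0, 6), (6, 10), (10, 16)],
       header=None,
       names=["name", "age", "score"],
   )

   print(frame)

Surrounding whitespace in each field is stripped, so the numeric columns are
parsed as numbers rather than padded strings.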


- Add ``stacked`` argument to Series and DataFrame's ``plot`` method for
  :ref:`stacked bar plots <visualization.barplot>`.

.. code-block:: python

   df.plot(kind="bar", stacked=True)  # noqa F821


.. code-block:: python

   df.plot(kind="barh", stacked=True)  # noqa F821


- Add log x and y :ref:`scaling options <visualization.basic>` to
  ``DataFrame.plot`` and ``Series.plot``
- Add ``kurt`` methods to Series and DataFrame for computing kurtosis
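
A minimal sketch of the ``kurt`` methods mentioned above, using made-up
values. In current pandas releases ``kurt`` reports unbiased excess kurtosis
(a normal distribution scores 0); whether 0.7.3 used exactly the same
estimator is an assumption here.

.. code-block:: python

   import pandas as pd

   # Illustrative data only
   values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

   # Series.kurt returns a single scalar
   series_kurt = values.kurt()

   # DataFrame.kurt returns one value per column
   table = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0],
                         "b": [1.0, 1.0, 1.0, 1.0, 10.0]})
   column_kurt = table.kurt()

   print(series_kurt)
   print(column_kurt)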


NA boolean comparison API change
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reverted some changes to how NA values (typically represented as ``NaN`` or
``None``) are handled in non-numeric Series:

.. code-block:: ipython

   In [1]: series = pd.Series(["Steve", np.nan, "Joe"])

   In [2]: series == "Steve"
   Out[2]:
   0     True
   1    False
   2    False
   Length: 3, dtype: bool

   In [3]: series != "Steve"
   Out[3]:
   0    False
   1     True
   2     True
   Length: 3, dtype: bool

In comparisons, NA / NaN always evaluates to ``False``, except with ``!=``,
where it evaluates to ``True``. *Be very careful* with boolean arithmetic,
especially negation, in the presence of NA data. You may wish to add an
explicit NA filter to boolean array operations if this is a concern:

.. code-block:: ipython

   In [4]: mask = series == "Steve"

   In [5]: series[mask & series.notnull()]
   Out[5]:
   0    Steve
   Length: 1, dtype: object
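
The negation pitfall mentioned above can be sketched as follows, reusing the
same three-element Series. The variable names are illustrative only. Because
the NA row compares as ``False``, negating the mask turns it into ``True``,
so the NA row is selected unless it is filtered out explicitly.

.. code-block:: python

   import numpy as np
   import pandas as pd

   series = pd.Series(["Steve", np.nan, "Joe"])

   # NA == "Steve" is False, so negation marks the NA row as True
   not_steve = ~(series == "Steve")

   # An explicit NA filter keeps only known values that differ from "Steve"
   not_steve_known = not_steve & series.notnull()

   print(series[not_steve])
   print(series[not_steve_known])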

While propagating NA in comparisons may seem like the right behavior to some
users (and you could argue on purely technical grounds that it is), we judged
that propagating NA everywhere, including in numerical arrays, would cause a
large number of problems for users. Thus, a "practicality beats purity"
approach was taken. This issue may be revisited at some point in the future.

Other API changes
~~~~~~~~~~~~~~~~~

When calling ``apply`` on a grouped Series, the return value is now also a
Series, for consistency with ``groupby`` behavior on DataFrame:

.. code-block:: ipython

      In [6]: df = pd.DataFrame(
         ...:     {
         ...:         "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
         ...:         "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
         ...:         "C": np.random.randn(8),
         ...:         "D": np.random.randn(8),
         ...:     }
         ...: )
         ...:

      In [7]: df
      Out[7]:
         A      B         C         D
      0  foo    one  0.469112 -0.861849
      1  bar    one -0.282863 -2.104569
      2  foo    two -1.509059 -0.494929
      3  bar  three -1.135632  1.071804
      4  foo    two  1.212112  0.721555
      5  bar    two -0.173215 -0.706771
      6  foo    one  0.119209 -1.039575
      7  foo  three -1.044236  0.271860

      [8 rows x 4 columns]

      In [8]: grouped = df.groupby("A")["C"]

      In [9]: grouped.describe()
      Out[9]:
         count      mean       std       min       25%       50%       75%       max
      A
      bar    3.0 -0.530570  0.526860 -1.135632 -0.709248 -0.282863 -0.228039 -0.173215
      foo    5.0 -0.150572  1.113308 -1.509059 -1.044236  0.119209  0.469112  1.212112

      [2 rows x 8 columns]

      In [10]: grouped.apply(lambda x: x.sort_values()[-2:])  # top 2 values
      Out[10]:
      A
      bar  1   -0.282863
           5   -0.173215
      foo  0    0.469112
           4    1.212112
      Name: C, Length: 4, dtype: float64
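
The session above uses random data, so its numbers cannot be reproduced
exactly. A deterministic sketch of the same pattern, with made-up values and
``.iloc`` used for the positional slice (a choice made for this example, not
part of the original), looks like this:

.. code-block:: python

   import pandas as pd

   # Illustrative data only
   df = pd.DataFrame(
       {
           "A": ["foo", "bar", "foo", "bar", "foo"],
           "C": [3.0, 1.0, 2.0, 5.0, 4.0],
       }
   )

   grouped = df.groupby("A")["C"]

   # Top 2 values of C within each group; the result is a Series
   # indexed by group key and original row label
   top2 = grouped.apply(lambda x: x.sort_values().iloc[-2:])

   print(type(top2).__name__)
   print(top2)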


.. _whatsnew_0.7.3.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.7.2..v0.7.3