.. _whatsnew_0700:
Version 0.7.0 (February 9, 2012)
--------------------------------
{{ header }}
New features
~~~~~~~~~~~~
- New unified :ref:`merge function <merging.join>` for efficiently performing
  the full gamut of database / relational-algebra operations. Refactored the
  existing join methods to use the new infrastructure, resulting in substantial
  performance gains (:issue:`220`, :issue:`249`, :issue:`267`)
- New :ref:`unified concatenation function <merging.concat>` for concatenating
Series, DataFrame or Panel objects along an axis. Can form union or
intersection of the other axes. Improves performance of ``Series.append`` and
``DataFrame.append`` (:issue:`468`, :issue:`479`, :issue:`273`)
- Can pass multiple DataFrames to
``DataFrame.append`` to concatenate (stack) and multiple Series to
``Series.append`` too
- :ref:`Can<basics.dataframe.from_list_of_dicts>` pass list of dicts (e.g., a
list of JSON objects) to DataFrame constructor (:issue:`526`)
- You can now :ref:`set multiple columns <indexing.columns.multiple>` in a
  DataFrame via ``__setitem__``, useful for transformation (:issue:`342`)
- Handle differently-indexed output values in ``DataFrame.apply`` (:issue:`498`)

.. code-block:: ipython

    In [1]: df = pd.DataFrame(np.random.randn(10, 4))

    In [2]: df.apply(lambda x: x.describe())
    Out[2]:
                   0          1          2          3
    count  10.000000  10.000000  10.000000  10.000000
    mean    0.190912  -0.395125  -0.731920  -0.403130
    std     0.730951   0.813266   1.112016   0.961912
    min    -0.861849  -2.104569  -1.776904  -1.469388
    25%    -0.411391  -0.698728  -1.501401  -1.076610
    50%     0.380863  -0.228039  -1.191943  -1.004091
    75%     0.658444   0.057974  -0.034326   0.461706
    max     1.212112   0.577046   1.643563   1.071804

    [8 rows x 4 columns]

- :ref:`Add<advanced.reorderlevels>` ``reorder_levels`` method to Series and
DataFrame (:issue:`534`)
- :ref:`Add<indexing.dictionarylike>` dict-like ``get`` function to DataFrame
and Panel (:issue:`521`)
- :ref:`Add<basics.iterrows>` ``DataFrame.iterrows`` method for efficiently
iterating through the rows of a DataFrame
- Add ``DataFrame.to_panel`` with code adapted from
``LongPanel.to_long``
- :ref:`Add <basics.reindexing>` ``reindex_axis`` method to DataFrame
- :ref:`Add <basics.stats>` ``level`` option to binary arithmetic functions on
``DataFrame`` and ``Series``
- :ref:`Add <advanced.advanced_reindex>` ``level`` option to the ``reindex``
and ``align`` methods on Series and DataFrame for broadcasting values across
a level (:issue:`542`, :issue:`552`, others)
- Add attribute-based item access to
``Panel`` and add IPython completion (:issue:`563`)
- :ref:`Add <visualization.basic>` ``logy`` option to ``Series.plot`` for
log-scaling on the Y axis
- :ref:`Add <io.formatting>` ``index`` and ``header`` options to
``DataFrame.to_string``
- :ref:`Can <merging.multiple_join>` pass multiple DataFrames to
``DataFrame.join`` to join on index (:issue:`115`)
- :ref:`Can <merging.multiple_join>` pass multiple Panels to ``Panel.join``
(:issue:`115`)
- :ref:`Added <io.formatting>` ``justify`` argument to ``DataFrame.to_string``
to allow different alignment of column headers
- :ref:`Add <groupby.attributes>` ``sort`` option to GroupBy to allow disabling
sorting of the group keys for potential speedups (:issue:`595`)
- :ref:`Can <basics.dataframe.from_series>` pass MaskedArray to Series
constructor (:issue:`563`)
- Add Panel item access via attributes
and IPython completion (:issue:`554`)
- Implement ``DataFrame.lookup``, fancy-indexing analogue for retrieving values
given a sequence of row and column labels (:issue:`338`)
- Can pass a :ref:`list of functions <groupby.aggregate.multifunc>` to
aggregate with groupby on a DataFrame, yielding an aggregated result with
hierarchical columns (:issue:`166`)
- Can call ``cummin`` and ``cummax`` on Series and DataFrame to get cumulative
minimum and maximum, respectively (:issue:`647`)
- ``value_range`` added as utility function to get min and max of a DataFrame
  (:issue:`288`)
- Added ``encoding`` argument to ``read_csv``, ``read_table``, ``to_csv`` and
  ``from_csv`` for non-ASCII text (:issue:`717`)
- :ref:`Added <basics.stats>` ``abs`` method to pandas objects
- :ref:`Added <reshaping.pivot>` ``crosstab`` function for easily computing frequency tables
- :ref:`Added <indexing.set_ops>` ``isin`` method to index objects
- :ref:`Added <advanced.xs>` ``level`` argument to ``xs`` method of DataFrame
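
As a minimal sketch of the ``merge`` and ``concat`` entries above, assuming a
recent pandas release rather than 0.7.0 itself, the following joins two small
frames on a shared column and then stacks them along the rows. The data and
the names ``left`` and ``right`` are invented for illustration.

.. code-block:: python

    import pandas as pd

    # Hypothetical sample data, invented for this example.
    left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
    right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

    # Database-style joins on the shared "key" column.
    inner = pd.merge(left, right, on="key", how="inner")  # keys in both frames
    outer = pd.merge(left, right, on="key", how="outer")  # keys in either frame

    # Concatenate along the rows. join="inner" keeps only the columns common
    # to every input; join="outer" would keep the union of the columns.
    stacked = pd.concat([left, right], axis=0, join="inner", ignore_index=True)

    print(inner)
    print(outer)
    print(stacked)

Here ``inner`` has one row per key present in both inputs, while ``stacked``
retains only the ``key`` column because it is the only column the two frames
share.
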
API changes to integer indexing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the potentially riskiest API changes in 0.7.0, but also one of the most
important, was a complete review of how **integer indexes** are handled with
regard to label-based indexing. Here is an example:

.. code-block:: ipython

    In [3]: s = pd.Series(np.random.randn(10), index=range(0, 20, 2))

    In [4]: s
    Out[4]:
    0    -1.294524
    2     0.413738
    4     0.276662
    6    -0.472035
    8    -0.013960
    10   -0.362543
    12   -0.006154
    14   -0.923061
    16    0.895717
    18    0.805244
    Length: 10, dtype: float64

    In [5]: s[0]
    Out[5]: -1.2945235902555294

    In [6]: s[2]
    Out[6]: 0.41373810535784006

    In [7]: s[4]
    Out[7]: 0.2766617129497566

None of this has changed. However, if you ask for a key **not** contained in
the Series, versions 0.6.1 and prior would *fall back* on a location-based
lookup. This now raises a ``KeyError``:

.. code-block:: ipython

    In [2]: s[1]
    KeyError: 1

This change also has the same impact on DataFrame:

.. code-block:: ipython

    In [3]: df = pd.DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))

    In [4]: df
               0        1        2         3
    0    0.88427   0.3363  -0.1787   0.03162
    2    0.14451  -0.1415   0.2504   0.58374
    4   -1.44779  -0.9186  -1.4996   0.27163
    6   -0.26598  -2.4184  -0.2658   0.11503
    8   -0.58776   0.3144  -0.8566   0.61941
    10   0.10940  -0.7175  -1.0108   0.47990
    12  -1.16919  -0.3087  -0.6049  -0.43544
    14  -0.07337   0.3410   0.0424  -0.16037

    In [5]: df.ix[3]
    KeyError: 3

In order to support purely integer-based indexing, the following methods have
been added:

.. csv-table::
    :header: "Method","Description"
    :widths: 40,60

    ``Series.iget_value(i)``, Retrieve value stored at location ``i``
    ``Series.iget(i)``, Alias for ``iget_value``
    ``DataFrame.irow(i)``, Retrieve the ``i``-th row
    ``DataFrame.icol(j)``, Retrieve the ``j``-th column
    "``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``

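
The distinction between label-based and location-based lookup can be sketched
as follows. This is an illustration under the assumption of a recent pandas
release, where the ``iget``/``irow``/``icol`` family listed above no longer
exists and ``iloc`` and ``iat`` serve as the positional accessors. The data is
invented for the example.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Integer index whose labels (0, 2, 4, ...) differ from the positions.
    s = pd.Series(np.arange(10.0), index=range(0, 20, 2))

    # 1 is not a label in the index, so label-based lookup fails
    # instead of falling back on position.
    try:
        s[1]
        missing_key_raises = False
    except KeyError:
        missing_key_raises = True

    # Purely positional access: the second element, which carries label 2.
    second_value = s.iloc[1]

    df = pd.DataFrame(np.arange(32.0).reshape(8, 4), index=range(0, 16, 2))
    fourth_row = df.iloc[3]       # positional row, like DataFrame.irow(3)
    second_col = df.iloc[:, 1]    # positional column, like DataFrame.icol(1)
    cell = df.iat[3, 1]           # single value, like DataFrame.iget_value(3, 1)

    print(missing_key_raises, second_value, cell)
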
API tweaks regarding label-based slicing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Label-based slicing using ``ix`` now requires that the index be sorted
(monotonic) **unless** both the start and endpoint are contained in the index:

.. code-block:: ipython

    In [1]: s = pd.Series(np.random.randn(6), index=list('gmkaec'))

    In [2]: s
    Out[2]:
    g   -1.182230
    m   -0.276183
    k   -0.243550
    a    1.628992
    e    0.073308
    c   -0.539890
    dtype: float64

Then this is OK:

.. code-block:: ipython

    In [3]: s.ix['k':'e']
    Out[3]:
    k   -0.243550
    a    1.628992
    e    0.073308
    dtype: float64

But this is not:

.. code-block:: ipython

    In [12]: s.ix['b':'h']
    KeyError: 'b'

If the index had been sorted, the "range selection" would have been possible:

.. code-block:: ipython

    In [4]: s2 = s.sort_index()

    In [5]: s2
    Out[5]:
    a    1.628992
    c   -0.539890
    e    0.073308
    g   -1.182230
    k   -0.243550
    m   -0.276183
    dtype: float64

    In [6]: s2.ix['b':'h']
    Out[6]:
    c   -0.539890
    e    0.073308
    g   -1.182230
    dtype: float64

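
The same rule can be checked in a runnable form. The sketch below assumes a
recent pandas release, in which ``ix`` has been removed and ``loc`` is the
label-based accessor. The integer values are invented so that the results are
easy to follow.

.. code-block:: python

    import pandas as pd

    # Unsorted (non-monotonic) index, as in the example above.
    s = pd.Series(range(6), index=list("gmkaec"))

    # Both endpoints are present in the index, so the slice is well defined
    # even though the index is unsorted.
    within = s.loc["k":"e"]

    # Neither "b" nor "h" is a label, and the index is unsorted, so there is
    # no way to locate the slice bounds.
    try:
        s.loc["b":"h"]
        unsorted_raises = False
    except KeyError:
        unsorted_raises = True

    # After sorting, missing endpoints are resolved by their sort position.
    s2 = s.sort_index()
    ranged = s2.loc["b":"h"]

    print(list(within.index), unsorted_raises, list(ranged.index))
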
Changes to Series ``[]`` operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As a notational convenience, you can pass a sequence of labels or a label
slice to a Series when getting and setting values via ``[]`` (i.e. the
``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
passing similar input to ``ix`` **except in the case of integer indexing**:

.. code-block:: ipython

    In [8]: s = pd.Series(np.random.randn(6), index=list('acegkm'))

    In [9]: s
    Out[9]:
    a   -1.206412
    c    2.565646
    e    1.431256
    g    1.340309
    k   -1.170299
    m   -0.226169
    Length: 6, dtype: float64

    In [10]: s[['m', 'a', 'c', 'e']]
    Out[10]:
    m   -0.226169
    a   -1.206412
    c    2.565646
    e    1.431256
    Length: 4, dtype: float64

    In [11]: s['b':'l']
    Out[11]:
    c    2.565646
    e    1.431256
    g    1.340309
    k   -1.170299
    Length: 4, dtype: float64

    In [12]: s['c':'k']
    Out[12]:
    c    2.565646
    e    1.431256
    g    1.340309
    k   -1.170299
    Length: 4, dtype: float64

In the case of integer indexes, the behavior will be exactly as before
(shadowing ``ndarray``):

.. code-block:: ipython

    In [13]: s = pd.Series(np.random.randn(6), index=range(0, 12, 2))

    In [14]: s[[4, 0, 2]]
    Out[14]:
    4    0.132003
    0    0.410835
    2    0.813850
    Length: 3, dtype: float64

    In [15]: s[1:5]
    Out[15]:
    2    0.813850
    4    0.132003
    6   -0.827317
    8   -0.076467
    Length: 4, dtype: float64

If you wish to do indexing with sequences and slicing on an integer index with
label semantics, use ``ix``.
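
To make the difference between label and positional semantics on an integer
index concrete, here is a small sketch. It assumes a recent pandas release,
where ``loc`` takes the label-based role that ``ix`` plays in this section and
``iloc`` is strictly positional. The values are invented for the example.

.. code-block:: python

    import pandas as pd

    s = pd.Series([10, 20, 30, 40, 50, 60], index=range(0, 12, 2))

    # A list of labels selects by label and preserves the requested order.
    by_label_list = s.loc[[4, 0, 2]]

    # A label slice includes every label between the bounds (both inclusive),
    # so 1:5 selects only the labels 2 and 4.
    by_label_slice = s.loc[1:5]

    # A positional slice selects positions 1 through 4, i.e. labels 2, 4, 6, 8.
    by_position = s.iloc[1:5]

    print(list(by_label_list.index))
    print(list(by_label_slice.index))
    print(list(by_position.index))

The same bounds ``1:5`` therefore select two elements under label semantics
and four under positional semantics.
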
Other API changes
~~~~~~~~~~~~~~~~~
- The deprecated ``LongPanel`` class has been completely removed
- If ``Series.sort`` is called on a column of a DataFrame, an exception will
now be raised. Before it was possible to accidentally mutate a DataFrame's
column by doing ``df[col].sort()`` instead of the side-effect free method
``df[col].order()`` (:issue:`316`)
- Miscellaneous renames and deprecations which will (harmlessly) raise
``FutureWarning``
- ``drop`` added as an optional parameter to ``DataFrame.reset_index`` (:issue:`699`)
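
The ``drop`` option and the in-place versus side-effect-free sorting
distinction can be sketched as follows. This assumes a recent pandas release,
where ``Series.sort_values`` is the side-effect-free sort (``Series.order`` no
longer exists). The frame is invented for the example.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({"x": [3, 1, 2]}, index=["c", "a", "b"])

    # By default the old index is kept as a new column named "index".
    kept = df.reset_index()

    # With drop=True the old index is discarded entirely.
    dropped = df.reset_index(drop=True)

    # Sorting a column returns a new Series and leaves the frame untouched.
    ordered = df["x"].sort_values()

    print(list(kept.columns), list(dropped.columns))
    print(list(ordered), list(df["x"]))
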
Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- :ref:`Cythonized GroupBy aggregations <groupby.aggregate.builtin>` no longer
presort the data, thus achieving a significant speedup (:issue:`93`). GroupBy
aggregations with Python functions significantly sped up by clever
manipulation of the ndarray data type in Cython (:issue:`496`).
- Better error message in DataFrame constructor when passed column labels
don't match data (:issue:`497`)
- Substantially improve performance of multi-GroupBy aggregation when a
Python function is passed, reuse ndarray object in Cython (:issue:`496`)
- Can store objects indexed by tuples and floats in HDFStore (:issue:`492`)
- Don't print length by default in ``Series.to_string``, add ``length`` option (:issue:`489`)
- Improve Cython code for multi-groupby to aggregate without having to sort
the data (:issue:`93`)
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex,
test for backwards unpickling compatibility
- Improve column reindexing performance by using specialized Cython take
function
- Further performance tweaking of ``Series.__getitem__`` for standard use cases
- Avoid Index dict creation in some cases (e.g. when getting slices),
  regression from prior versions
- Friendlier error message in ``setup.py`` if NumPy not installed
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class
also (:issue:`536`)
- Default name assignment when calling ``reset_index`` on DataFrame with a
regular (non-hierarchical) index (:issue:`476`)
- Use Cythonized groupers when possible in Series/DataFrame stat ops with
``level`` parameter passed (:issue:`545`)
- Ported skiplist data structure to C to speed up ``rolling_median`` by about
5-10x in most typical use cases (:issue:`374`)
.. _whatsnew_0.7.0.contributors:
Contributors
~~~~~~~~~~~~
.. contributors:: v0.6.1..v0.7.0