1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844
|
.. _whatsnew_0181:
Version 0.18.1 (May 3, 2016)
----------------------------
{{ header }}
This is a minor bug-fix release from 0.18.0 and includes a large number of
bug fixes along with several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.
Highlights include:
- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
- Expanded the :ref:`Tutorials section <tutorial-modern>` with a feature on modern pandas, courtesy of `@TomAugsburger <https://twitter.com/TomAugspurger>`__. (:issue:`13045`).
.. contents:: What's new in v0.18.1
:local:
:backlinks: none
.. _whatsnew_0181.new_features:
New features
~~~~~~~~~~~~
.. _whatsnew_0181.enhancements.custombusinesshour:
Custom business hour
^^^^^^^^^^^^^^^^^^^^
The ``CustomBusinessHour`` is a mixture of ``BusinessHour`` and ``CustomBusinessDay`` which
allows you to specify arbitrary holidays. For details,
see :ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`)
.. ipython:: python
from pandas.tseries.offsets import CustomBusinessHour
from pandas.tseries.holiday import USFederalHolidayCalendar
bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
Friday before MLK Day
.. ipython:: python
import datetime
dt = datetime.datetime(2014, 1, 17, 15)
dt + bhour_us
Tuesday after MLK Day (Monday is skipped because it's a holiday)
.. ipython:: python
dt + bhour_us * 2
.. _whatsnew_0181.deferred_ops:
Method ``.groupby(..)`` syntax with window and resample operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see (:issue:`12486`, :issue:`12738`).
You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another deferred object (similar to what ``.rolling()`` and ``.expanding()`` do on ungrouped pandas objects). You can then operate on these ``RollingGroupby`` objects in a similar manner.
Previously you would have to do this to get a rolling window mean per-group:
.. ipython:: python
df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
df
.. code-block:: ipython
In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
Out[1]:
A
1 0 NaN
1 NaN
2 NaN
3 1.5
4 2.5
5 3.5
6 4.5
7 5.5
8 6.5
9 7.5
10 8.5
11 9.5
12 10.5
13 11.5
14 12.5
15 13.5
16 14.5
17 15.5
18 16.5
19 17.5
2 20 NaN
21 NaN
22 NaN
23 21.5
24 22.5
25 23.5
26 24.5
27 25.5
28 26.5
29 27.5
30 28.5
31 29.5
3 32 NaN
33 NaN
34 NaN
35 33.5
36 34.5
37 35.5
38 36.5
39 37.5
Name: B, dtype: float64
Now you can do:
.. ipython:: python
df.groupby("A").rolling(4).B.mean()
For ``.resample(..)`` type of operations, previously you would have to:
.. ipython:: python
df = pd.DataFrame(
{
"date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
"group": [1, 1, 2, 2],
"val": [5, 6, 7, 8],
}
).set_index("date")
df
.. code-block:: ipython
In[1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
Out[1]:
group val
group date
1 2016-01-03 1 5
2016-01-04 1 5
2016-01-05 1 5
2016-01-06 1 5
2016-01-07 1 5
2016-01-08 1 5
2016-01-09 1 5
2016-01-10 1 6
2 2016-01-17 2 7
2016-01-18 2 7
2016-01-19 2 7
2016-01-20 2 7
2016-01-21 2 7
2016-01-22 2 7
2016-01-23 2 7
2016-01-24 2 8
Now you can do:
.. code-block:: ipython
In[1]: df.groupby("group").resample("1D").ffill()
Out[1]:
group val
group date
1 2016-01-03 1 5
2016-01-04 1 5
2016-01-05 1 5
2016-01-06 1 5
2016-01-07 1 5
2016-01-08 1 5
2016-01-09 1 5
2016-01-10 1 6
2 2016-01-17 2 7
2016-01-18 2 7
2016-01-19 2 7
2016-01-20 2 7
2016-01-21 2 7
2016-01-22 2 7
2016-01-23 2 7
2016-01-24 2 8
.. _whatsnew_0181.enhancements.method_chain:
Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following methods / indexers now accept a ``callable``. It is intended to make
these more useful in method chains, see the :ref:`documentation <indexing.callable>`.
(:issue:`11485`, :issue:`12533`)
- ``.where()`` and ``.mask()``
- ``.loc[]``, ``iloc[]`` and ``.ix[]``
- ``[]`` indexing
Methods ``.where()`` and ``.mask()``
""""""""""""""""""""""""""""""""""""
These can accept a callable for the condition and ``other``
arguments.
.. ipython:: python
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df.where(lambda x: x > 4, lambda x: x + 10)
Methods ``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""""""""""
These can accept a callable, and a tuple of callable as a slicer. The callable
can return a valid boolean indexer or anything which is valid for these indexer's input.
.. ipython:: python
# callable returns bool indexer
df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]
# callable returns list of labels
df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]
Indexing with ``[]``
""""""""""""""""""""
Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return a valid input for ``[]`` indexing depending on its
class and index type.
.. ipython:: python
df[lambda x: "A"]
Using these methods / indexers, you can chain data selection operations
without using temporary variable.
.. ipython:: python
bb = pd.read_csv("data/baseball.csv", index_col="id")
(bb.groupby(["year", "team"]).sum(numeric_only=True).loc[lambda df: df.r > 100])
.. _whatsnew_0181.partial_string_indexing:
Partial string indexing on ``DatetimeIndex`` when part of a ``MultiIndex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiIndex`` (:issue:`10331`)
.. code-block:: ipython
In [20]: dft2 = pd.DataFrame(
....: np.random.randn(20, 1),
....: columns=["A"],
....: index=pd.MultiIndex.from_product(
....: [pd.date_range("20130101", periods=10, freq="12H"), ["a", "b"]]
....: ),
....: )
....:
In [21]: dft2
Out[21]:
A
2013-01-01 00:00:00 a 0.469112
b -0.282863
2013-01-01 12:00:00 a -1.509059
b -1.135632
2013-01-02 00:00:00 a 1.212112
... ...
2013-01-04 12:00:00 b 0.271860
2013-01-05 00:00:00 a -0.424972
b 0.567020
2013-01-05 12:00:00 a 0.276232
b -1.087401
[20 rows x 1 columns]
In [22]: dft2.loc["2013-01-05"]
Out[22]:
A
2013-01-05 00:00:00 a -0.424972
b 0.567020
2013-01-05 12:00:00 a 0.276232
b -1.087401
[4 rows x 1 columns]
On other levels
.. code-block:: ipython
In [26]: idx = pd.IndexSlice
In [27]: dft2 = dft2.swaplevel(0, 1).sort_index()
In [28]: dft2
Out[28]:
A
a 2013-01-01 00:00:00 0.469112
2013-01-01 12:00:00 -1.509059
2013-01-02 00:00:00 1.212112
2013-01-02 12:00:00 0.119209
2013-01-03 00:00:00 -0.861849
... ...
b 2013-01-03 12:00:00 1.071804
2013-01-04 00:00:00 -0.706771
2013-01-04 12:00:00 0.271860
2013-01-05 00:00:00 0.567020
2013-01-05 12:00:00 -1.087401
[20 rows x 1 columns]
In [29]: dft2.loc[idx[:, "2013-01-05"], :]
Out[29]:
A
a 2013-01-05 00:00:00 -0.424972
2013-01-05 12:00:00 0.276232
b 2013-01-05 00:00:00 0.567020
2013-01-05 12:00:00 -1.087401
[4 rows x 1 columns]
.. _whatsnew_0181.enhancements.assembling:
Assembling datetimes
^^^^^^^^^^^^^^^^^^^^
``pd.to_datetime()`` has gained the ability to assemble datetimes from a passed in ``DataFrame`` or a dict. (:issue:`8158`).
.. ipython:: python
df = pd.DataFrame(
{"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]}
)
df
Assembling using the passed frame.
.. ipython:: python
pd.to_datetime(df)
You can pass only the columns that you need to assemble.
.. ipython:: python
pd.to_datetime(df[["year", "month", "day"]])
.. _whatsnew_0181.other:
Other enhancements
^^^^^^^^^^^^^^^^^^
- ``pd.read_csv()`` now supports ``delim_whitespace=True`` for the Python engine (:issue:`12958`)
- ``pd.read_csv()`` now supports opening ZIP files that contains a single CSV, via extension inference or explicit ``compression='zip'`` (:issue:`12175`)
- ``pd.read_csv()`` now supports opening files using xz compression, via extension inference or explicit ``compression='xz'`` is specified; ``xz`` compressions is also supported by ``DataFrame.to_csv`` in the same way (:issue:`11852`)
- ``pd.read_msgpack()`` now always gives writeable ndarrays even when compression is used (:issue:`12359`).
- ``pd.read_msgpack()`` now supports serializing and de-serializing categoricals with msgpack (:issue:`12573`)
- ``.to_json()`` now supports ``NDFrames`` that contain categorical and sparse data (:issue:`10778`)
- ``interpolate()`` now supports ``method='akima'`` (:issue:`7588`).
- ``pd.read_excel()`` now accepts path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path, in line with other ``read_*`` functions (:issue:`12655`)
- Added ``.weekday_name`` property as a component to ``DatetimeIndex`` and the ``.dt`` accessor. (:issue:`11128`)
- ``Index.take`` now handles ``allow_fill`` and ``fill_value`` consistently (:issue:`12631`)
.. ipython:: python
idx = pd.Index([1.0, 2.0, 3.0, 4.0], dtype="float")
# default, allow_fill=True, fill_value=None
idx.take([2, -1])
idx.take([2, -1], fill_value=True)
- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables <text.indicator>` (:issue:`10008`, :issue:`10103`)
.. ipython:: python
idx = pd.Index(["a|b", "a|c", "b|c"])
idx.str.get_dummies("|")
- ``pd.crosstab()`` has gained a ``normalize`` argument for normalizing frequency tables (:issue:`12569`). Examples in the updated docs :ref:`here <reshaping.crosstabulations>`.
- ``.resample(..).interpolate()`` is now supported (:issue:`12925`)
- ``.isin()`` now accepts passed ``sets`` (:issue:`12988`)
.. _whatsnew_0181.sparse:
Sparse changes
~~~~~~~~~~~~~~
These changes conform sparse handling to return the correct types and work to make a smoother experience with indexing.
``SparseArray.take`` now returns a scalar for scalar input, ``SparseArray`` for others. Furthermore, it handles a negative indexer with the same rule as ``Index`` (:issue:`10560`, :issue:`12796`)
.. code-block:: python
s = pd.SparseArray([np.nan, np.nan, 1, 2, 3, np.nan, 4, 5, np.nan, 6])
s.take(0)
s.take([1, 2, 3])
- Bug in ``SparseSeries[]`` indexing with ``Ellipsis`` raises ``KeyError`` (:issue:`9467`)
- Bug in ``SparseArray[]`` indexing with tuples are not handled properly (:issue:`12966`)
- Bug in ``SparseSeries.loc[]`` with list-like input raises ``TypeError`` (:issue:`10560`)
- Bug in ``SparseSeries.iloc[]`` with scalar input may raise ``IndexError`` (:issue:`10560`)
- Bug in ``SparseSeries.loc[]``, ``.iloc[]`` with ``slice`` returns ``SparseArray``, rather than ``SparseSeries`` (:issue:`10560`)
- Bug in ``SparseDataFrame.loc[]``, ``.iloc[]`` may results in dense ``Series``, rather than ``SparseSeries`` (:issue:`12787`)
- Bug in ``SparseArray`` addition ignores ``fill_value`` of right hand side (:issue:`12910`)
- Bug in ``SparseArray`` mod raises ``AttributeError`` (:issue:`12910`)
- Bug in ``SparseArray`` pow calculates ``1 ** np.nan`` as ``np.nan`` which must be 1 (:issue:`12910`)
- Bug in ``SparseArray`` comparison output may incorrect result or raise ``ValueError`` (:issue:`12971`)
- Bug in ``SparseSeries.__repr__`` raises ``TypeError`` when it is longer than ``max_rows`` (:issue:`10560`)
- Bug in ``SparseSeries.shape`` ignores ``fill_value`` (:issue:`10452`)
- Bug in ``SparseSeries`` and ``SparseArray`` may have different ``dtype`` from its dense values (:issue:`12908`)
- Bug in ``SparseSeries.reindex`` incorrectly handle ``fill_value`` (:issue:`12797`)
- Bug in ``SparseArray.to_frame()`` results in ``DataFrame``, rather than ``SparseDataFrame`` (:issue:`9850`)
- Bug in ``SparseSeries.value_counts()`` does not count ``fill_value`` (:issue:`6749`)
- Bug in ``SparseArray.to_dense()`` does not preserve ``dtype`` (:issue:`10648`)
- Bug in ``SparseArray.to_dense()`` incorrectly handle ``fill_value`` (:issue:`12797`)
- Bug in ``pd.concat()`` of ``SparseSeries`` results in dense (:issue:`10536`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` incorrectly handle ``fill_value`` (:issue:`9765`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` may raise ``AttributeError`` (:issue:`12174`)
- Bug in ``SparseArray.shift()`` may raise ``NameError`` or ``TypeError`` (:issue:`12908`)
.. _whatsnew_0181.api:
API changes
~~~~~~~~~~~
.. _whatsnew_0181.api.groubynth:
Method ``.groupby(..).nth()`` changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The index in ``.groupby(..).nth()`` output is now more consistent when the ``as_index`` argument is passed (:issue:`11039`):
.. ipython:: python
df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1, 2, 3]})
df
Previous behavior:
.. code-block:: ipython
In [3]: df.groupby('A', as_index=True)['B'].nth(0)
Out[3]:
0 1
1 2
Name: B, dtype: int64
In [4]: df.groupby('A', as_index=False)['B'].nth(0)
Out[4]:
0 1
1 2
Name: B, dtype: int64
New behavior:
.. ipython:: python
df.groupby("A", as_index=True)["B"].nth(0)
df.groupby("A", as_index=False)["B"].nth(0)
Furthermore, previously, a ``.groupby`` would always sort, regardless if ``sort=False`` was passed with ``.nth()``.
.. ipython:: python
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"])
df["c"] = np.random.randint(0, 4, 100)
Previous behavior:
.. code-block:: ipython
In [4]: df.groupby('c', sort=True).nth(1)
Out[4]:
a b
c
0 -0.334077 0.002118
1 0.036142 -2.074978
2 -0.720589 0.887163
3 0.859588 -0.636524
In [5]: df.groupby('c', sort=False).nth(1)
Out[5]:
a b
c
0 -0.334077 0.002118
1 0.036142 -2.074978
2 -0.720589 0.887163
3 0.859588 -0.636524
New behavior:
.. ipython:: python
df.groupby("c", sort=True).nth(1)
df.groupby("c", sort=False).nth(1)
.. _whatsnew_0181.numpy_compatibility:
NumPy function compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)
An example of this signature augmentation is illustrated below:
.. code-block:: python
sp = pd.SparseDataFrame([1, 2, 3])
sp
Previous behaviour:
.. code-block:: ipython
In [2]: np.cumsum(sp, axis=0)
...
TypeError: cumsum() takes at most 2 arguments (4 given)
New behaviour:
.. code-block:: python
np.cumsum(sp, axis=0)
.. _whatsnew_0181.apply_resample:
Using ``.apply`` on GroupBy resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using ``apply`` on resampling groupby operations (using a ``pd.TimeGrouper``) now has the same output types as similar ``apply`` calls on other groupby operations. (:issue:`11742`).
.. ipython:: python
df = pd.DataFrame(
{"date": pd.to_datetime(["10/10/2000", "11/10/2000"]), "value": [10, 13]}
)
df
Previous behavior:
.. code-block:: ipython
In [1]: df.groupby(pd.TimeGrouper(key='date',
...: freq='M')).apply(lambda x: x.value.sum())
Out[1]:
...
TypeError: cannot concatenate a non-NDFrame object
# Output is a Series
In [2]: df.groupby(pd.TimeGrouper(key='date',
...: freq='M')).apply(lambda x: x[['value']].sum())
Out[2]:
date
2000-10-31 value 10
2000-11-30 value 13
dtype: int64
New behavior:
.. code-block:: ipython
# Output is a Series
In [55]: df.groupby(pd.TimeGrouper(key='date',
...: freq='M')).apply(lambda x: x.value.sum())
Out[55]:
date
2000-10-31 10
2000-11-30 13
Freq: M, dtype: int64
# Output is a DataFrame
In [56]: df.groupby(pd.TimeGrouper(key='date',
...: freq='M')).apply(lambda x: x[['value']].sum())
Out[56]:
value
date
2000-10-31 10
2000-11-30 13
.. _whatsnew_0181.read_csv_exceptions:
Changes in ``read_csv`` exceptions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or header (:issue:`12493`, :issue:`12506`)
Previous behaviour:
.. code-block:: ipython
In [1]: import io
In [2]: df = pd.read_csv(io.StringIO(''), engine='c')
...
ValueError: No columns to parse from file
In [3]: df = pd.read_csv(io.StringIO(''), engine='python')
...
StopIteration
New behaviour:
.. code-block:: ipython
In [1]: df = pd.read_csv(io.StringIO(''), engine='c')
...
pandas.io.common.EmptyDataError: No columns to parse from file
In [2]: df = pd.read_csv(io.StringIO(''), engine='python')
...
pandas.io.common.EmptyDataError: No columns to parse from file
In addition to this error change, several others have been made as well:
- ``CParserError`` now sub-classes ``ValueError`` instead of just a ``Exception`` (:issue:`12551`)
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
- ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)
.. _whatsnew_0181.api.to_datetime:
Method ``to_datetime`` error changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'`` or non-convertible with ``errors='ignore'``. Furthermore, an ``OutOfBoundsDateime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'``. (:issue:`11758`, :issue:`13052`, :issue:`13059`)
Previous behaviour:
.. code-block:: ipython
In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
Out[27]: NaT
In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
OverflowError: Python int too large to convert to C long
In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
OverflowError: Python int too large to convert to C long
New behaviour:
.. code-block:: ipython
In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
Out[2]: Timestamp('2014-12-31 16:31:00')
In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
Out[3]: 11111111
In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
OutOfBoundsDatetime: cannot convert input with unit 'D'
.. _whatsnew_0181.api.other:
Other API changes
^^^^^^^^^^^^^^^^^
- ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index. (:issue:`12934`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- ``Period`` and ``PeriodIndex`` now raises ``IncompatibleFrequency`` error which inherits ``ValueError`` rather than raw ``ValueError`` (:issue:`12615`)
- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is neither a boolean, list, or dictionary (matches the doc-string) (:issue:`5636`)
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fallback to the ``python`` engine. This mimics the pre-0.18.1 behavior if ``numexpr`` is installed (and which, previously, if numexpr was not installed, ``.query()/.eval()`` would raise). (:issue:`12749`)
- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
- Provide a proper ``__name__`` and ``__qualname__`` attributes for generic functions (:issue:`12021`)
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)
- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)
.. _whatsnew_0181.deprecations:
Deprecations
^^^^^^^^^^^^
- The method name ``Index.sym_diff()`` is deprecated and can be replaced by ``Index.symmetric_difference()`` (:issue:`12591`)
- The method name ``Categorical.sort()`` is deprecated in favor of ``Categorical.sort_values()`` (:issue:`12882`)
.. _whatsnew_0181.performance:
Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Improved speed of SAS reader (:issue:`12656`, :issue:`12961`)
- Performance improvements in ``.groupby(..).cumcount()`` (:issue:`11039`)
- Improved memory usage in ``pd.read_csv()`` when using ``skiprows=an_integer`` (:issue:`13005`)
- Improved performance of ``DataFrame.to_sql`` when checking case sensitivity for tables. Now only checks if table has been created correctly when table name is not lower case. (:issue:`12876`)
- Improved performance of ``Period`` construction and time series plotting (:issue:`12903`, :issue:`11831`).
- Improved performance of ``.str.encode()`` and ``.str.decode()`` methods (:issue:`13008`)
- Improved performance of ``to_numeric`` if input is numeric dtype (:issue:`12777`)
- Improved performance of sparse arithmetic with ``IntIndex`` (:issue:`13036`)
.. _whatsnew_0181.bug_fixes:
Bug fixes
~~~~~~~~~
- ``usecols`` parameter in ``pd.read_csv`` is now respected even when the lines of a CSV file are not even (:issue:`12203`)
- Bug in ``groupby.transform(..)`` when ``axis=1`` is specified with a non-monotonic ordered index (:issue:`12713`)
- Bug in ``Period`` and ``PeriodIndex`` creation raises ``KeyError`` if ``freq="Minute"`` is specified. Note that "Minute" freq is deprecated in v0.17.0, and recommended to use ``freq="T"`` instead (:issue:`11854`)
- Bug in ``.resample(...).count()`` with a ``PeriodIndex`` always raising a ``TypeError`` (:issue:`12774`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` casting to a ``DatetimeIndex`` when empty (:issue:`12868`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` when resampling to an existing frequency (:issue:`12770`)
- Bug in printing data which contains ``Period`` with different ``freq`` raises ``ValueError`` (:issue:`12615`)
- Bug in ``Series`` construction with ``Categorical`` and ``dtype='category'`` is specified (:issue:`12574`)
- Bugs in concatenation with a coercible dtype was too aggressive, resulting in different dtypes in output formatting when an object was longer than ``display.max_rows`` (:issue:`12411`, :issue:`12045`, :issue:`11594`, :issue:`10571`, :issue:`12211`)
- Bug in ``float_format`` option with option not being validated as a callable. (:issue:`12706`)
- Bug in ``GroupBy.filter`` when ``dropna=False`` and no groups fulfilled the criteria (:issue:`12768`)
- Bug in ``__name__`` of ``.cum*`` functions (:issue:`12021`)
- Bug in ``.astype()`` of a ``Float64Inde/Int64Index`` to an ``Int64Index`` (:issue:`12881`)
- Bug in round tripping an integer based index in ``.to_json()/.read_json()`` when ``orient='index'`` (the default) (:issue:`12866`)
- Bug in plotting ``Categorical`` dtypes cause error when attempting stacked bar plot (:issue:`13019`)
- Compat with >= ``numpy`` 1.11 for ``NaT`` comparisons (:issue:`12969`)
- Bug in ``.drop()`` with a non-unique ``MultiIndex``. (:issue:`12701`)
- Bug in ``.concat`` of datetime tz-aware and naive DataFrames (:issue:`12467`)
- Bug in correctly raising a ``ValueError`` in ``.resample(..).fillna(..)`` when passing a non-string (:issue:`12952`)
- Bug fixes in various encoding and header processing issues in ``pd.read_sas()`` (:issue:`12659`, :issue:`12654`, :issue:`12647`, :issue:`12809`)
- Bug in ``pd.crosstab()`` where would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
- Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
- Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)
- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)
- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
- Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
- Bug in ``.quantile()`` with empty ``Series`` may return scalar rather than empty ``Series`` (:issue:`12772`)
- Bug in ``.loc`` with out-of-bounds in a large indexer would raise ``IndexError`` rather than ``KeyError`` (:issue:`12527`)
- Bug in resampling when using a ``TimedeltaIndex`` and ``.asfreq()``, would previously not include the final fencepost (:issue:`12926`)
- Bug in equality testing with a ``Categorical`` in a ``DataFrame`` (:issue:`12564`)
- Bug in ``GroupBy.first()``, ``.last()`` returns incorrect row when ``TimeGrouper`` is used (:issue:`7453`)
- Bug in ``pd.read_csv()`` with the ``c`` engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
- Bug in ``DataFrame`` timezone lost when assigning tz-aware datetime ``Series`` with alignment (:issue:`12981`)
- Bug in ``.value_counts()`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
- Bug in ``Series.value_counts()`` loses name if its dtype is ``category`` (:issue:`12835`)
- Bug in ``Series.value_counts()`` loses timezone info (:issue:`12835`)
- Bug in ``Series.value_counts(normalize=True)`` with ``Categorical`` raises ``UnboundLocalError`` (:issue:`12835`)
- Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
- Bug in ``pd.read_csv()`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the ``c`` engine (:issue:`9755`)
- Bug in ``pd.read_csv()`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the ``c`` engine (:issue:`12912`)
- Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
- Clean in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
- Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)
- Bug in ``Series.map`` raises ``TypeError`` if its dtype is ``category`` or tz-aware ``datetime`` (:issue:`12473`)
- Bugs on 32bit platforms for some test comparisons (:issue:`12972`)
- Bug in index coercion when falling back from ``RangeIndex`` construction (:issue:`12893`)
- Better error message in window functions when invalid argument (e.g. a float window) is passed (:issue:`12669`)
- Bug in slicing subclassed ``DataFrame`` defined to return subclassed ``Series`` may return normal ``Series`` (:issue:`11559`)
- Bug in ``.str`` accessor methods may raise ``ValueError`` if input has ``name`` and the result is ``DataFrame`` or ``MultiIndex`` (:issue:`12617`)
- Bug in ``DataFrame.last_valid_index()`` and ``DataFrame.first_valid_index()`` on empty frames (:issue:`12800`)
- Bug in ``CategoricalIndex.get_loc`` returns different result from regular ``Index`` (:issue:`12531`)
- Bug in ``PeriodIndex.resample`` where name not propagated (:issue:`12769`)
- Bug in ``date_range`` ``closed`` keyword and timezones (:issue:`12684`).
- Bug in ``pd.concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
- Bug in ``pd.concat`` did not handle empty ``Series`` properly (:issue:`11082`)
- Bug in ``.plot.bar`` alignment when ``width`` is specified with ``int`` (:issue:`12979`)
- Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue:`12723`)
- Bug in ``pd.read_html()`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)
- Bug in ``.pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
- Bug in ``.pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
- Bug in ``pd.crosstab()`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)
- Bug in ``Series.name`` when ``name`` attribute can be a hashable type (:issue:`12610`)
- Bug in ``.describe()`` resets categorical columns information (:issue:`11558`)
- Bug where ``loffset`` argument was not applied when calling ``resample().count()`` on a timeseries (:issue:`12725`)
- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names`` (:issue:`12870`)
- Bug in ``pd.to_numeric()`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with scalar raises ``ValueError`` (:issue:`12777`)
.. _whatsnew_0.18.1.contributors:
Contributors
~~~~~~~~~~~~
.. contributors:: v0.18.0..v0.18.1
|