.. _whatsnew_0181:

Version 0.18.1 (May 3, 2016)
----------------------------

{{ header }}


This is a minor bug-fix release from 0.18.0 and includes a large number of
bug fixes along with several new features, enhancements, and performance improvements.
We recommend that all users upgrade to this version.

Highlights include:

- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here <whatsnew_0181.deferred_ops>`
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here <whatsnew_0181.enhancements.assembling>`
- Method chaining improvements, see :ref:`here <whatsnew_0181.enhancements.method_chain>`.
- Custom business hour offset, see :ref:`here <whatsnew_0181.enhancements.custombusinesshour>`.
- Many bug fixes in the handling of ``sparse``, see :ref:`here <whatsnew_0181.sparse>`
- Expanded the :ref:`Tutorials section <tutorial-modern>` with a feature on modern pandas, courtesy of `@TomAugspurger <https://twitter.com/TomAugspurger>`__ (:issue:`13045`).


.. contents:: What's new in v0.18.1
    :local:
    :backlinks: none

.. _whatsnew_0181.new_features:

New features
~~~~~~~~~~~~

.. _whatsnew_0181.enhancements.custombusinesshour:

Custom business hour
^^^^^^^^^^^^^^^^^^^^

The ``CustomBusinessHour`` is a mixture of ``BusinessHour`` and ``CustomBusinessDay`` which
allows you to specify arbitrary holidays. For details, see
:ref:`Custom Business Hour <timeseries.custombusinesshour>` (:issue:`11514`).

.. ipython:: python

    from pandas.tseries.offsets import CustomBusinessHour
    from pandas.tseries.holiday import USFederalHolidayCalendar

    bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())

Friday before MLK Day

.. ipython:: python

    import datetime

    dt = datetime.datetime(2014, 1, 17, 15)

    dt + bhour_us

Tuesday after MLK Day (Monday is skipped because it's a holiday)

.. ipython:: python

    dt + bhour_us * 2

.. _whatsnew_0181.deferred_ops:

Method ``.groupby(..)`` syntax with window and resample operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group (:issue:`12486`, :issue:`12738`).

You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another deferred object (similar to what ``.rolling()`` and ``.expanding()`` do on ungrouped pandas objects). You can then operate on these ``RollingGroupby`` objects in a similar manner.

Previously you would have to do this to get a rolling window mean per-group:

.. ipython:: python

   df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
   df

.. code-block:: ipython

   In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
   Out[1]:
   A
   1  0      NaN
      1      NaN
      2      NaN
      3      1.5
      4      2.5
      5      3.5
      6      4.5
      7      5.5
      8      6.5
      9      7.5
      10     8.5
      11     9.5
      12    10.5
      13    11.5
      14    12.5
      15    13.5
      16    14.5
      17    15.5
      18    16.5
      19    17.5
   2  20     NaN
      21     NaN
      22     NaN
      23    21.5
      24    22.5
      25    23.5
      26    24.5
      27    25.5
      28    26.5
      29    27.5
      30    28.5
      31    29.5
   3  32     NaN
      33     NaN
      34     NaN
      35    33.5
      36    34.5
      37    35.5
      38    36.5
      39    37.5
   Name: B, dtype: float64

Now you can do:

.. ipython:: python

   df.groupby("A").rolling(4).B.mean()

For ``.resample(..)``-type operations, previously you would have to do the following:

.. ipython:: python

   df = pd.DataFrame(
       {
           "date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
           "group": [1, 1, 2, 2],
           "val": [5, 6, 7, 8],
       }
   ).set_index("date")

   df

.. code-block:: ipython

   In [1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
   Out[1]:
                     group  val
   group date
   1     2016-01-03      1    5
         2016-01-04      1    5
         2016-01-05      1    5
         2016-01-06      1    5
         2016-01-07      1    5
         2016-01-08      1    5
         2016-01-09      1    5
         2016-01-10      1    6
   2     2016-01-17      2    7
         2016-01-18      2    7
         2016-01-19      2    7
         2016-01-20      2    7
         2016-01-21      2    7
         2016-01-22      2    7
         2016-01-23      2    7
         2016-01-24      2    8

Now you can do:

.. code-block:: ipython

   In [1]: df.groupby("group").resample("1D").ffill()
   Out[1]:
                     group  val
   group date
   1     2016-01-03      1    5
         2016-01-04      1    5
         2016-01-05      1    5
         2016-01-06      1    5
         2016-01-07      1    5
         2016-01-08      1    5
         2016-01-09      1    5
         2016-01-10      1    6
   2     2016-01-17      2    7
         2016-01-18      2    7
         2016-01-19      2    7
         2016-01-20      2    7
         2016-01-21      2    7
         2016-01-22      2    7
         2016-01-23      2    7
         2016-01-24      2    8

.. _whatsnew_0181.enhancements.method_chain:

Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following methods / indexers now accept a ``callable``. This is intended to make
these more useful in method chains; see the :ref:`documentation <indexing.callable>`
(:issue:`11485`, :issue:`12533`).

- ``.where()`` and ``.mask()``
- ``.loc[]``, ``.iloc[]`` and ``.ix[]``
- ``[]`` indexing

Methods ``.where()`` and ``.mask()``
""""""""""""""""""""""""""""""""""""

These can accept a callable for the condition and ``other``
arguments.

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
   df.where(lambda x: x > 4, lambda x: x + 10)
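
``.mask()`` accepts a callable in the same way; a minimal sketch (entries where
the condition holds are replaced, by default with ``NaN``):

.. code-block:: python

   # mask values greater than 4
   df.mask(lambda x: x > 4)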

Methods ``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""""""""""

These can accept a callable, or a tuple of callables, as a slicer. The callable
can return a valid boolean indexer or anything which is valid for these indexers' input.

.. ipython:: python

   # callable returns bool indexer
   df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]

   # callable returns list of labels
   df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]

Indexing with ``[]``
""""""""""""""""""""

Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return a valid input for ``[]`` indexing depending on its
class and index type.

.. ipython:: python

   df[lambda x: "A"]

Using these methods / indexers, you can chain data selection operations
without using a temporary variable.

.. ipython:: python

   bb = pd.read_csv("data/baseball.csv", index_col="id")
   (bb.groupby(["year", "team"]).sum(numeric_only=True).loc[lambda df: df.r > 100])

.. _whatsnew_0181.partial_string_indexing:

Partial string indexing on ``DatetimeIndex`` when part of a ``MultiIndex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Partial string indexing now matches on ``DatetimeIndex`` when part of a ``MultiIndex`` (:issue:`10331`)

.. code-block:: ipython

   In [20]: dft2 = pd.DataFrame(
      ....:     np.random.randn(20, 1),
      ....:     columns=["A"],
      ....:     index=pd.MultiIndex.from_product(
      ....:         [pd.date_range("20130101", periods=10, freq="12H"), ["a", "b"]]
      ....:     ),
      ....: )
      ....:

   In [21]: dft2
   Out[21]:
                                 A
   2013-01-01 00:00:00 a  0.469112
                       b -0.282863
   2013-01-01 12:00:00 a -1.509059
                       b -1.135632
   2013-01-02 00:00:00 a  1.212112
   ...                         ...
   2013-01-04 12:00:00 b  0.271860
   2013-01-05 00:00:00 a -0.424972
                       b  0.567020
   2013-01-05 12:00:00 a  0.276232
                       b -1.087401

   [20 rows x 1 columns]

   In [22]: dft2.loc["2013-01-05"]
   Out[22]:
                                 A
   2013-01-05 00:00:00 a -0.424972
                       b  0.567020
   2013-01-05 12:00:00 a  0.276232
                       b -1.087401

   [4 rows x 1 columns]

On other levels

.. code-block:: ipython

   In [26]: idx = pd.IndexSlice

   In [27]: dft2 = dft2.swaplevel(0, 1).sort_index()

   In [28]: dft2
   Out[28]:
                                 A
   a 2013-01-01 00:00:00  0.469112
     2013-01-01 12:00:00 -1.509059
     2013-01-02 00:00:00  1.212112
     2013-01-02 12:00:00  0.119209
     2013-01-03 00:00:00 -0.861849
   ...                         ...
   b 2013-01-03 12:00:00  1.071804
     2013-01-04 00:00:00 -0.706771
     2013-01-04 12:00:00  0.271860
     2013-01-05 00:00:00  0.567020
     2013-01-05 12:00:00 -1.087401

   [20 rows x 1 columns]

   In [29]: dft2.loc[idx[:, "2013-01-05"], :]
   Out[29]:
                                 A
   a 2013-01-05 00:00:00 -0.424972
     2013-01-05 12:00:00  0.276232
   b 2013-01-05 00:00:00  0.567020
     2013-01-05 12:00:00 -1.087401

   [4 rows x 1 columns]

.. _whatsnew_0181.enhancements.assembling:

Assembling datetimes
^^^^^^^^^^^^^^^^^^^^

``pd.to_datetime()`` has gained the ability to assemble datetimes from a passed-in ``DataFrame`` or a dict (:issue:`8158`).

.. ipython:: python

   df = pd.DataFrame(
       {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]}
   )
   df

Assembling using the passed frame.

.. ipython:: python

   pd.to_datetime(df)

You can pass only the columns that you need to assemble.

.. ipython:: python

   pd.to_datetime(df[["year", "month", "day"]])
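
A dict mapping column names to values works the same way; a minimal sketch:

.. code-block:: python

   pd.to_datetime({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})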

.. _whatsnew_0181.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- ``pd.read_csv()`` now supports ``delim_whitespace=True`` for the Python engine (:issue:`12958`)
- ``pd.read_csv()`` now supports opening ZIP files that contain a single CSV, via extension inference or explicit ``compression='zip'`` (:issue:`12175`)
- ``pd.read_csv()`` now supports opening xz-compressed files, via extension inference or when ``compression='xz'`` is specified explicitly; ``xz`` compression is also supported by ``DataFrame.to_csv`` in the same way (:issue:`11852`)
- ``pd.read_msgpack()`` now always gives writeable ndarrays even when compression is used (:issue:`12359`).
- ``pd.read_msgpack()`` now supports serializing and de-serializing categoricals with msgpack (:issue:`12573`)
- ``.to_json()`` now supports ``NDFrames`` that contain categorical and sparse data (:issue:`10778`)
- ``interpolate()`` now supports ``method='akima'`` (:issue:`7588`).
- ``pd.read_excel()`` now accepts path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path, in line with other ``read_*`` functions (:issue:`12655`)
- Added ``.weekday_name`` property as a component to ``DatetimeIndex`` and the ``.dt`` accessor. (:issue:`11128`)
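
  A small sketch (note that later pandas versions deprecated
  ``.weekday_name`` in favor of ``.day_name()``):

  .. code-block:: python

     idx = pd.date_range("2016-05-01", periods=3)
     idx.weekday_name  # e.g. array(['Sunday', 'Monday', 'Tuesday'], ...)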

- ``Index.take`` now handles ``allow_fill`` and ``fill_value`` consistently (:issue:`12631`)

  .. ipython:: python

     idx = pd.Index([1.0, 2.0, 3.0, 4.0], dtype="float")

     # default, allow_fill=True, fill_value=None
     idx.take([2, -1])
     idx.take([2, -1], fill_value=True)

- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables <text.indicator>` (:issue:`10008`, :issue:`10103`)

  .. ipython:: python

     idx = pd.Index(["a|b", "a|c", "b|c"])
     idx.str.get_dummies("|")


- ``pd.crosstab()`` has gained a ``normalize`` argument for normalizing frequency tables (:issue:`12569`). Examples in the updated docs :ref:`here <reshaping.crosstabulations>`.
- ``.resample(..).interpolate()`` is now supported (:issue:`12925`)
- ``.isin()`` now accepts passed ``sets`` (:issue:`12988`)
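
For example, a ``set`` can now be passed directly:

.. code-block:: python

   pd.Series([1, 2, 3]).isin({1, 3})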

.. _whatsnew_0181.sparse:

Sparse changes
~~~~~~~~~~~~~~

These changes conform sparse handling to return the correct types and to provide a smoother experience with indexing.

``SparseArray.take`` now returns a scalar for scalar input and a ``SparseArray`` otherwise. Furthermore, it handles a negative indexer with the same rules as ``Index`` (:issue:`10560`, :issue:`12796`).

.. code-block:: python

   s = pd.SparseArray([np.nan, np.nan, 1, 2, 3, np.nan, 4, 5, np.nan, 6])
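   # a scalar indexer returns a scalar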
   s.take(0)
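   # a list-like indexer returns a SparseArray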
   s.take([1, 2, 3])

- Bug in ``SparseSeries[]`` indexing with ``Ellipsis`` raises ``KeyError`` (:issue:`9467`)
- Bug in ``SparseArray[]`` indexing with tuples not being handled properly (:issue:`12966`)
- Bug in ``SparseSeries.loc[]`` with list-like input raises ``TypeError`` (:issue:`10560`)
- Bug in ``SparseSeries.iloc[]`` with scalar input may raise ``IndexError`` (:issue:`10560`)
- Bug in ``SparseSeries.loc[]``, ``.iloc[]`` with ``slice`` returns ``SparseArray``, rather than ``SparseSeries`` (:issue:`10560`)
- Bug in ``SparseDataFrame.loc[]``, ``.iloc[]`` may result in a dense ``Series``, rather than a ``SparseSeries`` (:issue:`12787`)
- Bug in ``SparseArray`` addition ignores ``fill_value`` of right hand side (:issue:`12910`)
- Bug in ``SparseArray`` mod raises ``AttributeError`` (:issue:`12910`)
- Bug in ``SparseArray`` pow calculates ``1 ** np.nan`` as ``np.nan``, which should be 1 (:issue:`12910`)
- Bug in ``SparseArray`` comparison output may give an incorrect result or raise ``ValueError`` (:issue:`12971`)
- Bug in ``SparseSeries.__repr__`` raises ``TypeError`` when it is longer than ``max_rows`` (:issue:`10560`)
- Bug in ``SparseSeries.shape`` ignores ``fill_value`` (:issue:`10452`)
- Bug in ``SparseSeries`` and ``SparseArray`` may have a different ``dtype`` from their dense values (:issue:`12908`)
- Bug in ``SparseSeries.reindex`` incorrectly handles ``fill_value`` (:issue:`12797`)
- Bug in ``SparseArray.to_frame()`` results in ``DataFrame``, rather than ``SparseDataFrame`` (:issue:`9850`)
- Bug in ``SparseSeries.value_counts()`` does not count ``fill_value`` (:issue:`6749`)
- Bug in ``SparseArray.to_dense()`` does not preserve ``dtype`` (:issue:`10648`)
- Bug in ``SparseArray.to_dense()`` incorrectly handles ``fill_value`` (:issue:`12797`)
- Bug in ``pd.concat()`` of ``SparseSeries`` results in dense (:issue:`10536`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` incorrectly handles ``fill_value`` (:issue:`9765`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` may raise ``AttributeError`` (:issue:`12174`)
- Bug in ``SparseArray.shift()`` may raise ``NameError`` or ``TypeError`` (:issue:`12908`)

.. _whatsnew_0181.api:

API changes
~~~~~~~~~~~

.. _whatsnew_0181.api.groubynth:

Method ``.groupby(..).nth()`` changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The index in ``.groupby(..).nth()`` output is now more consistent when the ``as_index`` argument is passed (:issue:`11039`):

.. ipython:: python

   df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1, 2, 3]})
   df

Previous behavior:

.. code-block:: ipython

   In [3]: df.groupby('A', as_index=True)['B'].nth(0)
   Out[3]:
   0    1
   1    2
   Name: B, dtype: int64

   In [4]: df.groupby('A', as_index=False)['B'].nth(0)
   Out[4]:
   0    1
   1    2
   Name: B, dtype: int64

New behavior:

.. ipython:: python

    df.groupby("A", as_index=True)["B"].nth(0)
    df.groupby("A", as_index=False)["B"].nth(0)

Furthermore, previously a ``.groupby`` would always sort, regardless of whether ``sort=False`` was passed with ``.nth()``.

.. ipython:: python

   np.random.seed(1234)
   df = pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"])
   df["c"] = np.random.randint(0, 4, 100)

Previous behavior:

.. code-block:: ipython

   In [4]: df.groupby('c', sort=True).nth(1)
   Out[4]:
             a         b
   c
   0 -0.334077  0.002118
   1  0.036142 -2.074978
   2 -0.720589  0.887163
   3  0.859588 -0.636524

   In [5]: df.groupby('c', sort=False).nth(1)
   Out[5]:
             a         b
   c
   0 -0.334077  0.002118
   1  0.036142 -2.074978
   2 -0.720589  0.887163
   3  0.859588 -0.636524

New behavior:

.. ipython:: python

   df.groupby("c", sort=True).nth(1)
   df.groupby("c", sort=False).nth(1)


.. _whatsnew_0181.numpy_compatibility:

NumPy function compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`).

- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)

An example of this signature augmentation is illustrated below:

.. code-block:: python

   sp = pd.SparseDataFrame([1, 2, 3])
   sp

Previous behavior:

.. code-block:: ipython

   In [2]: np.cumsum(sp, axis=0)
   ...
   TypeError: cumsum() takes at most 2 arguments (4 given)

New behavior:

.. code-block:: python

   np.cumsum(sp, axis=0)

.. _whatsnew_0181.apply_resample:

Using ``.apply`` on GroupBy resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using ``apply`` on resampling groupby operations (using a ``pd.TimeGrouper``) now has the same output types as similar ``apply`` calls on other groupby operations (:issue:`11742`).

.. ipython:: python

    df = pd.DataFrame(
        {"date": pd.to_datetime(["10/10/2000", "11/10/2000"]), "value": [10, 13]}
    )
    df

Previous behavior:

.. code-block:: ipython

    In [1]: df.groupby(pd.TimeGrouper(key='date',
       ...:                           freq='M')).apply(lambda x: x.value.sum())
    Out[1]:
    ...
    TypeError: cannot concatenate a non-NDFrame object

    # Output is a Series
    In [2]: df.groupby(pd.TimeGrouper(key='date',
       ...:                           freq='M')).apply(lambda x: x[['value']].sum())
    Out[2]:
    date
    2000-10-31  value    10
    2000-11-30  value    13
    dtype: int64

New behavior:

.. code-block:: ipython

    # Output is a Series
    In [55]: df.groupby(pd.TimeGrouper(key='date',
        ...:                           freq='M')).apply(lambda x: x.value.sum())
    Out[55]:
    date
    2000-10-31    10
    2000-11-30    13
    Freq: M, dtype: int64

    # Output is a DataFrame
    In [56]: df.groupby(pd.TimeGrouper(key='date',
        ...:                           freq='M')).apply(lambda x: x[['value']].sum())
    Out[56]:
                value
    date
    2000-10-31     10
    2000-11-30     13

.. _whatsnew_0181.read_csv_exceptions:

Changes in ``read_csv`` exceptions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or headers (:issue:`12493`, :issue:`12506`).

Previous behavior:

.. code-block:: ipython

   In [1]: import io

   In [2]: df = pd.read_csv(io.StringIO(''), engine='c')
   ...
   ValueError: No columns to parse from file

   In [3]: df = pd.read_csv(io.StringIO(''), engine='python')
   ...
   StopIteration

New behavior:

.. code-block:: ipython

   In [1]: df = pd.read_csv(io.StringIO(''), engine='c')
   ...
   pandas.io.common.EmptyDataError: No columns to parse from file

   In [2]: df = pd.read_csv(io.StringIO(''), engine='python')
   ...
   pandas.io.common.EmptyDataError: No columns to parse from file
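
Because ``EmptyDataError`` subclasses ``ValueError``, existing code that
catches ``ValueError`` keeps working; a minimal sketch:

.. code-block:: python

   import io

   try:
       pd.read_csv(io.StringIO(""))
   except ValueError as err:  # EmptyDataError is a ValueError subclass
       print("caught:", err)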

In addition to this error change, several others have been made:

- ``CParserError`` now sub-classes ``ValueError`` instead of just ``Exception`` (:issue:`12551`)
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
- ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)


.. _whatsnew_0181.api.to_datetime:

Method ``to_datetime`` error changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'``, or with non-convertible entries and ``errors='ignore'``, have been fixed. Furthermore, an ``OutOfBoundsDatetime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'`` (:issue:`11758`, :issue:`13052`, :issue:`13059`).

Previous behavior:

.. code-block:: ipython

   In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[27]: NaT

   In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
   OverflowError: Python int too large to convert to C long

   In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
   OverflowError: Python int too large to convert to C long

New behavior:

.. code-block:: ipython

   In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[2]: Timestamp('2014-12-31 16:31:00')

   In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
   Out[3]: 11111111

   In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
   OutOfBoundsDatetime: cannot convert input with unit 'D'

.. _whatsnew_0181.api.other:

Other API changes
^^^^^^^^^^^^^^^^^

- ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index; see the sketch after this list (:issue:`12934`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- ``Period`` and ``PeriodIndex`` now raise an ``IncompatibleFrequency`` error, which inherits from ``ValueError``, rather than a raw ``ValueError`` (:issue:`12615`)
- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is not a boolean, list, or dictionary (matching the doc-string) (:issue:`5636`)
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fall back to the ``python`` engine. This mimics the pre-0.18.1 behavior when ``numexpr`` is installed (previously, if ``numexpr`` was not installed, ``.query()/.eval()`` would raise) (:issue:`12749`)
- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
- Generic functions now have proper ``__name__`` and ``__qualname__`` attributes (:issue:`12021`)
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single-level with a multi-level DataFrame (:issue:`9455`, :issue:`12219`)
- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)
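
A minimal sketch of the new ``.swaplevel()`` defaults (the level names here
are illustrative):

.. code-block:: python

   mi = pd.MultiIndex.from_product(
       [["a", "b"], [1, 2], ["x", "y"]], names=["outer", "mid", "inner"]
   )
   s = pd.Series(range(8), index=mi)
   # with no arguments, the two innermost levels (i=-2, j=-1) are swapped
   s.swaplevel().index.names  # FrozenList(['outer', 'inner', 'mid'])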

.. _whatsnew_0181.deprecations:

Deprecations
^^^^^^^^^^^^

- The method name ``Index.sym_diff()`` is deprecated and can be replaced by ``Index.symmetric_difference()`` (:issue:`12591`)
- The method name ``Categorical.sort()`` is deprecated in favor of ``Categorical.sort_values()`` (:issue:`12882`)
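
A short sketch of the ``Index.sym_diff()`` replacement:

.. code-block:: python

   idx = pd.Index([1, 2, 3])
   other = pd.Index([2, 3, 4])
   # preferred over the deprecated idx.sym_diff(other)
   idx.symmetric_difference(other)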








.. _whatsnew_0181.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved speed of SAS reader (:issue:`12656`, :issue:`12961`)
- Performance improvements in ``.groupby(..).cumcount()`` (:issue:`11039`)
- Improved memory usage in ``pd.read_csv()`` when using ``skiprows=an_integer`` (:issue:`13005`)
- Improved performance of ``DataFrame.to_sql`` when checking case sensitivity for tables. It now only checks whether the table has been created correctly when the table name is not lower case (:issue:`12876`)
- Improved performance of ``Period`` construction and time series plotting (:issue:`12903`, :issue:`11831`).
- Improved performance of ``.str.encode()`` and ``.str.decode()`` methods (:issue:`13008`)
- Improved performance of ``to_numeric`` if input is numeric dtype (:issue:`12777`)
- Improved performance of sparse arithmetic with ``IntIndex`` (:issue:`13036`)








.. _whatsnew_0181.bug_fixes:

Bug fixes
~~~~~~~~~
- The ``usecols`` parameter in ``pd.read_csv`` is now respected even when the lines of a CSV file are uneven (:issue:`12203`)
- Bug in ``groupby.transform(..)`` when ``axis=1`` is specified with a non-monotonic ordered index (:issue:`12713`)
- Bug in ``Period`` and ``PeriodIndex`` creation raises ``KeyError`` if ``freq="Minute"`` is specified. Note that the "Minute" freq was deprecated in v0.17.0; use ``freq="T"`` instead (:issue:`11854`)
- Bug in ``.resample(...).count()`` with a ``PeriodIndex`` always raising a ``TypeError`` (:issue:`12774`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` casting to a ``DatetimeIndex`` when empty (:issue:`12868`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` when resampling to an existing frequency (:issue:`12770`)
- Bug in printing data which contains ``Period`` with different ``freq`` raises ``ValueError`` (:issue:`12615`)
- Bug in ``Series`` construction with ``Categorical`` and ``dtype='category'`` is specified (:issue:`12574`)
- Bugs in concatenation where dtype coercion was too aggressive, resulting in different dtypes in output formatting when an object was longer than ``display.max_rows`` (:issue:`12411`, :issue:`12045`, :issue:`11594`, :issue:`10571`, :issue:`12211`)
- Bug in the ``float_format`` option, with the option not being validated as a callable (:issue:`12706`)
- Bug in ``GroupBy.filter`` when ``dropna=False`` and no groups fulfilled the criteria (:issue:`12768`)
- Bug in ``__name__`` of ``.cum*`` functions (:issue:`12021`)
- Bug in ``.astype()`` of a ``Float64Index/Int64Index`` to an ``Int64Index`` (:issue:`12881`)
- Bug in round-tripping an integer-based index in ``.to_json()/.read_json()`` when ``orient='index'`` (the default) (:issue:`12866`)
- Bug in plotting ``Categorical`` dtypes causing an error when attempting a stacked bar plot (:issue:`13019`)
- Compat with ``numpy`` >= 1.11 for ``NaT`` comparisons (:issue:`12969`)
- Bug in ``.drop()`` with a non-unique ``MultiIndex``. (:issue:`12701`)
- Bug in ``.concat`` of datetime tz-aware and naive DataFrames (:issue:`12467`)
- Bug in ``.resample(..).fillna(..)`` not correctly raising a ``ValueError`` when passing a non-string (:issue:`12952`)
- Bug fixes in various encoding and header processing issues in ``pd.read_sas()`` (:issue:`12659`, :issue:`12654`, :issue:`12647`, :issue:`12809`)
- Bug in ``pd.crosstab()`` which would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
- Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
- Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)


- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)

- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation <timedeltas.limitations>`. (:issue:`12727`)
- Bug in ``.quantile()`` with interpolation may coerce to ``float`` unexpectedly (:issue:`12772`)
- Bug in ``.quantile()`` with empty ``Series`` may return scalar rather than empty ``Series`` (:issue:`12772`)


- Bug in ``.loc`` with out-of-bounds in a large indexer would raise ``IndexError`` rather than ``KeyError`` (:issue:`12527`)
- Bug in resampling when using a ``TimedeltaIndex`` and ``.asfreq()``, would previously not include the final fencepost (:issue:`12926`)

- Bug in equality testing with a ``Categorical`` in a ``DataFrame`` (:issue:`12564`)
- Bug in ``GroupBy.first()``, ``.last()`` returns incorrect row when ``TimeGrouper`` is used (:issue:`7453`)



- Bug in ``pd.read_csv()`` with the ``c`` engine when specifying ``skiprows`` with newlines in quoted items (:issue:`10911`, :issue:`12775`)
- Bug in ``DataFrame`` timezone lost when assigning tz-aware datetime ``Series`` with alignment (:issue:`12981`)




- Bug in ``.value_counts()`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
- Bug in ``Series.value_counts()`` loses name if its dtype is ``category`` (:issue:`12835`)
- Bug in ``Series.value_counts()`` loses timezone info (:issue:`12835`)
- Bug in ``Series.value_counts(normalize=True)`` with ``Categorical`` raises ``UnboundLocalError`` (:issue:`12835`)
- Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
- Bug in ``pd.read_csv()`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the ``c`` engine (:issue:`9755`)
- Bug in ``pd.read_csv()`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the ``c`` engine (:issue:`12912`)
- Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
- Cleanup in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
- Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)
- Bug in ``Series.map`` raises ``TypeError`` if its dtype is ``category`` or tz-aware ``datetime`` (:issue:`12473`)

- Bugs on 32bit platforms for some test comparisons (:issue:`12972`)
- Bug in index coercion when falling back from ``RangeIndex`` construction (:issue:`12893`)
- Better error message in window functions when an invalid argument (e.g. a float window) is passed (:issue:`12669`)

- Bug in slicing a subclassed ``DataFrame`` defined to return a subclassed ``Series``, which could return a normal ``Series`` (:issue:`11559`)


- Bug in ``.str`` accessor methods may raise ``ValueError`` if input has ``name`` and the result is ``DataFrame`` or ``MultiIndex`` (:issue:`12617`)
- Bug in ``DataFrame.last_valid_index()`` and ``DataFrame.first_valid_index()`` on empty frames (:issue:`12800`)


- Bug in ``CategoricalIndex.get_loc`` returns a different result from a regular ``Index`` (:issue:`12531`)
- Bug in ``PeriodIndex.resample`` where the name was not propagated (:issue:`12769`)

- Bug in the ``date_range`` ``closed`` keyword with timezones (:issue:`12684`).

- Bug in ``pd.concat`` raises ``AttributeError`` when input data contains tz-aware datetime and timedelta (:issue:`12620`)
- Bug in ``pd.concat`` did not handle empty ``Series`` properly (:issue:`11082`)

- Bug in ``.plot.bar`` alignment when ``width`` is specified with ``int`` (:issue:`12979`)


- Bug where ``fill_value`` was ignored if the argument to a binary operator is a constant (:issue:`12723`)

- Bug in ``pd.read_html()`` when using bs4 flavor and parsing table with a header and only one column (:issue:`9178`)

- Bug in ``.pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
- Bug in ``.pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
- Bug in ``pd.crosstab()`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)

- Bug in ``Series.name`` when the ``name`` attribute is a hashable type (:issue:`12610`)

- Bug in ``.describe()`` resets categorical column information (:issue:`11558`)
- Bug where the ``loffset`` argument was not applied when calling ``resample().count()`` on a time series (:issue:`12725`)
- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names`` (:issue:`12870`)
- Bug in ``pd.to_numeric()`` with ``Index`` returns ``np.ndarray``, rather than ``Index`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with datetime-like may raise ``TypeError`` (:issue:`12777`)
- Bug in ``pd.to_numeric()`` with scalar raises ``ValueError`` (:issue:`12777`)


.. _whatsnew_0.18.1.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.18.0..v0.18.1