.. _whatsnew_0250:

What's new in 0.25.0 (July 18, 2019)
------------------------------------

.. warning::

   Starting with the 0.25.x series of releases, pandas only supports Python 3.5.3 and higher.
   See `Dropping Python 2.7 <https://pandas.pydata.org/pandas-docs/version/0.24/install.html#install-dropping-27>`_ for more details.

.. warning::

   The minimum supported Python version will be bumped to 3.6 in a future release.

.. warning::

   ``Panel`` has been fully removed. For N-D labeled data structures, please
   use `xarray <https://xarray.pydata.org/en/stable/>`_.

.. warning::

   :func:`read_pickle` and :func:`read_msgpack` are only guaranteed backwards compatible back to
   pandas version 0.20.3 (:issue:`27082`)

{{ header }}

These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog
including other versions of pandas.


Enhancements
~~~~~~~~~~~~

.. _whatsnew_0250.enhancements.agg_relabel:

GroupBy aggregation with relabeling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has added special groupby behavior, known as "named aggregation", for naming the
output columns when applying multiple aggregation functions to specific columns (:issue:`18366`, :issue:`26512`).

.. ipython:: python

   animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                           'height': [9.1, 6.0, 9.5, 34.0],
                           'weight': [7.9, 7.5, 9.9, 198.0]})
   animals
   animals.groupby("kind").agg(
       min_height=pd.NamedAgg(column='height', aggfunc='min'),
       max_height=pd.NamedAgg(column='height', aggfunc='max'),
       average_weight=pd.NamedAgg(column='weight', aggfunc="mean"),
   )

Pass the desired column names as the ``**kwargs`` to ``.agg``. The values of ``**kwargs``
should be tuples where the first element is the column selection, and the second element is the
aggregation function to apply. pandas provides the ``pandas.NamedAgg`` namedtuple to make it clearer
what the arguments to the function are, but plain tuples are accepted as well.

.. ipython:: python

   animals.groupby("kind").agg(
       min_height=('height', 'min'),
       max_height=('height', 'max'),
       average_weight=('weight', 'mean'),
   )

Named aggregation is the recommended replacement for the deprecated "dict-of-dicts"
approach to naming the output of column-specific aggregations (:ref:`whatsnew_0200.api_breaking.deprecate_group_agg_dict`).

A similar approach is now available for Series groupby objects as well. Because there's no need for
column selection, the values can just be the functions to apply:

.. ipython:: python

   animals.groupby("kind").height.agg(
       min_height="min",
       max_height="max",
   )


This type of aggregation is the recommended alternative to the deprecated behavior when passing
a dict to a Series groupby aggregation (:ref:`whatsnew_0200.api_breaking.deprecate_group_agg_dict`).

See :ref:`groupby.aggregate.named` for more.

.. _whatsnew_0250.enhancements.multiple_lambdas:

GroupBy aggregation with multiple lambdas
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can now provide multiple lambda functions to a list-like aggregation in
:meth:`.GroupBy.agg` (:issue:`26430`).

.. ipython:: python

   animals.groupby('kind').height.agg([
       lambda x: x.iloc[0], lambda x: x.iloc[-1]
   ])

   animals.groupby('kind').agg([
       lambda x: x.iloc[0] - x.iloc[1],
       lambda x: x.iloc[0] + x.iloc[1]
   ])

Previously, these raised a ``SpecificationError``.

.. _whatsnew_0250.enhancements.multi_index_repr:

Better repr for MultiIndex
^^^^^^^^^^^^^^^^^^^^^^^^^^

Printing of :class:`MultiIndex` instances now shows tuples of each row and ensures
that the tuple items are vertically aligned, so it's now easier to understand
the structure of the ``MultiIndex`` (:issue:`13480`).

The repr now looks like this:

.. ipython:: python

   pd.MultiIndex.from_product([['a', 'abc'], range(500)])

Previously, outputting a :class:`MultiIndex` printed all the ``levels`` and
``codes`` of the ``MultiIndex``, which was visually unappealing and made
the output more difficult to navigate. For example (limiting the range to 5):

.. code-block:: ipython

   In [1]: pd.MultiIndex.from_product([['a', 'abc'], range(5)])
   Out[1]: MultiIndex(levels=[['a', 'abc'], [0, 1, 2, 3, 4]],
      ...:            codes=[[0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]])

In the new repr, all values are shown if the number of rows is smaller
than :attr:`options.display.max_seq_items` (default: 100 items). Horizontally,
the output is truncated if it is wider than :attr:`options.display.width`
(default: 80 characters).

.. _whatsnew_0250.enhancements.shorter_truncated_repr:

Shorter truncated repr for Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, the default display options of pandas ensure that when a Series
or DataFrame has more than 60 rows, its repr gets truncated to this maximum
of 60 rows (the ``display.max_rows`` option). However, this still gives
a repr that takes up a large part of the vertical screen space. Therefore,
a new option ``display.min_rows`` is introduced with a default of 10, which
determines the number of rows shown in the truncated repr:

- For small Series or DataFrames, up to ``max_rows`` rows are shown
  (default: 60).
- For larger Series or DataFrames with a length above ``max_rows``, only
  ``min_rows`` rows are shown (default: 10, i.e. the first and last
  5 rows).

This dual option allows you to still see the full content of relatively small
objects (e.g. ``df.head(20)`` shows all 20 rows), while giving a brief repr
for large objects.

To restore the previous behaviour of a single threshold, set
``pd.options.display.min_rows = None``.
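
As a rough sketch of how the two options interact, the snippet below compares the repr of the same 100-row ``Series`` under the defaults and with ``min_rows`` disabled. The variable names are illustrative only, and the line counts are compared rather than stated exactly, since the footer of the repr may vary between versions.

```python
import pandas as pd

s = pd.Series(range(100))

# Defaults described above: truncate above 60 rows, show 10 rows when truncated.
with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    short_repr = repr(s)

# min_rows=None falls back to the single max_rows threshold.
with pd.option_context("display.max_rows", 60, "display.min_rows", None):
    long_repr = repr(s)

short_lines = short_repr.splitlines()
long_lines = long_repr.splitlines()
print(len(short_lines), len(long_lines))
```

Using ``pd.option_context`` keeps the change local to the ``with`` block, which is convenient when experimenting with display settings.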

.. _whatsnew_0250.enhancements.json_normalize_with_max_level:

JSON normalize with max_level param support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`json_normalize` normalizes the provided input dict to all
nested levels. The new ``max_level`` parameter provides more control over
the level at which normalization ends (:issue:`23843`).

For example:

.. code-block:: ipython

    from pandas.io.json import json_normalize
    data = [{
        'CreatedBy': {'Name': 'User001'},
        'Lookup': {'TextField': 'Some text',
                   'UserField': {'Id': 'ID001', 'Name': 'Name001'}},
        'Image': {'a': 'b'}
    }]
    json_normalize(data, max_level=1)
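
To make the effect of ``max_level`` concrete, the sketch below compares the resulting column names for ``max_level=1`` and for full normalization, using the same ``data`` as above. The ``try``/``except`` is only there because later pandas releases expose the function as ``pd.json_normalize``.

```python
import pandas as pd

try:
    json_normalize = pd.json_normalize  # top-level location in later releases
except AttributeError:
    from pandas.io.json import json_normalize  # location in pandas 0.25

data = [{
    'CreatedBy': {'Name': 'User001'},
    'Lookup': {'TextField': 'Some text',
               'UserField': {'Id': 'ID001', 'Name': 'Name001'}},
    'Image': {'a': 'b'}
}]

# Stop after one level: the doubly nested dict stays a single column.
partial = json_normalize(data, max_level=1)

# No limit: every nested key becomes its own dotted column.
full = json_normalize(data)

print(sorted(partial.columns))
print(sorted(full.columns))
```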


.. _whatsnew_0250.enhancements.explode:

Series.explode to split list-like values to rows
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Series` and :class:`DataFrame` have gained the :meth:`Series.explode` and :meth:`DataFrame.explode` methods to transform list-likes to individual rows. See the :ref:`section on exploding a list-like column <reshaping.explode>` in the docs for more information (:issue:`16538`, :issue:`10511`).


Here is a typical use case: a column holds comma-separated strings.

.. ipython:: python

    df = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1},
                       {'var1': 'd,e,f', 'var2': 2}])
    df

Creating a long-form ``DataFrame`` is now straightforward using chained operations:

.. ipython:: python

    df.assign(var1=df.var1.str.split(',')).explode('var1')
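
The same operation is available directly on a ``Series``. The sketch below uses made-up values to show how mixed content is handled: index labels are repeated for each list element, scalars pass through unchanged, and an empty list yields a missing value.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], "foo", [], [3, 4]])

# Each list element becomes its own row; the original index label is repeated.
exploded = s.explode()
print(exploded)
```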

.. _whatsnew_0250.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^
- :func:`DataFrame.plot` keywords ``logy``, ``logx`` and ``loglog`` can now accept the value ``'sym'`` for symlog scaling. (:issue:`24867`)
- Added support for ISO week year format ('%G-%V-%u') when parsing datetimes using :meth:`to_datetime` (:issue:`16607`)
- Indexing of ``DataFrame`` and ``Series`` now accepts zerodim ``np.ndarray`` (:issue:`24919`)
- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :class:`datetime.time` objects with timezones (:issue:`24043`)
- :meth:`DataFrame.pivot_table` now accepts an ``observed`` parameter which is passed to underlying calls to :meth:`DataFrame.groupby` to speed up grouping categorical data. (:issue:`24923`)
- ``Series.str`` has gained the :meth:`Series.str.casefold` method to remove all case distinctions present in a string (:issue:`25405`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behavior of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :meth:`RangeIndex.union` now supports the ``sort`` argument. If ``sort=False`` an unsorted ``Int64Index`` is always returned. ``sort=None`` is the default and returns a monotonically increasing ``RangeIndex`` if possible or a sorted ``Int64Index`` if not (:issue:`24471`)
- :meth:`TimedeltaIndex.intersection` now also supports the ``sort`` keyword (:issue:`24471`)
- :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
- Added :ref:`api.frame.sparse` for working with a ``DataFrame`` whose values are sparse (:issue:`25681`)
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
- :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
- :meth:`DataFrame.query` and :meth:`DataFrame.eval` now support quoting column names with backticks to refer to names with spaces (:issue:`6508`)
- :func:`merge_asof` now gives a clearer error message when merge keys are categoricals that are not equal (:issue:`26136`)
- :meth:`.Rolling` supports exponential (or Poisson) window type (:issue:`21303`)
- Error message for missing required imports now includes the original import error's text (:issue:`23868`)
- :class:`DatetimeIndex` and :class:`TimedeltaIndex` now have a ``mean`` method (:issue:`24757`)
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
- Added new option ``plotting.backend`` to be able to select a plotting backend different from the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module>`` is a library implementing the pandas plotting API (:issue:`14130`)
- :class:`pandas.offsets.BusinessHour` supports multiple opening hours intervals (:issue:`15481`)
- :func:`read_excel` can now use ``openpyxl`` to read Excel files via the ``engine='openpyxl'`` argument. This will become the default in a future release (:issue:`11499`)
- :func:`pandas.io.excel.read_excel` supports reading OpenDocument tables. Specify ``engine='odf'`` to enable. Consult the :ref:`IO User Guide <io.ods>` for more details (:issue:`9070`)
- :class:`Interval`, :class:`IntervalIndex`, and :class:`~arrays.IntervalArray` have gained an :attr:`~Interval.is_empty` attribute denoting if the given interval(s) are empty (:issue:`27219`)
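
A few of the items above can be exercised in one short sketch. The frame, its column names, and the rename mapping are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"total sales": [10, 25, 40], "region": ["n", "s", "n"]})

# Backticks let query() refer to a column name that contains a space.
large = df.query("`total sales` > 20")

# errors="raise" turns a rename of a nonexistent key into a KeyError.
try:
    df.rename(columns={"missing": "renamed"}, errors="raise")
    rename_failed = False
except KeyError:
    rename_failed = True

# RangeIndex exposes start, stop and step as attributes.
idx = pd.RangeIndex(0, 10, 2)
bounds = (idx.start, idx.stop, idx.step)

# An interval that contains no points reports is_empty.
empty = pd.Interval(0, 0, closed="right").is_empty

print(len(large), rename_failed, bounds, empty)
```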

.. _whatsnew_0250.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0250.api_breaking.utc_offset_indexing:


Indexing with date strings with UTC offsets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indexing a :class:`DataFrame` or :class:`Series` that has a :class:`DatetimeIndex` using a
date string with a UTC offset previously ignored the UTC offset. Now, the UTC offset
is respected in indexing (:issue:`24076`, :issue:`16785`).

.. ipython:: python

    df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
    df

*Previous behavior*:

.. code-block:: ipython

    In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
    Out[3]:
                               0
    2019-01-01 00:00:00-08:00  0

*New behavior*:

.. ipython:: python

    df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']
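
The slice bounds in the new-behavior example line up with the row because midnight in ``US/Pacific`` on that date is the same instant as 12:00 at a +04:00 offset. A quick check of that arithmetic, using only :class:`Timestamp` (the variable names are illustrative):

```python
import pandas as pd

row_label = pd.Timestamp("2019-01-01 00:00:00", tz="US/Pacific")
lower = pd.Timestamp("2019-01-01 12:00:00+04:00")
upper = pd.Timestamp("2019-01-01 13:00:00+04:00")

# Comparisons between tz-aware timestamps use the underlying instant,
# not the wall-clock fields.
print(lower.tz_convert("UTC"), row_label.tz_convert("UTC"))
in_window = lower <= row_label <= upper
print(in_window)
```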


.. _whatsnew_0250.api_breaking.multi_indexing:


``MultiIndex`` constructed from levels and codes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Constructing a :class:`MultiIndex` with ``NaN`` levels or with code values < -1 was allowed previously.
Now, construction with code values < -1 is not allowed, and the codes corresponding to ``NaN`` levels
are reassigned to -1 (:issue:`19387`).

*Previous behavior*:

.. code-block:: ipython

    In [1]: pd.MultiIndex(levels=[[np.nan, None, pd.NaT, 128, 2]],
       ...:               codes=[[0, -1, 1, 2, 3, 4]])
       ...:
    Out[1]: MultiIndex(levels=[[nan, None, NaT, 128, 2]],
                       codes=[[0, -1, 1, 2, 3, 4]])

    In [2]: pd.MultiIndex(levels=[[1, 2]], codes=[[0, -2]])
    Out[2]: MultiIndex(levels=[[1, 2]],
                       codes=[[0, -2]])

*New behavior*:

.. ipython:: python
    :okexcept:

    pd.MultiIndex(levels=[[np.nan, None, pd.NaT, 128, 2]],
                  codes=[[0, -1, 1, 2, 3, 4]])
    pd.MultiIndex(levels=[[1, 2]], codes=[[0, -2]])


.. _whatsnew_0250.api_breaking.groupby_apply_first_group_once:

``GroupBy.apply`` on ``DataFrame`` evaluates first group only once
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The implementation of :meth:`.DataFrameGroupBy.apply`
previously evaluated the supplied function twice on the first group
to infer whether it was safe to use a fast code path. Particularly for functions with
side effects, this was undesired behavior and may have led to surprises (:issue:`2936`, :issue:`2656`, :issue:`7739`, :issue:`10519`, :issue:`12155`, :issue:`20084`, :issue:`21417`).

Now every group is evaluated only a single time.

.. ipython:: python

    df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
    df

    def func(group):
        print(group.name)
        return group

*Previous behavior*:

.. code-block:: python

   In [3]: df.groupby('a').apply(func)
   x
   x
   y
   Out[3]:
      a  b
   0  x  1
   1  y  2

*New behavior*:

.. code-block:: python

   In [3]: df.groupby('a').apply(func)
   x
   y
   Out[3]:
      a  b
   0  x  1
   1  y  2
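
One way to observe the difference programmatically is to record each call instead of printing. This is a minimal sketch: the ``calls`` list and the ``record`` function are illustrative names, and the value column is selected explicitly so that only ``b`` is passed to the function.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
calls = []


def record(group):
    # Side effect: remember which group the function was called with.
    calls.append(group.name)
    return group


df.groupby("a")[["b"]].apply(record)
print(calls)
```

With the new behavior, each group label should appear in ``calls`` exactly once.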

Concatenating sparse values
^^^^^^^^^^^^^^^^^^^^^^^^^^^

When passed DataFrames whose values are sparse, :func:`concat` will now return a
:class:`Series` or :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`25702`).

.. ipython:: python

   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

*Previous behavior*:

.. code-block:: ipython

   In [2]: type(pd.concat([df, df]))
   pandas.core.sparse.frame.SparseDataFrame

*New behavior*:

.. ipython:: python

   type(pd.concat([df, df]))


This now matches the existing behavior of :func:`concat` on ``Series`` with sparse values.
:func:`concat` will continue to return a ``SparseDataFrame`` when all the values
are instances of ``SparseDataFrame``.

This change also affects routines using :func:`concat` internally, like :func:`get_dummies`,
which now returns a :class:`DataFrame` in all cases (previously a ``SparseDataFrame`` was
returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwise).

Providing any ``SparseSeries`` or ``SparseDataFrame`` to :func:`concat` will
cause a ``SparseSeries`` or ``SparseDataFrame`` to be returned, as before.
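
A small sketch of inspecting the result of :func:`concat` on a regular ``DataFrame`` with sparse values, assuming the same ``df`` as above. It relies only on ``pd.SparseDtype`` and the ``DataFrame.sparse`` accessor listed under the other enhancements.

```python
import pandas as pd

df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
result = pd.concat([df, df])

# The container is a plain DataFrame; sparsity lives in the column dtype.
is_plain_frame = type(result) is pd.DataFrame
is_sparse_column = isinstance(result["A"].dtype, pd.SparseDtype)

# Fraction of stored (non-fill) values across the frame.
density = result.sparse.density
print(is_plain_frame, is_sparse_column, density)
```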

The ``.str``-accessor performs stricter type checks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Due to the lack of more fine-grained dtypes, :attr:`Series.str` so far only checked whether the data was
of ``object`` dtype. :attr:`Series.str` will now infer the dtype of the data *within* the Series; in particular,
``'bytes'``-only data will raise an exception (except for :meth:`Series.str.decode`, :meth:`Series.str.get`,
:meth:`Series.str.len`, :meth:`Series.str.slice`), see :issue:`23163`, :issue:`23011`, :issue:`23551`.

*Previous behavior*:

.. code-block:: python

    In [1]: s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)

    In [2]: s
    Out[2]:
    0      b'a'
    1     b'ba'
    2    b'cba'
    dtype: object

    In [3]: s.str.startswith(b'a')
    Out[3]:
    0     True
    1    False
    2    False
    dtype: bool

*New behavior*:

.. ipython:: python
    :okexcept:

    s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)
    s
    s.str.startswith(b'a')
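
Since :meth:`Series.str.decode` remains available on bytes data, one possible migration path is to decode first and then use the regular string methods. A minimal sketch, assuming the bytes are ASCII-encoded:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)

# Decode bytes to str, then call the string method with a str pattern.
decoded = s.str.decode('ascii')
starts_with_a = decoded.str.startswith('a')
print(starts_with_a.tolist())
```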

.. _whatsnew_0250.api_breaking.groupby_categorical:

Categorical dtypes are preserved during GroupBy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, columns that were categorical, but not the groupby key(s), would be converted to ``object`` dtype during groupby operations. pandas now preserves these dtypes (:issue:`18502`).

.. ipython:: python

   cat = pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)
   df = pd.DataFrame({'payload': [-1, -2, -1, -2], 'col': cat})
   df
   df.dtypes

*Previous Behavior*:

.. code-block:: python

   In [5]: df.groupby('payload').first().col.dtype
   Out[5]: dtype('O')

*New Behavior*:

.. ipython:: python

   df.groupby('payload').first().col.dtype


.. _whatsnew_0250.api_breaking.incompatible_index_unions:

Incompatible Index type unions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When performing :func:`Index.union` operations between objects of incompatible dtypes,
the result will be a base :class:`Index` of dtype ``object``. This behavior holds true for
unions between :class:`Index` objects that previously would have been prohibited. The dtype
of empty :class:`Index` objects will now be evaluated before performing union operations
rather than simply returning the other :class:`Index` object. :func:`Index.union` can now be
considered commutative, such that ``A.union(B) == B.union(A)`` (:issue:`23525`).

*Previous behavior*:

.. code-block:: python

    In [1]: pd.period_range('19910905', periods=2).union(pd.Int64Index([1, 2, 3]))
    ...
    ValueError: can only call with other PeriodIndex-ed objects

    In [2]: pd.Index([], dtype=object).union(pd.Index([1, 2, 3]))
    Out[2]: Int64Index([1, 2, 3], dtype='int64')

*New behavior*:

.. code-block:: python

    In [3]: pd.period_range('19910905', periods=2).union(pd.Int64Index([1, 2, 3]))
    Out[3]: Index([1991-09-05, 1991-09-06, 1, 2, 3], dtype='object')
    In [4]: pd.Index([], dtype=object).union(pd.Index([1, 2, 3]))
    Out[4]: Index([1, 2, 3], dtype='object')

Note that integer- and floating-dtype indexes are considered "compatible". The integer
values are coerced to floating point, which may result in loss of precision. See
:ref:`indexing.set_ops` for more.
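
The precision caveat can be seen with an integer just above ``2**53``, the point beyond which ``float64`` can no longer represent every integer. The values here are chosen purely for illustration.

```python
import pandas as pd

ints = pd.Index([1, 2**53 + 1])
floats = pd.Index([0.5])

# Integer and float indexes are compatible, so the union is a float index.
left = ints.union(floats)
right = floats.union(ints)

# 2**53 + 1 is not representable as float64 and is rounded on coercion.
largest = int(left.max())
print(left.dtype, largest == 2**53 + 1)
```

If exact integer labels matter, it may be safer to keep integer and floating-point indexes separate rather than rely on the union.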


``DataFrame`` GroupBy ffill/bfill no longer return group labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The methods ``ffill``, ``bfill``, ``pad`` and ``backfill`` of
:class:`.DataFrameGroupBy`
previously included the group labels in the return value, which was
inconsistent with other groupby transforms. Now only the filled values
are returned. (:issue:`21521`)

.. ipython:: python

    df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
    df

*Previous behavior*:

.. code-block:: python

   In [3]: df.groupby("a").ffill()
   Out[3]:
      a  b
   0  x  1
   1  y  2

*New behavior*:

.. ipython:: python

    df.groupby("a").ffill()
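
If downstream code relied on the group labels being present, they can be joined back from the original frame, since the filled result keeps the original index. A sketch with an invented frame that actually contains missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y", "y"],
                   "b": [1.0, np.nan, 2.0, np.nan]})

# Only the filled value columns are returned.
filled = df.groupby("a").ffill()

# Reattach the grouping column by aligning on the shared index.
with_labels = df[["a"]].join(filled)
print(with_labels)
```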

``DataFrame`` describe on an empty Categorical / object column will return top and freq
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When calling :meth:`DataFrame.describe` with an empty categorical / object
column, the 'top' and 'freq' columns were previously omitted, which was inconsistent with
the output for non-empty columns. Now the 'top' and 'freq' columns will always be included,
with :attr:`numpy.nan` in the case of an empty :class:`DataFrame` (:issue:`26397`)

.. ipython:: python

   df = pd.DataFrame({"empty_col": pd.Categorical([])})
   df

*Previous behavior*:

.. code-block:: python

   In [3]: df.describe()
   Out[3]:
           empty_col
   count           0
   unique          0

*New behavior*:

.. ipython:: python

   df.describe()

``__str__`` methods now call ``__repr__`` rather than vice versa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has until now mostly defined string representations in a pandas object's
``__str__``/``__unicode__``/``__bytes__`` methods, and called ``__str__`` from the ``__repr__``
method, if a specific ``__repr__`` method is not found. This is not needed for Python 3.
In pandas 0.25, the string representations of pandas objects are now generally
defined in ``__repr__``, and calls to ``__str__`` in general now pass the call on to
the ``__repr__``, if a specific ``__str__`` method doesn't exist, as is standard for Python.
This change is backward compatible for direct usage of pandas, but if you subclass
pandas objects *and* give your subclasses specific ``__str__``/``__repr__`` methods,
you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).
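
As an illustration of the new delegation, a subclass that overrides only ``__repr__`` has that text picked up by ``str()`` as well. ``LabeledSeries`` is a hypothetical subclass used only for this sketch.

```python
import pandas as pd


class LabeledSeries(pd.Series):
    # Only __repr__ is customized; no __str__ is defined on the subclass.
    def __repr__(self):
        return "LabeledSeries with {} values".format(len(self))


obj = LabeledSeries([1, 2, 3])
print(str(obj))
```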

.. _whatsnew_0250.api_breaking.interval_indexing:


Indexing an ``IntervalIndex`` with ``Interval`` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indexing methods for :class:`IntervalIndex` have been modified to require exact matches only for :class:`Interval` queries.
``IntervalIndex`` methods previously matched on any overlapping ``Interval``.  Behavior with scalar points, e.g. querying
with an integer, is unchanged (:issue:`16316`).

.. ipython:: python

   ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
   ii

The ``in`` operator (``__contains__``) now only returns ``True`` for exact matches to ``Intervals`` in the ``IntervalIndex``, whereas
this would previously return ``True`` for any ``Interval`` overlapping an ``Interval`` in the ``IntervalIndex``.

*Previous behavior*:

.. code-block:: python

   In [4]: pd.Interval(1, 2, closed='neither') in ii
   Out[4]: True

   In [5]: pd.Interval(-10, 10, closed='both') in ii
   Out[5]: True

*New behavior*:

.. ipython:: python

   pd.Interval(1, 2, closed='neither') in ii
   pd.Interval(-10, 10, closed='both') in ii

The :meth:`~IntervalIndex.get_loc` method now only returns locations for exact matches to ``Interval`` queries, as opposed to the previous behavior of
returning locations for overlapping matches.  A ``KeyError`` will be raised if an exact match is not found.

*Previous behavior*:

.. code-block:: python

   In [6]: ii.get_loc(pd.Interval(1, 5))
   Out[6]: array([0, 1])

   In [7]: ii.get_loc(pd.Interval(2, 6))
   Out[7]: array([0, 1, 2])

*New behavior*:

.. code-block:: python

   In [6]: ii.get_loc(pd.Interval(1, 5))
   Out[6]: 1

   In [7]: ii.get_loc(pd.Interval(2, 6))
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 6, closed='right')

Likewise, :meth:`~IntervalIndex.get_indexer` and :meth:`~IntervalIndex.get_indexer_non_unique` will also only return locations for exact matches
to ``Interval`` queries, with ``-1`` denoting that an exact match was not found.
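
A short sketch of the ``-1`` convention, using a separate non-overlapping index, because ``get_indexer`` requires one (``get_indexer_non_unique`` is the method that applies to overlapping indexes such as ``ii``). The index ``bins`` and the queries are invented examples.

```python
import pandas as pd

bins = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])

# (1, 2] is an exact member; (0, 2] merely overlaps two members.
queries = [pd.Interval(1, 2), pd.Interval(0, 2)]
positions = bins.get_indexer(queries)
print(positions)
```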

These indexing changes extend to querying a :class:`Series` or :class:`DataFrame` with an ``IntervalIndex`` index.

.. ipython:: python

   s = pd.Series(list('abc'), index=ii)
   s

Selecting from a ``Series`` or ``DataFrame`` using ``[]`` (``__getitem__``) or ``loc`` now only returns exact matches for ``Interval`` queries.

*Previous behavior*:

.. code-block:: python

   In [8]: s[pd.Interval(1, 5)]
   Out[8]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

   In [9]: s.loc[pd.Interval(1, 5)]
   Out[9]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

*New behavior*:

.. ipython:: python

   s[pd.Interval(1, 5)]
   s.loc[pd.Interval(1, 5)]

Similarly, a ``KeyError`` will be raised for non-exact matches instead of returning overlapping matches.

*Previous behavior*:

.. code-block:: python

   In [9]: s[pd.Interval(2, 3)]
   Out[9]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

   In [10]: s.loc[pd.Interval(2, 3)]
   Out[10]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

*New behavior*:

.. code-block:: python

   In [6]: s[pd.Interval(2, 3)]
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 3, closed='right')

   In [7]: s.loc[pd.Interval(2, 3)]
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 3, closed='right')

The :meth:`~IntervalIndex.overlaps` method can be used to create a boolean indexer that replicates the
previous behavior of returning overlapping matches.

*New behavior*:

.. ipython:: python

   idxr = s.index.overlaps(pd.Interval(2, 3))
   idxr
   s[idxr]
   s.loc[idxr]


.. _whatsnew_0250.api_breaking.ufunc:

Binary ufuncs on Series now align
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Applying a binary ufunc like :func:`numpy.power` now aligns the inputs
when both are :class:`Series` (:issue:`23293`).

.. ipython:: python

   s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
   s2 = pd.Series([3, 4, 5], index=['d', 'c', 'b'])
   s1
   s2

*Previous behavior*

.. code-block:: ipython

   In [5]: np.power(s1, s2)
   Out[5]:
   a      1
   b     16
   c    243
   dtype: int64

*New behavior*

.. ipython:: python

   np.power(s1, s2)

This matches the behavior of other binary operations in pandas, like :meth:`Series.add`.
To retain the previous behavior, convert the other ``Series`` to an array before
applying the ufunc.

.. ipython:: python

   np.power(s1, s2.array)

Categorical.argsort now places missing values at the end
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`Categorical.argsort` now places missing values at the end of the array, making it
consistent with NumPy and the rest of pandas (:issue:`21801`).

.. ipython:: python

   cat = pd.Categorical(['b', None, 'a'], categories=['a', 'b'], ordered=True)

*Previous behavior*

.. code-block:: ipython

   In [2]: cat = pd.Categorical(['b', None, 'a'], categories=['a', 'b'], ordered=True)

   In [3]: cat.argsort()
   Out[3]: array([1, 2, 0])

   In [4]: cat[cat.argsort()]
   Out[4]:
   [NaN, a, b]
   categories (2, object): [a < b]

*New behavior*

.. ipython:: python

   cat.argsort()
   cat[cat.argsort()]

.. _whatsnew_0250.api_breaking.list_of_dict:

Column order is preserved when passing a list of dicts to DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Starting with Python 3.7 the key-order of ``dict`` is `guaranteed <https://mail.python.org/pipermail/python-dev/2017-December/151283.html>`_. In practice, this has been true since
Python 3.6. The :class:`DataFrame` constructor now treats a list of dicts in the same way as
it does a list of ``OrderedDict``, i.e. preserving the key order of the dicts.
This change applies only when pandas is running on Python>=3.6 (:issue:`27309`).

.. ipython:: python

   data = [
       {'name': 'Joe', 'state': 'NY', 'age': 18},
       {'name': 'Jane', 'state': 'KY', 'age': 19, 'hobby': 'Minecraft'},
       {'name': 'Jean', 'state': 'OK', 'age': 20, 'finances': 'good'}
   ]

*Previous Behavior*:

The columns were lexicographically sorted previously,

.. code-block:: python

   In [1]: pd.DataFrame(data)
   Out[1]:
      age finances      hobby  name state
   0   18      NaN        NaN   Joe    NY
   1   19      NaN  Minecraft  Jane    KY
   2   20     good        NaN  Jean    OK

*New Behavior*:

The column order now matches the insertion-order of the keys in the ``dict``,
considering all the records from top to bottom. As a consequence, the column
order of the resulting DataFrame has changed compared to previous pandas versions.

.. ipython:: python

   pd.DataFrame(data)
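
If downstream code relied on the old lexicographic layout, one possible workaround is to sort the
column labels explicitly after construction. This is a minimal sketch rather than an official
recommendation; the ``records`` name and its contents are illustrative only.

.. code-block:: python

   import pandas as pd

   # Illustrative records; the second dict introduces a new key.
   records = [
       {'name': 'Joe', 'state': 'NY', 'age': 18},
       {'name': 'Jane', 'state': 'KY', 'age': 19, 'hobby': 'Minecraft'},
   ]

   # Columns follow key insertion order across the records, top to bottom.
   df = pd.DataFrame(records)
   insertion_order = list(df.columns)

   # Sorting the column labels gives the lexicographic layout again.
   sorted_order = list(df.sort_index(axis=1).columns)
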

.. _whatsnew_0250.api_breaking.deps:

Increased minimum versions for dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Due to dropping support for Python 2.7, a number of optional dependencies have updated minimum versions (:issue:`25725`, :issue:`24942`, :issue:`25752`).
Independently, some minimum supported versions of dependencies were updated (:issue:`23519`, :issue:`25554`).
If installed, we now require:

+-----------------+-----------------+----------+
| Package         | Minimum Version | Required |
+=================+=================+==========+
| numpy           | 1.13.3          |    X     |
+-----------------+-----------------+----------+
| pytz            | 2015.4          |    X     |
+-----------------+-----------------+----------+
| python-dateutil | 2.6.1           |    X     |
+-----------------+-----------------+----------+
| bottleneck      | 1.2.1           |          |
+-----------------+-----------------+----------+
| numexpr         | 2.6.2           |          |
+-----------------+-----------------+----------+
| pytest (dev)    | 4.0.2           |          |
+-----------------+-----------------+----------+

For `optional libraries <https://pandas.pydata.org/docs/getting_started/install.html>`_ the general recommendation is to use the latest version.
The following table lists the lowest version per library that is currently being tested throughout the development of pandas.
Optional libraries below the lowest tested version may still work, but are not considered supported.

+-----------------+-----------------+
| Package         | Minimum Version |
+=================+=================+
| beautifulsoup4  | 4.6.0           |
+-----------------+-----------------+
| fastparquet     | 0.2.1           |
+-----------------+-----------------+
| gcsfs           | 0.2.2           |
+-----------------+-----------------+
| lxml            | 3.8.0           |
+-----------------+-----------------+
| matplotlib      | 2.2.2           |
+-----------------+-----------------+
| openpyxl        | 2.4.8           |
+-----------------+-----------------+
| pyarrow         | 0.9.0           |
+-----------------+-----------------+
| pymysql         | 0.7.1           |
+-----------------+-----------------+
| pytables        | 3.4.2           |
+-----------------+-----------------+
| scipy           | 0.19.0          |
+-----------------+-----------------+
| sqlalchemy      | 1.1.4           |
+-----------------+-----------------+
| xarray          | 0.8.2           |
+-----------------+-----------------+
| xlrd            | 1.1.0           |
+-----------------+-----------------+
| xlsxwriter      | 0.9.8           |
+-----------------+-----------------+
| xlwt            | 1.2.0           |
+-----------------+-----------------+

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

.. _whatsnew_0250.api.other:

Other API changes
^^^^^^^^^^^^^^^^^

- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- :class:`Timestamp` and :class:`Timedelta` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
- Comparing :class:`Timestamp` with unsupported objects now returns :py:obj:`NotImplemented` instead of raising ``TypeError``. This implies that unsupported rich comparisons are delegated to the other object, and are now consistent with Python 3 behavior for ``datetime`` objects (:issue:`24011`)
- Bug in :meth:`DatetimeIndex.snap` which did not preserve the ``name`` of the input :class:`Index` (:issue:`25575`)
- The ``arg`` argument in :meth:`.DataFrameGroupBy.agg` has been renamed to ``func`` (:issue:`26089`)
- The ``arg`` argument in :meth:`.Window.aggregate` has been renamed to ``func`` (:issue:`26372`)
- Most pandas classes had a ``__bytes__`` method, which was used for getting a python2-style bytestring representation of the object. This method has been removed as a part of dropping Python2 (:issue:`26447`)
- The ``.str``-accessor has been disabled for 1-level :class:`MultiIndex`, use :meth:`MultiIndex.to_flat_index` if necessary (:issue:`23679`)
- Removed support of gtk package for clipboards (:issue:`26563`)
- Using an unsupported version of Beautiful Soup 4 will now raise an ``ImportError`` instead of a ``ValueError`` (:issue:`27063`)
- :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` will now raise a ``ValueError`` when saving timezone aware data. (:issue:`27008`, :issue:`7056`)
- :meth:`ExtensionArray.argsort` places NA values at the end of the sorted array. (:issue:`21801`)
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extension data types for a ``fixed`` format. (:issue:`7775`)
- Passing duplicate ``names`` in :meth:`read_csv` will now raise a ``ValueError`` (:issue:`17346`)
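
A few of the changes above can be checked interactively. The following is a minimal sketch;
the sample values and the flag names (``duplicate_names_rejected``, ``strptime_rejected``)
are illustrative and not part of pandas.

.. code-block:: python

   import io

   import numpy as np
   import pandas as pd

   # Scalars expose to_numpy(), returning the NumPy scalar equivalents.
   ts_value = pd.Timestamp('2019-07-18').to_numpy()
   td_value = pd.Timedelta('1 day').to_numpy()

   # Duplicate entries in ``names`` are rejected by read_csv.
   try:
       pd.read_csv(io.StringIO('1,2\n3,4\n'), names=['a', 'a'])
       duplicate_names_rejected = False
   except ValueError:
       duplicate_names_rejected = True

   # Timestamp.strptime is deliberately not implemented.
   try:
       pd.Timestamp.strptime('2019-07-18', '%Y-%m-%d')
       strptime_rejected = False
   except NotImplementedError:
       strptime_rejected = True
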

.. _whatsnew_0250.deprecations:

Deprecations
~~~~~~~~~~~~

Sparse subclasses
^^^^^^^^^^^^^^^^^

The ``SparseSeries`` and ``SparseDataFrame`` subclasses are deprecated. Their functionality is better provided
by a ``Series`` or ``DataFrame`` with sparse values.

**Previous way**

.. code-block:: python

   df = pd.SparseDataFrame({"A": [0, 0, 1, 2]})
   df.dtypes

**New way**

.. ipython:: python

   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 0, 1, 2])})
   df.dtypes

The memory usage of the two approaches is identical (:issue:`19239`).
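
Sparse-specific functionality on a regular ``DataFrame`` is reached through the ``.sparse``
accessor. A minimal sketch, using the same sample data as above; the variable names are
illustrative.

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 0, 1, 2])})

   # The column carries a SparseDtype instead of living in a dedicated subclass.
   is_sparse = isinstance(df["A"].dtype, pd.SparseDtype)

   # Fraction of explicitly stored values: 2 of the 4 entries differ from the fill value 0.
   density = df.sparse.density

   # Convert back to an ordinary dense DataFrame when needed.
   dense = df.sparse.to_dense()
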

msgpack format
^^^^^^^^^^^^^^

The msgpack format is deprecated as of 0.25 and will be removed in a future version. It is recommended to use pyarrow for on-the-wire transmission of pandas objects. (:issue:`27084`)


Other deprecations
^^^^^^^^^^^^^^^^^^

- The deprecated ``.ix[]`` indexer now raises a more visible ``FutureWarning`` instead of ``DeprecationWarning`` (:issue:`26438`).
- Deprecated the ``M`` (months) and ``Y`` (years) values for the ``unit`` parameter of :func:`pandas.to_timedelta`, :func:`pandas.Timedelta` and :func:`pandas.TimedeltaIndex` (:issue:`16344`)
- :meth:`pandas.concat` has deprecated the ``join_axes``-keyword. Instead, use :meth:`DataFrame.reindex` or :meth:`DataFrame.reindex_like` on the result or on the inputs (:issue:`21951`)
- The :attr:`SparseArray.values` attribute is deprecated. You can use ``np.asarray(...)`` or
  the :meth:`SparseArray.to_dense` method instead (:issue:`26421`).
- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64` or :meth:`Timedelta.to_timedelta64`. (:issue:`24416`)
- The :meth:`DataFrame.compound` and :meth:`Series.compound` methods are deprecated and will be removed in a future version (:issue:`26405`).
- The internal attributes ``_start``, ``_stop`` and ``_step`` of :class:`RangeIndex` have been deprecated.
  Use the public attributes :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop` and :attr:`~RangeIndex.step` instead (:issue:`26581`).
- The :meth:`Series.ftype`, :meth:`Series.ftypes` and :meth:`DataFrame.ftypes` methods are deprecated and will be removed in a future version.
  Instead, use :meth:`Series.dtype` and :meth:`DataFrame.dtypes` (:issue:`26705`).
- The :meth:`Series.get_values`, :meth:`DataFrame.get_values`, :meth:`Index.get_values`,
  :meth:`SparseArray.get_values` and :meth:`Categorical.get_values` methods are deprecated.
  One of ``np.asarray(..)`` or :meth:`~Series.to_numpy` can be used instead (:issue:`19617`).
- The 'outer' method on NumPy ufuncs, e.g. ``np.subtract.outer`` has been deprecated on :class:`Series` objects. Convert the input to an array with :attr:`Series.array` first (:issue:`27186`)
- :meth:`Timedelta.resolution` is deprecated and replaced with :meth:`Timedelta.resolution_string`.  In a future version, :meth:`Timedelta.resolution` will be changed to behave like the standard library :attr:`datetime.timedelta.resolution` (:issue:`21344`)
- :func:`read_table` has been undeprecated. (:issue:`25220`)
- :attr:`Index.dtype_str` is deprecated. (:issue:`18262`)
- :attr:`Series.imag` and :attr:`Series.real` are deprecated. (:issue:`18262`)
- :meth:`Series.put` is deprecated. (:issue:`18262`)
- :meth:`Index.item` and :meth:`Series.item` are deprecated. (:issue:`18262`)
- The default value ``ordered=None`` in :class:`~pandas.api.types.CategoricalDtype` has been deprecated in favor of ``ordered=False``. When converting between categorical types ``ordered=True`` must be explicitly passed in order to be preserved. (:issue:`26336`)
- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`).
- :meth:`DataFrame.get_dtype_counts` is deprecated. (:issue:`18262`)
- :meth:`Categorical.ravel` will return a :class:`Categorical` instead of a ``np.ndarray`` (:issue:`27199`)
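
The replacements named in several of the entries above can be sketched as follows. This is an
illustration under stated assumptions, not an exhaustive migration guide: the sample objects are
made up, and :meth:`Series.to_numpy` is used for the ufunc case where the entry mentions
:attr:`Series.array`.

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

   # Instead of s.get_values(): obtain a NumPy array (np.asarray(s) also works).
   values = s.to_numpy()

   # Instead of s.index.contains('a'): use the ``in`` operator.
   has_a = 'a' in s.index

   # Instead of np.subtract.outer(s, s): pass arrays rather than Series.
   outer = np.subtract.outer(s.to_numpy(), s.to_numpy())

   # Instead of RangeIndex._start / _stop / _step: use the public attributes.
   rng = pd.RangeIndex(0, 10, 2)
   bounds = (rng.start, rng.stop, rng.step)

   # Instead of pd.concat(..., join_axes=[left.index]): reindex the result.
   left = pd.DataFrame({'x': [1, 2]}, index=['a', 'b'])
   right = pd.DataFrame({'y': [3, 4]}, index=['b', 'c'])
   combined = pd.concat([left, right], axis=1).reindex(left.index)
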


.. _whatsnew_0250.prior_deprecations:

Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Removed ``Panel`` (:issue:`25047`, :issue:`25191`, :issue:`25231`)
- Removed the previously deprecated ``sheetname`` keyword in :func:`read_excel` (:issue:`16442`, :issue:`20938`)
- Removed the previously deprecated ``TimeGrouper`` (:issue:`16942`)
- Removed the previously deprecated ``parse_cols`` keyword in :func:`read_excel` (:issue:`16488`)
- Removed the previously deprecated ``pd.options.html.border`` (:issue:`16970`)
- Removed the previously deprecated ``convert_objects`` (:issue:`11221`)
- Removed the previously deprecated ``select`` method of ``DataFrame`` and ``Series`` (:issue:`17633`)
- Removed the previously deprecated behavior of :class:`Series` treated as list-like in :meth:`~Series.cat.rename_categories` (:issue:`17982`)
- Removed the previously deprecated ``DataFrame.reindex_axis`` and ``Series.reindex_axis`` (:issue:`17842`)
- Removed the previously deprecated behavior of altering column or index labels with :meth:`Series.rename_axis` or :meth:`DataFrame.rename_axis` (:issue:`17842`)
- Removed the previously deprecated ``tupleize_cols`` keyword argument in :meth:`read_html`, :meth:`read_csv`, and :meth:`DataFrame.to_csv` (:issue:`17877`, :issue:`17820`)
- Removed the previously deprecated ``DataFrame.from_csv`` and ``Series.from_csv`` (:issue:`17812`)
- Removed the previously deprecated ``raise_on_error`` keyword argument in :meth:`DataFrame.where` and :meth:`DataFrame.mask` (:issue:`17744`)
- Removed the previously deprecated ``ordered`` and ``categories`` keyword arguments in ``astype`` (:issue:`17742`)
- Removed the previously deprecated ``cdate_range`` (:issue:`17691`)
- Removed the previously deprecated ``True`` option for the ``dropna`` keyword argument in :func:`SeriesGroupBy.nth` (:issue:`17493`)
- Removed the previously deprecated ``convert`` keyword argument in :meth:`Series.take` and :meth:`DataFrame.take` (:issue:`17352`)
- Removed the previously deprecated behavior of arithmetic operations with ``datetime.date`` objects (:issue:`21152`)
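
The list above does not name replacements. As a hedged sketch, the following shows commonly used
equivalents for three of the removed APIs (``TimeGrouper``, ``reindex_axis`` and ``select``);
the mapping is an assumption based on general pandas usage, and the sample data is illustrative.

.. code-block:: python

   import pandas as pd

   idx = pd.to_datetime(['2019-01-01 00:00', '2019-01-01 12:00',
                         '2019-01-02 00:00', '2019-01-02 12:00'])
   df = pd.DataFrame({'b': [1, 2, 3, 4], 'a': [5, 6, 7, 8]}, index=idx)

   # pd.Grouper(freq=...) covers the time-based grouping TimeGrouper was used for.
   daily = df.groupby(pd.Grouper(freq='D')).sum()

   # reindex(columns=...) can stand in for reindex_axis(..., axis=1).
   reordered = df.reindex(columns=['a', 'b'])

   # A callable passed to .loc can stand in for the removed ``select`` method.
   only_a = df.loc[:, lambda frame: frame.columns == 'a']
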

.. _whatsnew_0250.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Significant speedup in :class:`SparseArray` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
- :meth:`DataFrame.to_stata()` is now faster when outputting data with any string or non-native endian columns (:issue:`25045`)
- Improved performance of :meth:`Series.searchsorted`. The speedup is especially large when the dtype is
  int8/int16/int32 and the searched key is within the integer bounds for the dtype (:issue:`22034`)
- Improved performance of :meth:`.GroupBy.quantile` (:issue:`20405`)
- Improved performance of slicing and other selected operations on a :class:`RangeIndex` (:issue:`26565`, :issue:`26617`, :issue:`26722`)
- :class:`RangeIndex` now performs standard lookup without instantiating an actual hashtable, hence saving memory (:issue:`16685`)
- Improved performance of :meth:`read_csv` by faster tokenizing and faster parsing of small float numbers (:issue:`25784`)
- Improved performance of :meth:`read_csv` by faster parsing of N/A and boolean values (:issue:`25804`)
- Improved performance of :attr:`IntervalIndex.is_monotonic`, :attr:`IntervalIndex.is_monotonic_increasing` and :attr:`IntervalIndex.is_monotonic_decreasing` by removing conversion to :class:`MultiIndex` (:issue:`24813`)
- Improved performance of :meth:`DataFrame.to_csv` when writing datetime dtypes (:issue:`25708`)
- Improved performance of :meth:`read_csv` by much faster parsing of ``MM/YYYY`` and ``DD/MM/YYYY`` datetime formats (:issue:`25922`)
- Improved performance of nanops for dtypes that cannot store NaNs. Speedup is particularly prominent for :meth:`Series.all` and :meth:`Series.any` (:issue:`25070`)
- Improved performance of :meth:`Series.map` for dictionary mappers on categorical series by mapping the categories instead of mapping all values (:issue:`23785`)
- Improved performance of :meth:`IntervalIndex.intersection` (:issue:`24813`)
- Improved performance of :meth:`read_csv` by faster concatenating date columns without extra conversion to string for integer/float zero and float ``NaN``; by faster checking the string for the possibility of being a date (:issue:`25754`)
- Improved performance of :attr:`IntervalIndex.is_unique` by removing conversion to ``MultiIndex`` (:issue:`24813`)
- Restored performance of :meth:`DatetimeIndex.__iter__` by re-enabling specialized code path (:issue:`26702`)
- Improved performance when building :class:`MultiIndex` with at least one :class:`CategoricalIndex` level (:issue:`22044`)
- Improved performance by removing the need for a garbage collect when checking for ``SettingWithCopyWarning`` (:issue:`27031`)
- For :meth:`to_datetime` changed default value of cache parameter to ``True`` (:issue:`26043`)
- Improved performance of :class:`DatetimeIndex` and :class:`PeriodIndex` slicing given non-unique, monotonic data (:issue:`27136`).
- Improved performance of :meth:`pd.read_json` for index-oriented data. (:issue:`26773`)
- Improved performance of :meth:`MultiIndex.shape` (:issue:`27384`).

.. _whatsnew_0250.bug_fixes:

Bug fixes
~~~~~~~~~


Categorical
^^^^^^^^^^^

- Bug in :func:`DataFrame.at` and :func:`Series.at` that would raise exception if the index was a :class:`CategoricalIndex` (:issue:`20629`)
- Fixed bug in comparison of ordered :class:`Categorical` that contained missing values with a scalar which sometimes incorrectly resulted in ``True`` (:issue:`26504`)
- Bug in :meth:`DataFrame.dropna` when the :class:`DataFrame` has a :class:`CategoricalIndex` containing :class:`Interval` objects incorrectly raised a ``TypeError`` (:issue:`25087`)
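
As an illustration of the first two fixes, the sketch below uses made-up data: scalar access
through ``.at`` on a :class:`CategoricalIndex`, and a scalar comparison on an ordered
:class:`Categorical` that contains a missing value.

.. code-block:: python

   import pandas as pd

   # Scalar access through .at on a CategoricalIndex.
   s = pd.Series([10, 20, 30], index=pd.CategoricalIndex(['a', 'b', 'c']))
   value = s.at['b']

   # Missing values compare as False against a scalar.
   cat = pd.Categorical(['a', None, 'b'], categories=['a', 'b'], ordered=True)
   greater_than_a = (cat > 'a').tolist()
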

Datetimelike
^^^^^^^^^^^^

- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
- Bug in :func:`to_datetime` which would raise ``InvalidIndexError: Reindexing only valid with uniquely valued Index objects`` when called with ``cache=True``, with ``arg`` including at least two different elements from the set ``{None, numpy.nan, pandas.NaT}`` (:issue:`22305`)
- Bug in :class:`DataFrame` and :class:`Series` where timezone aware data with ``dtype='datetime64[ns]'`` was not cast to naive (:issue:`25843`)
- Improved :class:`Timestamp` type checking in various datetime functions to prevent exceptions when using a subclassed ``datetime`` (:issue:`25851`)
- Bug in :class:`Series` and :class:`DataFrame` repr where ``np.datetime64('NaT')`` and ``np.timedelta64('NaT')`` with ``dtype=object`` would be represented as ``NaN`` (:issue:`25445`)
- Bug in :func:`to_datetime` which does not replace the invalid argument with ``NaT`` when ``errors`` is set to ``'coerce'`` (:issue:`26122`)
- Bug in adding :class:`DateOffset` with nonzero month to :class:`DatetimeIndex` would raise ``ValueError`` (:issue:`26258`)
- Bug in :func:`to_datetime` which raises unhandled ``OverflowError`` when called with mix of invalid dates and ``NaN`` values with ``format='%Y%m%d'`` and ``errors='coerce'`` (:issue:`25512`)
- Bug in :meth:`isin` for datetimelike indexes; :class:`DatetimeIndex`, :class:`TimedeltaIndex` and :class:`PeriodIndex` where the ``levels`` parameter was ignored. (:issue:`26675`)
- Bug in :func:`to_datetime` which raises ``TypeError`` for ``format='%Y%m%d'`` when called for invalid integer dates with length >= 6 digits with ``errors='ignore'``
- Bug when comparing a :class:`PeriodIndex` against a zero-dimensional numpy array (:issue:`26689`)
- Bug in constructing a ``Series`` or ``DataFrame`` from a numpy ``datetime64`` array with a non-ns unit and out-of-bound timestamps generating rubbish data, which will now correctly raise an ``OutOfBoundsDatetime`` error (:issue:`26206`).
- Bug in :func:`date_range` with unnecessary ``OverflowError`` being raised for very large or very small dates (:issue:`26651`)
- Bug where adding :class:`Timestamp` to a ``np.timedelta64`` object would raise instead of returning a :class:`Timestamp` (:issue:`24775`)
- Bug where comparing a zero-dimensional numpy array containing a ``np.datetime64`` object to a :class:`Timestamp` would incorrectly raise ``TypeError`` (:issue:`26916`)
- Bug in :func:`to_datetime` which would raise ``ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True`` when called with ``cache=True``, with ``arg`` including datetime strings with different offset (:issue:`26097`)
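
The ``errors='coerce'`` behavior referenced in several entries above can be sketched as follows.
The input strings are illustrative.

.. code-block:: python

   import pandas as pd

   # Unparseable entries become NaT instead of raising.
   parsed = pd.to_datetime(['2019-01-01', 'not a date'], errors='coerce')

   first_is_valid = parsed[0] == pd.Timestamp('2019-01-01')
   second_is_nat = parsed[1] is pd.NaT
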

Timedelta
^^^^^^^^^

- Bug in :func:`TimedeltaIndex.intersection` where for non-monotonic indices in some cases an empty ``Index`` was returned when in fact an intersection existed (:issue:`25913`)
- Bug with comparisons between :class:`Timedelta` and ``NaT`` raising ``TypeError`` (:issue:`26039`)
- Bug when adding or subtracting a :class:`BusinessHour` to a :class:`Timestamp` with the resulting time landing in a following or prior day respectively (:issue:`26381`)
- Bug when comparing a :class:`TimedeltaIndex` against a zero-dimensional numpy array (:issue:`26689`)
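
A minimal sketch of two of the fixed behaviors, using illustrative values: comparisons between
:class:`Timedelta` and ``NaT``, and the intersection of non-monotonic :class:`TimedeltaIndex`
objects.

.. code-block:: python

   import pandas as pd

   one_day = pd.Timedelta('1 day')

   # Comparisons with NaT no longer raise; only ``!=`` is true.
   equal_to_nat = one_day == pd.NaT
   not_equal_to_nat = one_day != pd.NaT
   less_than_nat = one_day < pd.NaT

   # Intersection of two non-monotonic indexes that share two values.
   left = pd.to_timedelta(['3 days', '1 days', '2 days'])
   right = pd.to_timedelta(['2 days', '5 days', '3 days'])
   common = left.intersection(right)
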

Timezones
^^^^^^^^^

- Bug in :func:`DatetimeIndex.to_frame` where timezone aware data would be converted to timezone naive data (:issue:`25809`)
- Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` where ``freq`` was not propagated (:issue:`25241`)
- Bug in :func:`Series.at` where setting :class:`Timestamp` with timezone raises ``TypeError`` (:issue:`25506`)
- Bug in :func:`DataFrame.update` when updating with timezone aware data would return timezone naive data (:issue:`25807`)
- Bug in :func:`to_datetime` where an uninformative ``RuntimeError`` was raised when passing a naive :class:`Timestamp` with datetime strings with mixed UTC offsets (:issue:`25978`)
- Bug in :func:`to_datetime` with ``unit='ns'`` would drop timezone information from the parsed argument (:issue:`26168`)
- Bug in :func:`DataFrame.join` where joining a timezone aware index with a timezone aware column would result in a column of ``NaN`` (:issue:`26335`)
- Bug in :func:`date_range` where ambiguous or nonexistent start or end times were not handled by the ``ambiguous`` or ``nonexistent`` keywords respectively (:issue:`27088`)
- Bug in :meth:`DatetimeIndex.union` when combining a timezone aware and timezone unaware :class:`DatetimeIndex` (:issue:`21671`)
- Bug when applying a numpy reduction function (e.g. :meth:`numpy.minimum`) to a timezone aware :class:`Series` (:issue:`15552`)
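
Two of the fixes above can be illustrated with a short sketch. The index name ``when`` and the
sample dates are assumptions made for the example.

.. code-block:: python

   import pandas as pd

   # DatetimeIndex.to_frame keeps the timezone of the index.
   idx = pd.date_range('2019-01-01', periods=2, tz='UTC', name='when')
   frame = idx.to_frame()
   column_dtype = frame['when'].dtype

   # Setting a timezone-aware Timestamp through Series.at.
   s = pd.Series(pd.date_range('2019-01-01', periods=2, tz='UTC'))
   s.at[0] = pd.Timestamp('2020-01-01', tz='UTC')
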

Numeric
^^^^^^^

- Bug in :meth:`to_numeric` in which large negative numbers were being improperly handled (:issue:`24910`)
- Bug in :meth:`to_numeric` in which numbers were being coerced to float, even though ``errors`` was not ``coerce`` (:issue:`24910`)
- Bug in :meth:`to_numeric` in which invalid values for ``errors`` were being allowed (:issue:`26466`)
- Bug in :class:`format` in which floating point complex numbers were not being formatted to proper display precision and trimming (:issue:`25514`)
- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)
- Bug in :meth:`Series.divmod` and :meth:`Series.rdivmod` which would raise an (incorrect) ``ValueError`` rather than return a pair of :class:`Series` objects as result (:issue:`25557`)
- Raises a helpful exception when a non-numeric index is sent to :meth:`interpolate` with methods which require numeric index. (:issue:`21662`)
- Bug in :meth:`~pandas.eval` when comparing floats with scalar operators, for example: ``x < -0.1`` (:issue:`25928`)
- Fixed bug where casting all-boolean array to integer extension array failed (:issue:`25211`)
- Bug in ``divmod`` with a :class:`Series` object containing zeros incorrectly raising ``AttributeError`` (:issue:`26987`)
- Inconsistency in :class:`Series` floor-division (``//``) and ``divmod`` filling positive//zero with ``NaN`` instead of ``Inf`` (:issue:`27321`)
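
The ``divmod`` and floor-division entries above can be sketched as follows. The input values are
illustrative, and ``'bogus'`` is simply an arbitrary invalid value for ``errors``.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Series.divmod returns a pair of Series: (quotient, remainder).
   quotient, remainder = pd.Series([5, 7]).divmod(2)

   # Floor division by zero: a positive numerator gives inf, zero gives NaN.
   floored = pd.Series([1, -1, 0]) // 0

   # Invalid values for ``errors`` are rejected by to_numeric.
   try:
       pd.to_numeric(['1', '2'], errors='bogus')
       invalid_errors_rejected = False
   except ValueError:
       invalid_errors_rejected = True
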

Conversion
^^^^^^^^^^

- Bug in :func:`DataFrame.astype()` when passing a dict of columns and types the ``errors`` parameter was ignored. (:issue:`25905`)

Strings
^^^^^^^

- Bug in the ``__name__`` attribute of several methods of :class:`Series.str`, which were set incorrectly (:issue:`23551`)
- Improved error message when passing :class:`Series` of wrong dtype to :meth:`Series.str.cat` (:issue:`22722`)


Interval
^^^^^^^^

- Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
- Fixed bug in :class:`Series`/:class:`DataFrame` not displaying ``NaN`` in :class:`IntervalIndex` with missing values (:issue:`25984`)
- Bug in :meth:`IntervalIndex.get_loc` where a ``KeyError`` would be incorrectly raised for a decreasing :class:`IntervalIndex` (:issue:`25860`)
- Bug in :class:`Index` constructor where passing mixed closed :class:`Interval` objects would result in a ``ValueError`` instead of an ``object`` dtype ``Index`` (:issue:`27172`)
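
A short sketch of the last two entries, with illustrative intervals: a lookup in a decreasing
:class:`IntervalIndex`, and an :class:`Index` built from intervals that are closed on different
sides.

.. code-block:: python

   import pandas as pd

   # Lookup in a monotonically decreasing IntervalIndex.
   decreasing = pd.IntervalIndex.from_tuples([(2, 3), (1, 2), (0, 1)])
   position = decreasing.get_loc(1.5)

   # Intervals closed on different sides fall back to an object-dtype Index.
   mixed = pd.Index([pd.Interval(0, 1, closed='left'),
                     pd.Interval(1, 2, closed='right')])
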

Indexing
^^^^^^^^

- Improved exception message when calling :meth:`DataFrame.iloc` with a list of non-numeric objects (:issue:`25753`).
- Improved exception message when calling ``.iloc`` or ``.loc`` with a boolean indexer with different length (:issue:`26658`).
- Bug in ``KeyError`` exception message when indexing a :class:`MultiIndex` with a non-existent key not displaying the original key (:issue:`27250`).
- Bug in ``.iloc`` and ``.loc`` with a boolean indexer not raising an ``IndexError`` when too few items are passed (:issue:`26658`).
- Bug in :meth:`DataFrame.loc` and :meth:`Series.loc` where ``KeyError`` was not raised for a ``MultiIndex`` when the key was less than or equal to the number of levels in the :class:`MultiIndex` (:issue:`14885`).
- Bug in which :meth:`DataFrame.append` produced an erroneous warning indicating that a ``KeyError`` will be thrown in the future when the data to be appended contains new columns (:issue:`22252`).
- Bug in which :meth:`DataFrame.to_csv` caused a segfault for a reindexed data frame, when the indices were single-level :class:`MultiIndex` (:issue:`26303`).
- Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`.DataFrame` would raise error (:issue:`26390`)
- Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`)
- Fixed a ``KeyError`` when indexing a :class:`MultiIndex` level with a list containing exactly one label, which is missing (:issue:`27148`)
- Bug which produced ``AttributeError`` on partial matching :class:`Timestamp` in a :class:`MultiIndex`  (:issue:`26944`)
- Bug in :class:`Categorical` and :class:`CategoricalIndex` with :class:`Interval` values when using the ``in`` operator (``__contains__``) with objects that are not comparable to the values in the ``Interval`` (:issue:`23705`)
- Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` on a :class:`DataFrame` with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a :class:`Series` (:issue:`27110`)
- Bug in :class:`CategoricalIndex` and :class:`Categorical` incorrectly raising ``ValueError`` instead of ``TypeError`` when a list is passed using the ``in`` operator (``__contains__``) (:issue:`21729`)
- Bug in setting a new value in a :class:`Series` with a :class:`Timedelta` object incorrectly casting the value to an integer (:issue:`22717`)
- Bug in :class:`Series` setting a new key (``__setitem__``) with a timezone-aware datetime incorrectly raising ``ValueError`` (:issue:`12862`)
- Bug in :meth:`DataFrame.iloc` when indexing with a read-only indexer (:issue:`17192`)
- Bug in :class:`Series` setting an existing tuple key (``__setitem__``) with timezone-aware datetime values incorrectly raising ``TypeError`` (:issue:`20441`)
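
The boolean-indexer length check mentioned above can be sketched like this. The helper
``raises_index_error`` is hypothetical and exists only for the illustration.

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 2, 3])


   def raises_index_error(indexer):
       """Return True if a too-short boolean mask is rejected with IndexError."""
       try:
           indexer[[True, False]]  # two flags for a three-element Series
       except IndexError:
           return True
       return False


   loc_rejected = raises_index_error(s.loc)
   iloc_rejected = raises_index_error(s.iloc)
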

Missing
^^^^^^^

- Fixed misleading exception message in :meth:`Series.interpolate` if argument ``order`` is required, but omitted (:issue:`10633`, :issue:`24014`).
- Fixed class type displayed in exception message in :meth:`DataFrame.dropna` if invalid ``axis`` parameter passed (:issue:`25555`)
- A ``ValueError`` will now be thrown by :meth:`DataFrame.fillna` when ``limit`` is not a positive integer (:issue:`27042`)
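
A minimal sketch of the ``limit`` validation in :meth:`DataFrame.fillna`, using made-up data:

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame({'a': [1.0, np.nan, np.nan]})

   # A positive limit fills at most that many missing values per column.
   filled = df.fillna(0, limit=1)

   # A non-positive limit is rejected.
   try:
       df.fillna(0, limit=0)
       limit_rejected = False
   except ValueError:
       limit_rejected = True
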

MultiIndex
^^^^^^^^^^

- Bug in which an incorrect exception was raised by :class:`Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)

IO
^^

- Bug in :func:`DataFrame.to_html()` where values were truncated using display options instead of outputting the full content (:issue:`17004`)
- Fixed bug in missing text when using :meth:`to_clipboard` if copying utf-16 characters in Python 3 on Windows (:issue:`25040`)
- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to :class:`Timestamp`, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data were filled in the resulting ``DataFrame`` with the string ``"nan"`` instead of ``numpy.nan`` (:issue:`25468`)
- :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AssertionError`` (:issue:`25608`)
- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
- Bug in :func:`read_csv` not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+ (:issue:`15086`)
- Improved performance in :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` when converting columns that have missing values (:issue:`25772`)
- Bug in :meth:`DataFrame.to_html` where header numbers would ignore display options when rounding (:issue:`17280`)
- Bug in :func:`read_hdf` where reading a table from an HDF5 file written directly with PyTables fails with a ``ValueError`` when using a sub-selection via the ``start`` or ``stop`` arguments (:issue:`11188`)
- Bug in :func:`read_hdf` not properly closing store after a ``KeyError`` is raised (:issue:`25766`)
- Improved the explanation for the failure when value labels are repeated in Stata dta files and suggested work-arounds (:issue:`25772`)
- Improved :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` to read incorrectly formatted 118 format files saved by Stata (:issue:`25960`)
- Improved the ``col_space`` parameter in :meth:`DataFrame.to_html` to accept a string so CSS length values can be set correctly (:issue:`25941`)
- Fixed bug in loading objects from S3 that contain ``#`` characters in the URL (:issue:`25945`)
- Adds ``use_bqstorage_api`` parameter to :func:`read_gbq` to speed up downloads of large data frames. This feature requires version 0.10.0 of the ``pandas-gbq`` library as well as the ``google-cloud-bigquery-storage`` and ``fastavro`` libraries. (:issue:`26104`)
- Fixed memory leak in :meth:`DataFrame.to_json` when dealing with numeric data (:issue:`24889`)
- Bug in :func:`read_json` where date strings with ``Z`` were not converted to a UTC timezone (:issue:`26168`)
- Added ``cache_dates=True`` parameter to :meth:`read_csv`, which allows caching of unique dates when they are parsed (:issue:`25990`)
- :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
- Fixed bug in :func:`pandas.read_csv` where a BOM would result in incorrect parsing using engine='python' (:issue:`26545`)
- :func:`read_excel` now raises a ``ValueError`` when input is of type :class:`pandas.io.excel.ExcelFile` and ``engine`` param is passed since :class:`pandas.io.excel.ExcelFile` has an engine defined (:issue:`26566`)
- Bug while selecting from :class:`HDFStore` with ``where=''`` specified (:issue:`26610`).
- Fixed bug in :func:`DataFrame.to_excel()` where custom objects (e.g. ``PeriodIndex``) inside merged cells were not being converted into types safe for the Excel writer (:issue:`27006`)
- Bug in :meth:`read_hdf` where reading a timezone aware :class:`DatetimeIndex` would raise a ``TypeError`` (:issue:`11926`)
- Bug in :meth:`to_msgpack` and :meth:`read_msgpack` which would raise a ``ValueError`` rather than a ``FileNotFoundError`` for an invalid path (:issue:`27160`)
- Fixed bug in :meth:`DataFrame.to_parquet` which would raise a ``ValueError`` when the dataframe had no columns (:issue:`27339`)
- Allow parsing of :class:`PeriodDtype` columns when using :func:`read_csv` (:issue:`26934`)
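
Two of the IO additions above can be sketched as follows: a CSS length passed to ``col_space``
and a :class:`PeriodDtype` column parsed by :func:`read_csv`. The column names and CSV text are
illustrative.

.. code-block:: python

   import io

   import pandas as pd

   # col_space accepts a string so a CSS length reaches the generated HTML.
   df = pd.DataFrame({'a': [1, 2]})
   html = df.to_html(col_space='100px')

   # A period dtype can be requested directly when reading CSV data.
   csv_text = 'month,value\n2019-01,1\n2019-02,2\n'
   periods = pd.read_csv(io.StringIO(csv_text), dtype={'month': 'period[M]'})
   month_dtype = periods['month'].dtype
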

Plotting
^^^^^^^^

- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
- Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
- Bug in incorrect ticklabel positions when plotting an index that is non-numeric / non-datetime (:issue:`7612`, :issue:`15912`, :issue:`22334`)
- Fixed bug causing plots of :class:`PeriodIndex` timeseries to fail if the frequency is a multiple of the frequency rule code (:issue:`14763`)
- Fixed bug when plotting a :class:`DatetimeIndex` with ``datetime.timezone.utc`` timezone (:issue:`17173`)

GroupBy/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^

- Bug in :meth:`.Resampler.agg` with a timezone aware index where ``OverflowError`` would raise when passing a list of functions (:issue:`22660`)
- Bug in :meth:`.DataFrameGroupBy.nunique` in which the names of column levels were lost (:issue:`23222`)
- Bug in :func:`.GroupBy.agg` when applying an aggregation function to timezone aware data (:issue:`23683`)
- Bug in :func:`.GroupBy.first` and :func:`.GroupBy.last` where timezone information would be dropped (:issue:`21603`)
- Bug in :func:`.GroupBy.size` when grouping only NA values (:issue:`23050`)
- Bug in :func:`Series.groupby` where ``observed`` kwarg was previously ignored (:issue:`24880`)
- Bug in :func:`Series.groupby` where grouping a :class:`MultiIndex` Series by a list of labels whose length equals the length of the series caused incorrect grouping (:issue:`25704`)
- Ensured that ordering of outputs in ``groupby`` aggregation functions is consistent across all versions of Python (:issue:`25692`)
- Ensured that result group order is correct when grouping on an ordered ``Categorical`` and specifying ``observed=True`` (:issue:`25871`, :issue:`25167`)
- Bug in :meth:`.Rolling.min` and :meth:`.Rolling.max` that caused a memory leak (:issue:`25893`)
- Bug in :meth:`.Rolling.count` and ``.Expanding.count`` where the ``axis`` keyword was previously ignored (:issue:`13503`)
- Bug in :meth:`.GroupBy.idxmax` and :meth:`.GroupBy.idxmin` with a datetime column, which would return an incorrect dtype (:issue:`25444`, :issue:`15306`)
- Bug in :meth:`.GroupBy.cumsum`, :meth:`.GroupBy.cumprod`, :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with a categorical column having absent categories, which would return incorrect results or segfault (:issue:`16771`)
- Bug in :meth:`.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`)
- Bug in :meth:`.SeriesGroupBy.transform` where transforming an empty group would raise a ``ValueError`` (:issue:`26208`)
- Bug in :meth:`.DataFrame.groupby` where passing a :class:`.Grouper` would return incorrect groups when using the ``.groups`` accessor (:issue:`26326`)
- Bug in :meth:`.GroupBy.agg` where incorrect results are returned for uint64 columns (:issue:`26310`)
- Bug in :meth:`.Rolling.median` and :meth:`.Rolling.quantile` where a ``MemoryError`` is raised with an empty window (:issue:`26005`)
- Bug in :meth:`.Rolling.median` and :meth:`.Rolling.quantile` where incorrect results are returned with ``closed='left'`` and ``closed='neither'`` (:issue:`26005`)
- Improved :class:`.Rolling`, :class:`.Window` and :class:`.ExponentialMovingWindow` functions to exclude nuisance columns from results instead of raising errors and raise a ``DataError`` only if all columns are nuisance (:issue:`12537`)
- Bug in :meth:`.Rolling.max` and :meth:`.Rolling.min` where incorrect results are returned with an empty variable window (:issue:`26005`)
- Raise a helpful exception when an unsupported weighted window function is used as an argument of :meth:`.Window.aggregate` (:issue:`26597`)
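
Two of the fixes above, timezone preservation in ``first``/``last`` and the ``observed`` keyword of :func:`Series.groupby`, can be sketched as follows. This is an illustrative example under assumed data: the frame, the column names ``key`` and ``ts``, and the category labels are hypothetical, and the behaviour described is what a pandas version containing these fixes is expected to show.

.. code-block:: python

    import pandas as pd

    # Hypothetical frame with a timezone-aware timestamp column.
    df = pd.DataFrame(
        {
            "key": ["a", "a", "b"],
            "ts": pd.to_datetime(
                ["2019-01-01", "2019-01-02", "2019-01-03"]
            ).tz_localize("UTC"),
        }
    )

    # first() is expected to keep the UTC timezone on the result.
    first = df.groupby("key")["ts"].first()

    # Group a Series by a Categorical that has an unused category "c".
    values = pd.Series([1, 2, 3])
    cats = pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"])

    # observed=True keeps only categories that actually occur in the data.
    observed_sum = values.groupby(cats, observed=True).sum()

    # observed=False also reports the unused category.
    all_sum = values.groupby(cats, observed=False).sum()

Passing ``observed`` explicitly, as done here, avoids relying on the default, which has differed between pandas releases.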

Reshaping
^^^^^^^^^

- Bug in :func:`pandas.merge` where the string ``None`` was appended to column names when ``None`` was passed in ``suffixes``, instead of leaving the column name as-is (:issue:`24782`).
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (missing index values are now assigned NA) (:issue:`24212`, :issue:`25009`)
- :func:`to_records` now accepts dtypes to its ``column_dtypes`` parameter (:issue:`24895`)
- Bug in :func:`concat` where the order of an ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected when passed in as the ``objs`` argument (:issue:`21510`)
- Bug in :func:`pivot_table` where columns with ``NaN`` values are dropped even if ``dropna`` argument is ``False``, when the ``aggfunc`` argument contains a ``list`` (:issue:`22159`)
- Bug in :func:`concat` where the resulting ``freq`` of two :class:`DatetimeIndex` with the same ``freq`` would be dropped (:issue:`3232`).
- Bug in :func:`merge` where merging with equivalent Categorical dtypes was raising an error (:issue:`22501`)
- Bug in :class:`DataFrame` where instantiating with a dict of iterators or generators (e.g. ``pd.DataFrame({'A': reversed(range(3))})``) raised an error (:issue:`26349`).
- Bug in :class:`DataFrame` where instantiating with a ``range`` (e.g. ``pd.DataFrame(range(3))``) raised an error (:issue:`26342`).
- Bug in :class:`DataFrame` constructor when passing non-empty tuples would cause a segmentation fault (:issue:`25691`)
- Bug in :func:`Series.apply` which failed when the series is a timezone aware :class:`DatetimeIndex` (:issue:`25959`)
- Bug in :func:`pandas.cut` where large bins could incorrectly raise an error due to an integer overflow (:issue:`26045`)
- Bug in :func:`DataFrame.sort_index` where an error is thrown when a multi-indexed ``DataFrame`` is sorted on all levels with the initial level sorted last (:issue:`26053`)
- Bug in :meth:`Series.nlargest` where ``True`` was treated as smaller than ``False`` (:issue:`26154`)
- Bug in :func:`DataFrame.pivot_table` with an :class:`IntervalIndex` as pivot index would raise ``TypeError`` (:issue:`25814`)
- Bug in which :meth:`DataFrame.from_dict` ignored order of ``OrderedDict`` when ``orient='index'`` (:issue:`8425`).
- Bug in :meth:`DataFrame.transpose` where transposing a DataFrame with a timezone-aware datetime column would incorrectly raise ``ValueError`` (:issue:`26825`)
- Bug in :func:`pivot_table` when pivoting a timezone aware column as the ``values`` would remove timezone information (:issue:`14948`)
- Bug in :func:`merge_asof` when specifying multiple ``by`` columns where one is ``datetime64[ns, tz]`` dtype (:issue:`26649`)
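
Several of the constructor, merge and ranking fixes listed above can be checked with a short sketch. The frames ``left`` and ``right``, the column names, and the ``"_right"`` suffix are hypothetical choices for this example. It assumes a pandas version that includes these fixes.

.. code-block:: python

    import pandas as pd

    # A dict value may be an iterator; it is consumed to build the column.
    from_iterator = pd.DataFrame({"A": reversed(range(3))})

    # A bare range is accepted as the data argument.
    from_range = pd.DataFrame(range(3))

    # With None as a suffix, the overlapping column from that side keeps its name.
    left = pd.DataFrame({"key": [1, 2], "v": [10, 20]})
    right = pd.DataFrame({"key": [1, 2], "v": [30, 40]})
    merged = pd.merge(left, right, on="key", suffixes=(None, "_right"))

    # True ranks above False when picking the largest boolean value.
    largest = pd.Series([False, True, False]).nlargest(1)

    print(merged.columns.tolist())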

Sparse
^^^^^^

- Significant speedup in :class:`SparseArray` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
- Bug in :class:`SparseDataFrame` constructor where passing ``None`` as the data would cause ``default_fill_value`` to be ignored (:issue:`16807`)
- Bug in :class:`SparseDataFrame` where adding a column whose length does not match the length of the index raised an ``AssertionError`` instead of a ``ValueError`` (:issue:`25484`)
- Improved the error message in :meth:`Series.sparse.from_coo`, which now raises a ``TypeError`` for inputs that are not coo matrices (:issue:`26554`)
- Bug in :func:`numpy.modf` on a :class:`SparseArray`. Now a tuple of :class:`SparseArray` is returned (:issue:`26946`).
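
The :func:`numpy.modf` entry above can be illustrated with a minimal sketch. It assumes a recent pandas in which :class:`SparseArray` is exposed as ``pd.arrays.SparseArray`` (in 0.25 itself the class was reachable as ``pd.SparseArray``). The input values and the ``fill_value`` of ``0.0`` are arbitrary choices for the example.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Assumes a recent pandas where SparseArray is exposed under pd.arrays.
    sparse = pd.arrays.SparseArray([0.0, 1.5, 0.0, 2.25], fill_value=0.0)

    # numpy.modf splits each element into its fractional and integral parts
    # and, for a SparseArray input, is expected to return two SparseArrays.
    fractional, integral = np.modf(sparse)

    print(type(fractional).__name__, type(integral).__name__)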


Build changes
^^^^^^^^^^^^^

- Fix install error with PyPy on macOS (:issue:`26536`)

ExtensionArray
^^^^^^^^^^^^^^

- Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
- Bug in :meth:`Series.count` where NA values in ExtensionArrays were miscounted (:issue:`26835`)
- Added ``Series.__array_ufunc__`` to better handle NumPy ufuncs applied to Series backed by extension arrays (:issue:`23293`).
- Keyword argument ``deep`` has been removed from :meth:`ExtensionArray.copy` (:issue:`27083`)
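
The :meth:`Series.count` and ``Series.__array_ufunc__`` entries above can be sketched with the nullable ``Int64`` extension dtype. This is an illustrative example with made-up values, not code from the pandas source. The claim that the ufunc result keeps the extension dtype is an assumption about recent pandas versions.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Nullable integer Series backed by an extension array, with one missing value.
    s = pd.Series([-1, None, 3], dtype="Int64")

    # The NumPy ufunc is routed through Series.__array_ufunc__; the result is
    # expected to stay a Series with the same extension dtype.
    absolute = np.abs(s)

    # count() reports only the non-missing entries.
    n_valid = s.count()

    print(absolute.dtype, n_valid)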

Other
^^^^^

- Removed unused C functions from vendored UltraJSON implementation (:issue:`26198`)
- Allow :class:`Index` and :class:`RangeIndex` to be passed to numpy ``min`` and ``max`` functions (:issue:`26125`)
- Use actual class name in repr of empty objects of a ``Series`` subclass (:issue:`27001`).
- Bug in :class:`DataFrame` where passing an object array of timezone-aware ``datetime`` objects would incorrectly raise ``ValueError`` (:issue:`13287`)
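
The NumPy ``min``/``max`` entry and the subclass repr entry above can be sketched as follows. ``MySeries`` is a hypothetical subclass defined only for this example, and the exact repr text is not guaranteed beyond starting with the subclass name.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Index objects can be handed directly to NumPy reductions.
    smallest = np.min(pd.RangeIndex(5))
    largest = np.max(pd.Index([3, 1, 2]))


    class MySeries(pd.Series):
        """Hypothetical subclass used only to inspect the repr."""


    # The repr of an empty instance is expected to name the subclass.
    empty_repr = repr(MySeries([], dtype="float64"))

    print(smallest, largest, empty_repr)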

.. _whatsnew_0.250.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.24.2..v0.25.0