File: v2.2.0.rst

package info (click to toggle)
pandas 2.2.3%2Bdfsg-9
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 66,784 kB
  • sloc: python: 422,228; ansic: 9,190; sh: 270; xml: 102; makefile: 83
file content (949 lines) | stat: -rw-r--r-- 59,257 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
.. _whatsnew_220:

What's new in 2.2.0 (January 19, 2024)
--------------------------------------

These are the changes in pandas 2.2.0. See :ref:`release` for a full changelog
including other versions of pandas.

{{ header }}

.. ---------------------------------------------------------------------------

.. _whatsnew_220.upcoming_changes:

Upcoming changes in pandas 3.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas 3.0 will bring two bigger changes to the default behavior of pandas.

Copy-on-Write
^^^^^^^^^^^^^

The currently optional mode Copy-on-Write will be enabled by default in pandas 3.0. There
won't be an option to keep the current behavior enabled. The new behavioral semantics are
explained in the :ref:`user guide about Copy-on-Write <copy_on_write>`.

The new behavior can be enabled since pandas 2.0 with the following option:

.. code-block:: ipython

   pd.options.mode.copy_on_write = True

This change brings different changes in behavior in how pandas operates with respect to
copies and views. Some of these changes allow a clear deprecation, like the changes in
chained assignment. Other changes are more subtle and thus, the warnings are hidden behind
an option that can be enabled in pandas 2.2.

.. code-block:: ipython

   pd.options.mode.copy_on_write = "warn"

This mode will warn in many different scenarios that aren't actually relevant to
most queries. We recommend exploring this mode, but it is not necessary to get rid
of all of these warnings. The :ref:`migration guide <copy_on_write.migration_guide>`
explains the upgrade process in more detail.

Dedicated string data type (backed by Arrow) by default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Historically, pandas represented string columns with NumPy object data type. This
representation has numerous problems, including slow performance and a large memory
footprint. This will change in pandas 3.0. pandas will start inferring string columns
as a new ``string`` data type, backed by Arrow, which represents strings contiguous in memory. This brings
a huge performance and memory improvement.

Old behavior:

.. code-block:: ipython

    In [1]: ser = pd.Series(["a", "b"])
    Out[1]:
    0    a
    1    b
    dtype: object

New behavior:


.. code-block:: ipython

    In [1]: ser = pd.Series(["a", "b"])
    Out[1]:
    0    a
    1    b
    dtype: string

The string data type that is used in these scenarios will mostly behave as NumPy
object would, including missing value semantics and general operations on these
columns.

This change includes a few additional changes across the API:

- Currently, specifying ``dtype="string"`` creates a dtype that is backed by Python strings
  which are stored in a NumPy array. This will change in pandas 3.0, this dtype
  will create an Arrow backed string column.
- The column names and the Index will also be backed by Arrow strings.
- PyArrow will become a required dependency with pandas 3.0 to accommodate this change.

This future dtype inference logic can be enabled with:

.. code-block:: ipython

   pd.options.future.infer_string = True

.. _whatsnew_220.enhancements:

Enhancements
~~~~~~~~~~~~

.. _whatsnew_220.enhancements.adbc_support:

ADBC Driver support in to_sql and read_sql
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`read_sql` and :meth:`~DataFrame.to_sql` now work with `Apache Arrow ADBC
<https://arrow.apache.org/adbc/current/index.html>`_ drivers. Compared to
traditional drivers used via SQLAlchemy, ADBC drivers should provide
significant performance improvements, better type support and cleaner
nullability handling.

.. code-block:: ipython

   import adbc_driver_postgresql.dbapi as pg_dbapi

   df = pd.DataFrame(
       [
           [1, 2, 3],
           [4, 5, 6],
       ],
       columns=['a', 'b', 'c']
   )
   uri = "postgresql://postgres:postgres@localhost/postgres"
   with pg_dbapi.connect(uri) as conn:
       df.to_sql("pandas_table", conn, index=False)

   # for round-tripping
   with pg_dbapi.connect(uri) as conn:
       df2 = pd.read_sql("pandas_table", conn)

The Arrow type system offers a wider array of types that can more closely match
what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive)
listing of types available in different databases and pandas backends:

+-----------------+-----------------------+----------------+---------+
|numpy/pandas     |arrow                  |postgres        |sqlite   |
+=================+=======================+================+=========+
|int16/Int16      |int16                  |SMALLINT        |INTEGER  |
+-----------------+-----------------------+----------------+---------+
|int32/Int32      |int32                  |INTEGER         |INTEGER  |
+-----------------+-----------------------+----------------+---------+
|int64/Int64      |int64                  |BIGINT          |INTEGER  |
+-----------------+-----------------------+----------------+---------+
|float32          |float32                |REAL            |REAL     |
+-----------------+-----------------------+----------------+---------+
|float64          |float64                |DOUBLE PRECISION|REAL     |
+-----------------+-----------------------+----------------+---------+
|object           |string                 |TEXT            |TEXT     |
+-----------------+-----------------------+----------------+---------+
|bool             |``bool_``              |BOOLEAN         |         |
+-----------------+-----------------------+----------------+---------+
|datetime64[ns]   |timestamp(us)          |TIMESTAMP       |         |
+-----------------+-----------------------+----------------+---------+
|datetime64[ns,tz]|timestamp(us,tz)       |TIMESTAMPTZ     |         |
+-----------------+-----------------------+----------------+---------+
|                 |date32                 |DATE            |         |
+-----------------+-----------------------+----------------+---------+
|                 |month_day_nano_interval|INTERVAL        |         |
+-----------------+-----------------------+----------------+---------+
|                 |binary                 |BINARY          |BLOB     |
+-----------------+-----------------------+----------------+---------+
|                 |decimal128             |DECIMAL [#f1]_  |         |
+-----------------+-----------------------+----------------+---------+
|                 |list                   |ARRAY [#f1]_    |         |
+-----------------+-----------------------+----------------+---------+
|                 |struct                 |COMPOSITE TYPE  |         |
|                 |                       | [#f1]_         |         |
+-----------------+-----------------------+----------------+---------+

.. rubric:: Footnotes

.. [#f1] Not implemented as of writing, but theoretically possible

If you are interested in preserving database types as best as possible
throughout the lifecycle of your DataFrame, users are encouraged to
leverage the ``dtype_backend="pyarrow"`` argument of :func:`~pandas.read_sql`

.. code-block:: ipython

   # for round-tripping
   with pg_dbapi.connect(uri) as conn:
       df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")

This will prevent your data from being converted to the traditional pandas/NumPy
type system, which often converts SQL types in ways that make them impossible to
round-trip.

For a full list of ADBC drivers and their development status, see the `ADBC Driver
Implementation Status <https://arrow.apache.org/adbc/current/driver/status.html>`_
documentation.

.. _whatsnew_220.enhancements.case_when:

Create a pandas Series based on one or more conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`Series.case_when` function has been added to create a Series object based on one or more conditions. (:issue:`39154`)

.. ipython:: python

   import pandas as pd

   df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
   default=pd.Series('default', index=df.index)
   default.case_when(
        caselist=[
            (df.a == 1, 'first'),                              # condition, replacement
            (df.a.gt(1) & df.b.eq(5), 'second'),  # condition, replacement
        ],
   )

.. _whatsnew_220.enhancements.to_numpy_ea:

``to_numpy`` for NumPy nullable and Arrow types converts to suitable NumPy dtype
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``to_numpy`` for NumPy nullable and Arrow types will now convert to a
suitable NumPy dtype instead of ``object`` dtype for nullable and PyArrow backed extension dtypes.

*Old behavior:*

.. code-block:: ipython

    In [1]: ser = pd.Series([1, 2, 3], dtype="Int64")
    In [2]: ser.to_numpy()
    Out[2]: array([1, 2, 3], dtype=object)

*New behavior:*

.. ipython:: python

    ser = pd.Series([1, 2, 3], dtype="Int64")
    ser.to_numpy()

    ser = pd.Series([1, 2, 3], dtype="timestamp[ns][pyarrow]")
    ser.to_numpy()

The default NumPy dtype (without any arguments) is determined as follows:

- float dtypes are cast to NumPy floats
- integer dtypes without missing values are cast to NumPy integer dtypes
- integer dtypes with missing values are cast to NumPy float dtypes and ``NaN`` is used as missing value indicator
- boolean dtypes without missing values are cast to NumPy bool dtype
- boolean dtypes with missing values keep object dtype
- datetime and timedelta types are cast to Numpy datetime64 and timedelta64 types respectively and ``NaT`` is used as missing value indicator

.. _whatsnew_220.enhancements.struct_accessor:

Series.struct accessor for PyArrow structured data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``Series.struct`` accessor provides attributes and methods for processing
data with ``struct[pyarrow]`` dtype Series. For example,
:meth:`Series.struct.explode` converts PyArrow structured data to a pandas
DataFrame. (:issue:`54938`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            {"project": "pandas", "version": "2.2.0"},
            {"project": "numpy", "version": "1.25.2"},
            {"project": "pyarrow", "version": "13.0.0"},
        ],
        dtype=pd.ArrowDtype(
            pa.struct([
                ("project", pa.string()),
                ("version", pa.string()),
            ])
        ),
    )
    series.struct.explode()

Use :meth:`Series.struct.field` to index into a (possible nested)
struct field.


.. ipython:: python

    series.struct.field("project")

.. _whatsnew_220.enhancements.list_accessor:

Series.list accessor for PyArrow list data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``Series.list`` accessor provides attributes and methods for processing
data with ``list[pyarrow]`` dtype Series. For example,
:meth:`Series.list.__getitem__` allows indexing pyarrow lists in
a Series. (:issue:`55323`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            [1, 2, 3],
            [4, 5],
            [6],
        ],
        dtype=pd.ArrowDtype(
            pa.list_(pa.int64())
        ),
    )
    series.list[0]

.. _whatsnew_220.enhancements.calamine:

Calamine engine for :func:`read_excel`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``calamine`` engine was added to :func:`read_excel`.
It uses ``python-calamine``, which provides Python bindings for the Rust library `calamine <https://crates.io/crates/calamine>`__.
This engine supports Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``) and OpenDocument spreadsheets (``.ods``) (:issue:`50395`).

There are two advantages of this engine:

1. Calamine is often faster than other engines, some benchmarks show results up to 5x faster than 'openpyxl', 20x - 'odf', 4x - 'pyxlsb', and 1.5x - 'xlrd'.
   But, 'openpyxl' and 'pyxlsb' are faster in reading a few rows from large files because of lazy iteration over rows.
2. Calamine supports the recognition of datetime in ``.xlsb`` files, unlike 'pyxlsb' which is the only other engine in pandas that can read ``.xlsb`` files.

.. code-block:: python

   pd.read_excel("path_to_file.xlsb", engine="calamine")


For more, see :ref:`io.calamine` in the user guide on IO tools.

.. _whatsnew_220.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- :meth:`~DataFrame.to_sql` with method parameter set to ``multi`` works with Oracle on the backend
- :attr:`Series.attrs` / :attr:`DataFrame.attrs` now uses a deepcopy for propagating ``attrs`` (:issue:`54134`).
- :func:`get_dummies` now returning  extension dtypes ``boolean`` or ``bool[pyarrow]`` that are compatible with the input dtype (:issue:`56273`)
- :func:`read_csv` now supports ``on_bad_lines`` parameter with ``engine="pyarrow"`` (:issue:`54480`)
- :func:`read_sas` returns ``datetime64`` dtypes with resolutions better matching those stored natively in SAS, and avoids returning object-dtype in cases that cannot be stored with ``datetime64[ns]`` dtype (:issue:`56127`)
- :func:`read_spss` now returns a :class:`DataFrame` that stores the metadata in :attr:`DataFrame.attrs` (:issue:`54264`)
- :func:`tseries.api.guess_datetime_format` is now part of the public API (:issue:`54727`)
- :meth:`DataFrame.apply` now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
- :meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the ``explode`` method (:issue:`54833`)
- :meth:`ExtensionArray.duplicated` added to allow extension type implementations of the ``duplicated`` method (:issue:`55255`)
- :meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, and :meth:`DataFrame.bfill` have gained the argument ``limit_area``; 3rd party :class:`.ExtensionArray` authors need to add this argument to the method ``_pad_or_backfill`` (:issue:`56492`)
- Allow passing ``read_only``, ``data_only`` and ``keep_links`` arguments to openpyxl using ``engine_kwargs`` of :func:`read_excel` (:issue:`55027`)
- Implement :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` for :class:`ArrowDtype` and masked dtypes (:issue:`56267`)
- Implement masked algorithms for :meth:`Series.value_counts` (:issue:`54984`)
- Implemented :meth:`Series.dt` methods and attributes for :class:`ArrowDtype` with ``pyarrow.duration`` type (:issue:`52284`)
- Implemented :meth:`Series.str.extract` for :class:`ArrowDtype` (:issue:`56268`)
- Improved error message that appears in :meth:`DatetimeIndex.to_period` with frequencies which are not supported as period frequencies, such as ``"BMS"`` (:issue:`56243`)
- Improved error message when constructing :class:`Period` with invalid offsets such as ``"QS"`` (:issue:`55785`)
- The dtypes ``string[pyarrow]`` and ``string[pyarrow_numpy]`` now both utilize the ``large_string`` type from PyArrow to avoid overflow for long columns (:issue:`56259`)

.. ---------------------------------------------------------------------------
.. _whatsnew_220.notable_bug_fixes:

Notable bug fixes
~~~~~~~~~~~~~~~~~

These are bug fixes that might have notable behavior changes.

.. _whatsnew_220.notable_bug_fixes.merge_sort_behavior:

:func:`merge` and :meth:`DataFrame.join` now consistently follow documented sort behavior
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` did not
always return a result that followed the documented sort behavior. pandas now
follows the documented sort behavior in merge and join operations (:issue:`54611`, :issue:`56426`, :issue:`56443`).

As documented, ``sort=True`` sorts the join keys lexicographically in the resulting
:class:`DataFrame`. With ``sort=False``, the order of the join keys depends on the
join type (``how`` keyword):

- ``how="left"``: preserve the order of the left keys
- ``how="right"``: preserve the order of the right keys
- ``how="inner"``: preserve the order of the left keys
- ``how="outer"``: sort keys lexicographically

One example with changing behavior is inner joins with non-unique left join keys
and ``sort=False``:

.. ipython:: python

    left = pd.DataFrame({"a": [1, 2, 1]})
    right = pd.DataFrame({"a": [1, 2]})
    result = pd.merge(left, right, how="inner", on="a", sort=False)

*Old Behavior*

.. code-block:: ipython

    In [5]: result
    Out[5]:
       a
    0  1
    1  1
    2  2

*New Behavior*

.. ipython:: python

    result

.. _whatsnew_220.notable_bug_fixes.multiindex_join_different_levels:

:func:`merge` and :meth:`DataFrame.join` no longer reorder levels when levels differ
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` would reorder
index levels when joining on two indexes with different levels (:issue:`34133`).

.. ipython:: python

    left = pd.DataFrame({"left": 1}, index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)], names=["A", "B"]))
    right = pd.DataFrame({"right": 2}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["B", "C"]))
    left
    right
    result = left.join(right)

*Old Behavior*

.. code-block:: ipython

    In [5]: result
    Out[5]:
           left  right
    B A C
    1 x 1     1      2
    2 x 2     1      2

*New Behavior*

.. ipython:: python

    result

.. _whatsnew_220.api_breaking.deps:

Increased minimum versions for dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For `optional dependencies <https://pandas.pydata.org/docs/getting_started/install.html>`_ the general recommendation is to use the latest version.
Optional dependencies below the lowest tested version may still work but are not considered supported.
The following table lists the optional dependencies that have had their minimum tested version increased.

+-----------------+---------------------+
| Package         | New Minimum Version |
+=================+=====================+
| beautifulsoup4  | 4.11.2              |
+-----------------+---------------------+
| blosc           | 1.21.3              |
+-----------------+---------------------+
| bottleneck      | 1.3.6               |
+-----------------+---------------------+
| fastparquet     | 2022.12.0           |
+-----------------+---------------------+
| fsspec          | 2022.11.0           |
+-----------------+---------------------+
| gcsfs           | 2022.11.0           |
+-----------------+---------------------+
| lxml            | 4.9.2               |
+-----------------+---------------------+
| matplotlib      | 3.6.3               |
+-----------------+---------------------+
| numba           | 0.56.4              |
+-----------------+---------------------+
| numexpr         | 2.8.4               |
+-----------------+---------------------+
| qtpy            | 2.3.0               |
+-----------------+---------------------+
| openpyxl        | 3.1.0               |
+-----------------+---------------------+
| psycopg2        | 2.9.6               |
+-----------------+---------------------+
| pyreadstat      | 1.2.0               |
+-----------------+---------------------+
| pytables        | 3.8.0               |
+-----------------+---------------------+
| pyxlsb          | 1.0.10              |
+-----------------+---------------------+
| s3fs            | 2022.11.0           |
+-----------------+---------------------+
| scipy           | 1.10.0              |
+-----------------+---------------------+
| sqlalchemy      | 2.0.0               |
+-----------------+---------------------+
| tabulate        | 0.9.0               |
+-----------------+---------------------+
| xarray          | 2022.12.0           |
+-----------------+---------------------+
| xlsxwriter      | 3.0.5               |
+-----------------+---------------------+
| zstandard       | 0.19.0              |
+-----------------+---------------------+
| pyqt5           | 5.15.8              |
+-----------------+---------------------+
| tzdata          | 2022.7              |
+-----------------+---------------------+

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

.. _whatsnew_220.api_breaking.other:

Other API changes
^^^^^^^^^^^^^^^^^
- The hash values of nullable extension dtypes changed to improve the performance of the hashing operation (:issue:`56507`)
- ``check_exact`` now only takes effect for floating-point dtypes in :func:`testing.assert_frame_equal` and :func:`testing.assert_series_equal`. In particular, integer dtypes are always checked exactly (:issue:`55882`)

.. ---------------------------------------------------------------------------
.. _whatsnew_220.deprecations:

Deprecations
~~~~~~~~~~~~

Chained assignment
^^^^^^^^^^^^^^^^^^

In preparation of larger upcoming changes to the copy / view behaviour in pandas 3.0
(:ref:`copy_on_write`, PDEP-7), we started deprecating *chained assignment*.

Chained assignment occurs when you try to update a pandas DataFrame or Series through
two subsequent indexing operations. Depending on the type and order of those operations
this currently does or does not work.

A typical example is as follows:

.. code-block:: python

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

    # first selecting rows with a mask, then assigning values to a column
    # -> this has never worked and raises a SettingWithCopyWarning
    df[df["bar"] > 5]["foo"] = 100

    # first selecting the column, and then assigning to a subset of that column
    # -> this currently works
    df["foo"][df["bar"] > 5] = 100

This second example of chained assignment currently works to update the original ``df``.
This will no longer work in pandas 3.0, and therefore we started deprecating this:

.. code-block:: python

    >>> df["foo"][df["bar"] > 5] = 100
    FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
    You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
    A typical example is when you are setting values in a column of a DataFrame, like:

    df["col"][row_indexer] = value

    Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

You can fix this warning and ensure your code is ready for pandas 3.0 by removing
the usage of chained assignment. Typically, this can be done by doing the assignment
in a single step using for example ``.loc``. For the example above, we can do:

.. code-block:: python

    df.loc[df["bar"] > 5, "foo"] = 100

The same deprecation applies to inplace methods that are done in a chained manner, such as:

.. code-block:: python

    >>> df["foo"].fillna(0, inplace=True)
    FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
    The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

    For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

When the goal is to update the column in the DataFrame ``df``, the alternative here is
to call the method on ``df`` itself, such as ``df.fillna({"foo": 0}, inplace=True)``.

See more details in the :ref:`migration guide <copy_on_write.migration_guide>`.


Deprecate aliases ``M``, ``Q``, ``Y``, etc. in favour of ``ME``, ``QE``, ``YE``, etc. for offsets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Deprecated the following frequency aliases (:issue:`9586`):

+-------------------------------+------------------+------------------+
|offsets                        |deprecated aliases|new aliases       |
+===============================+==================+==================+
|:class:`MonthEnd`              |      ``M``       |     ``ME``       |
+-------------------------------+------------------+------------------+
|:class:`BusinessMonthEnd`      |      ``BM``      |     ``BME``      |
+-------------------------------+------------------+------------------+
|:class:`SemiMonthEnd`          |      ``SM``      |     ``SME``      |
+-------------------------------+------------------+------------------+
|:class:`CustomBusinessMonthEnd`|      ``CBM``     |     ``CBME``     |
+-------------------------------+------------------+------------------+
|:class:`QuarterEnd`            |      ``Q``       |     ``QE``       |
+-------------------------------+------------------+------------------+
|:class:`BQuarterEnd`           |      ``BQ``      |     ``BQE``      |
+-------------------------------+------------------+------------------+
|:class:`YearEnd`               |      ``Y``       |     ``YE``       |
+-------------------------------+------------------+------------------+
|:class:`BYearEnd`              |      ``BY``      |     ``BYE``      |
+-------------------------------+------------------+------------------+

For example:

*Previous behavior*:

.. code-block:: ipython

    In [8]: pd.date_range('2020-01-01', periods=3, freq='Q-NOV')
    Out[8]:
    DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'],
                  dtype='datetime64[ns]', freq='Q-NOV')

*Future behavior*:

.. ipython:: python

    pd.date_range('2020-01-01', periods=3, freq='QE-NOV')

Deprecated automatic downcasting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Deprecated the automatic downcasting of object dtype results in a number of
methods. These would silently change the dtype in a hard to predict manner since the
behavior was value dependent. Additionally, pandas is moving away from silent dtype
changes (:issue:`54710`, :issue:`54261`).

These methods are:

- :meth:`Series.replace` and :meth:`DataFrame.replace`
- :meth:`DataFrame.fillna`, :meth:`Series.fillna`
- :meth:`DataFrame.ffill`, :meth:`Series.ffill`
- :meth:`DataFrame.bfill`, :meth:`Series.bfill`
- :meth:`DataFrame.mask`, :meth:`Series.mask`
- :meth:`DataFrame.where`, :meth:`Series.where`
- :meth:`DataFrame.clip`, :meth:`Series.clip`

Explicitly call :meth:`DataFrame.infer_objects` to replicate the current behavior in the future.

.. code-block:: ipython

    result = result.infer_objects(copy=False)

Or explicitly cast all-round floats to ints using ``astype``.

Set the following option to opt into the future behavior:

.. code-block:: ipython

    In [9]: pd.set_option("future.no_silent_downcasting", True)

Other Deprecations
^^^^^^^^^^^^^^^^^^
- Changed :meth:`Timedelta.resolution_string` to return ``h``, ``min``, ``s``, ``ms``, ``us``, and ``ns`` instead of ``H``, ``T``, ``S``, ``L``, ``U``, and ``N``, for compatibility with respective deprecations in frequency aliases (:issue:`52536`)
- Deprecated :attr:`offsets.Day.delta`, :attr:`offsets.Hour.delta`, :attr:`offsets.Minute.delta`, :attr:`offsets.Second.delta`, :attr:`offsets.Milli.delta`, :attr:`offsets.Micro.delta`, :attr:`offsets.Nano.delta`, use ``pd.Timedelta(obj)`` instead (:issue:`55498`)
- Deprecated :func:`pandas.api.types.is_interval` and :func:`pandas.api.types.is_period`, use ``isinstance(obj, pd.Interval)`` and ``isinstance(obj, pd.Period)`` instead (:issue:`55264`)
- Deprecated :func:`read_gbq` and :meth:`DataFrame.to_gbq`. Use ``pandas_gbq.read_gbq`` and ``pandas_gbq.to_gbq`` instead https://pandas-gbq.readthedocs.io/en/latest/api.html (:issue:`55525`)
- Deprecated :meth:`.DataFrameGroupBy.fillna` and :meth:`.SeriesGroupBy.fillna`; use :meth:`.DataFrameGroupBy.ffill`, :meth:`.DataFrameGroupBy.bfill` for forward and backward filling or :meth:`.DataFrame.fillna` to fill with a single value (or the Series equivalents) (:issue:`55718`)
- Deprecated :meth:`DateOffset.is_anchored`, use ``obj.n == 1`` for non-Tick subclasses (for Tick this was always False) (:issue:`55388`)
- Deprecated :meth:`DatetimeArray.__init__` and :meth:`TimedeltaArray.__init__`, use :func:`array` instead (:issue:`55623`)
- Deprecated :meth:`Index.format`, use ``index.astype(str)`` or ``index.map(formatter)`` instead (:issue:`55413`)
- Deprecated :meth:`Series.ravel`, the underlying array is already 1D, so ravel is not necessary (:issue:`52511`)
- Deprecated :meth:`Series.resample` and :meth:`DataFrame.resample` with a :class:`PeriodIndex` (and the 'convention' keyword), convert to :class:`DatetimeIndex` (with ``.to_timestamp()``) before resampling instead (:issue:`53481`)
- Deprecated :meth:`Series.view`, use :meth:`Series.astype` instead to change the dtype (:issue:`20251`)
- Deprecated :meth:`offsets.Tick.is_anchored`, use ``False`` instead (:issue:`55388`)
- Deprecated ``core.internals`` members ``Block``, ``ExtensionBlock``, and ``DatetimeTZBlock``, use public APIs instead (:issue:`55139`)
- Deprecated ``year``, ``month``, ``quarter``, ``day``, ``hour``, ``minute``, and ``second`` keywords in the :class:`PeriodIndex` constructor, use :meth:`PeriodIndex.from_fields` instead (:issue:`55960`)
- Deprecated accepting a type as an argument in :meth:`Index.view`, call without any arguments instead (:issue:`55709`)
- Deprecated allowing non-integer ``periods`` argument in :func:`date_range`, :func:`timedelta_range`, :func:`period_range`, and :func:`interval_range` (:issue:`56036`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_clipboard` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_csv` except ``path_or_buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_dict` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_excel` except ``excel_writer`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_gbq` except ``destination_table`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_hdf` except ``path_or_buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_html` except ``buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_json` except ``path_or_buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_latex` except ``buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_markdown` except ``buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_parquet` except ``path`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_pickle` except ``path`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_string` except ``buf`` (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_xml` except ``path_or_buffer`` (:issue:`54229`)
- Deprecated allowing passing :class:`BlockManager` objects to :class:`DataFrame` or :class:`SingleBlockManager` objects to :class:`Series` (:issue:`52419`)
- Deprecated behavior of :meth:`Index.insert` with an object-dtype index silently performing type inference on the result, explicitly call ``result.infer_objects(copy=False)`` for the old behavior instead (:issue:`51363`)
- Deprecated casting non-datetimelike values (mainly strings) in :meth:`Series.isin` and :meth:`Index.isin` with ``datetime64``, ``timedelta64``, and :class:`PeriodDtype` dtypes (:issue:`53111`)
- Deprecated dtype inference in :class:`Index`, :class:`Series` and :class:`DataFrame` constructors when giving a pandas input, call ``.infer_objects`` on the input to keep the current behavior (:issue:`56012`)
- Deprecated dtype inference when setting a :class:`Index` into a :class:`DataFrame`, cast explicitly instead (:issue:`56102`)
- Deprecated including the groups in computations when using :meth:`.DataFrameGroupBy.apply` and :meth:`.DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
- Deprecated indexing an :class:`Index`  with a boolean indexer of length zero (:issue:`55820`)
- Deprecated not passing a tuple to :class:`.DataFrameGroupBy.get_group` or :class:`.SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
- Deprecated string ``AS`` denoting frequency in :class:`YearBegin` and strings ``AS-DEC``, ``AS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
- Deprecated string ``A`` denoting frequency in :class:`YearEnd` and strings ``A-DEC``, ``A-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
- Deprecated string ``BAS`` denoting frequency in :class:`BYearBegin` and strings ``BAS-DEC``, ``BAS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
- Deprecated string ``BA`` denoting frequency in :class:`BYearEnd` and strings ``BA-DEC``, ``BA-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
- Deprecated strings ``H``, ``BH``, and ``CBH`` denoting frequencies in :class:`Hour`, :class:`BusinessHour`, :class:`CustomBusinessHour` (:issue:`52536`)
- Deprecated strings ``H``, ``S``, ``U``, and ``N`` denoting units in :func:`to_timedelta` (:issue:`52536`)
- Deprecated strings ``H``, ``T``, ``S``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`52536`)
- Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`52536`)
- Deprecated support for combining parsed datetime columns in :func:`read_csv` along with the ``keep_date_col`` keyword (:issue:`55569`)
- Deprecated the :attr:`.DataFrameGroupBy.grouper` and :attr:`SeriesGroupBy.grouper`; these attributes will be removed in a future version of pandas (:issue:`56521`)
- Deprecated the :class:`.Grouping` attributes ``group_index``, ``result_index``, and ``group_arraylike``; these will be removed in a future version of pandas (:issue:`56148`)
- Deprecated the ``delim_whitespace`` keyword in :func:`read_csv` and :func:`read_table`, use ``sep="\\s+"`` instead (:issue:`55569`)
- Deprecated the ``errors="ignore"`` option in :func:`to_datetime`, :func:`to_timedelta`, and :func:`to_numeric`; explicitly catch exceptions instead (:issue:`54467`)
- Deprecated the ``fastpath`` keyword in the :class:`Series` constructor (:issue:`20110`)
- Deprecated the ``kind`` keyword in :meth:`Series.resample` and :meth:`DataFrame.resample`, explicitly cast the object's ``index`` instead (:issue:`55895`)
- Deprecated the ``ordinal`` keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
- Deprecated the ``unit`` keyword in :class:`TimedeltaIndex` construction, use :func:`to_timedelta` instead (:issue:`55499`)
- Deprecated the ``verbose`` keyword in :func:`read_csv` and :func:`read_table` (:issue:`55569`)
- Deprecated the behavior of :meth:`DataFrame.replace` and :meth:`Series.replace` with :class:`CategoricalDtype`; in a future version replace will change the values while preserving the categories. To change the categories, use ``ser.cat.rename_categories`` instead (:issue:`55147`)
- Deprecated the behavior of :meth:`Series.value_counts` and :meth:`Index.value_counts` with object dtype; in a future version these will not perform dtype inference on the resulting :class:`Index`, do ``result.index = result.index.infer_objects()`` to retain the old behavior (:issue:`56161`)
- Deprecated the default of ``observed=False`` in :meth:`DataFrame.pivot_table`; will be ``True`` in a future version (:issue:`56236`)
- Deprecated the extension test classes ``BaseNoReduceTests``, ``BaseBooleanReduceTests``, and ``BaseNumericReduceTests``, use ``BaseReduceTests`` instead (:issue:`54663`)
- Deprecated the option ``mode.data_manager`` and the ``ArrayManager``; only the ``BlockManager`` will be available in future versions (:issue:`55043`)
- Deprecated the previous implementation of :class:`DataFrame.stack`; specify ``future_stack=True`` to adopt the future version (:issue:`53515`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_220.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement in :func:`.testing.assert_frame_equal` and :func:`.testing.assert_series_equal` (:issue:`55949`, :issue:`55971`)
- Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
- Performance improvement in :func:`get_dummies` (:issue:`56089`)
- Performance improvement in :func:`merge` and :func:`merge_ordered` when joining on sorted ascending keys (:issue:`56115`)
- Performance improvement in :func:`merge_asof` when ``by`` is not ``None`` (:issue:`55580`, :issue:`55678`)
- Performance improvement in :func:`read_stata` for files with many variables (:issue:`55515`)
- Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
- Performance improvement in :meth:`DataFrame.join` when joining on unordered categorical indexes (:issue:`56345`)
- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` when indexing with a :class:`MultiIndex` (:issue:`56062`)
- Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
- Performance improvement in :meth:`DataFrame.to_dict` on converting DataFrame to dictionary (:issue:`50990`)
- Performance improvement in :meth:`Index.difference` (:issue:`55108`)
- Performance improvement in :meth:`Index.sort_values` when index is already sorted (:issue:`56128`)
- Performance improvement in :meth:`MultiIndex.get_indexer` when ``method`` is not ``None`` (:issue:`55839`)
- Performance improvement in :meth:`Series.duplicated` for pyarrow dtypes (:issue:`55255`)
- Performance improvement in :meth:`Series.str.get_dummies` when dtype is ``"string[pyarrow]"`` or ``"string[pyarrow_numpy]"`` (:issue:`56110`)
- Performance improvement in :meth:`Series.str` methods (:issue:`55736`)
- Performance improvement in :meth:`Series.value_counts` and :meth:`Series.mode` for masked dtypes (:issue:`54984`, :issue:`55340`)
- Performance improvement in :meth:`.DataFrameGroupBy.nunique` and :meth:`.SeriesGroupBy.nunique` (:issue:`55972`)
- Performance improvement in :meth:`.SeriesGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.DataFrameGroupBy.idxmin` (:issue:`54234`)
- Performance improvement when hashing a nullable extension array (:issue:`56507`)
- Performance improvement when indexing into a non-unique index (:issue:`55816`)
- Performance improvement when indexing with more than 4 keys (:issue:`54550`)
- Performance improvement when localizing time to UTC (:issue:`55241`)

.. ---------------------------------------------------------------------------
.. _whatsnew_220.bug_fixes:

Bug fixes
~~~~~~~~~

Categorical
^^^^^^^^^^^
- :meth:`Categorical.isin` raising ``InvalidIndexError`` for categorical containing overlapping :class:`Interval` values (:issue:`34974`)
- Bug in :meth:`CategoricalDtype.__eq__` returning ``False`` for unordered categorical data with mixed types (:issue:`55468`)
- Bug when casting ``pa.dictionary`` to :class:`CategoricalDtype` using a ``pa.DictionaryArray`` as categories (:issue:`56672`)

Datetimelike
^^^^^^^^^^^^
- Bug in :class:`DatetimeIndex` construction when passing both a ``tz`` and either ``dayfirst`` or ``yearfirst`` ignoring dayfirst/yearfirst (:issue:`55813`)
- Bug in :class:`DatetimeIndex` when passing an object-dtype ndarray of float objects and a ``tz`` incorrectly localizing the result (:issue:`55780`)
- Bug in :func:`Series.isin` with :class:`DatetimeTZDtype` dtype and comparison values that are all ``NaT`` incorrectly returning all-``False`` even if the series contains ``NaT`` entries (:issue:`56427`)
- Bug in :func:`concat` raising ``AttributeError`` when concatenating all-NA DataFrame with :class:`DatetimeTZDtype` dtype DataFrame (:issue:`52093`)
- Bug in :func:`testing.assert_extension_array_equal` that could use the wrong unit when comparing resolutions (:issue:`55730`)
- Bug in :func:`to_datetime` and :class:`DatetimeIndex` when passing a list of mixed-string-and-numeric types incorrectly raising (:issue:`55780`)
- Bug in :func:`to_datetime` and :class:`DatetimeIndex` when passing mixed-type objects with a mix of timezones or mix of timezone-awareness failing to raise ``ValueError`` (:issue:`55693`)
- Bug in :meth:`.Tick.delta` with very large ticks raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in :meth:`DatetimeIndex.shift` with non-nanosecond resolution incorrectly returning with nanosecond resolution (:issue:`56117`)
- Bug in :meth:`DatetimeIndex.union` returning object dtype for tz-aware indexes with the same timezone but different units (:issue:`55238`)
- Bug in :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` always caching :meth:`Index.is_unique` as ``True`` when first value in index is ``NaT`` (:issue:`55755`)
- Bug in :meth:`Index.view` to a datetime64 dtype with non-supported resolution incorrectly raising (:issue:`55710`)
- Bug in :meth:`Series.dt.round` with non-nanosecond resolution and ``NaT`` entries incorrectly raising ``OverflowError`` (:issue:`56158`)
- Bug in :meth:`Series.fillna` with non-nanosecond resolution dtypes and higher-resolution vector values returning incorrect (internally-corrupted) results (:issue:`56410`)
- Bug in :meth:`Timestamp.unit` being inferred incorrectly from an ISO8601 format string with minute or hour resolution and a timezone offset (:issue:`56208`)
- Bug in ``.astype`` converting from a higher-resolution ``datetime64`` dtype to a lower-resolution ``datetime64`` dtype (e.g. ``datetime64[us]->datetime64[ms]``) silently overflowing with values near the lower implementation bound (:issue:`55979`)
- Bug in adding or subtracting a :class:`Week` offset to a ``datetime64`` :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
- Bug in addition or subtraction of :class:`BusinessDay` offset with ``offset`` attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
- Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to ``datetime64`` :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
- Bug in addition or subtraction of very large :class:`.Tick` objects with :class:`Timestamp` or :class:`Timedelta` objects raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond :class:`DatetimeTZDtype` and inputs that would be out of bounds with nanosecond resolution incorrectly raising ``OutOfBoundsDatetime`` (:issue:`54620`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond ``datetime64`` (or :class:`DatetimeTZDtype`) from mixed-numeric inputs treating those as nanoseconds instead of as multiples of the dtype's unit (which would happen with non-mixed numeric inputs) (:issue:`56004`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond ``datetime64`` dtype and inputs that would be out of bounds for a ``datetime64[ns]`` incorrectly raising ``OutOfBoundsDatetime`` (:issue:`55756`)
- Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (:issue:`56051`)
- Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (:issue:`55737`)
- Bug in the results of :func:`to_datetime` with an floating-dtype argument with ``unit`` not matching the pointwise results of :class:`Timestamp` (:issue:`56037`)
- Fixed regression where :func:`concat` would raise an error when concatenating ``datetime64`` columns with differing resolutions (:issue:`53641`)

Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta` construction raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in rendering (``__repr__``) of :class:`TimedeltaIndex` and :class:`Series` with timedelta64 values with non-nanosecond resolution entries that are all multiples of 24 hours failing to use the compact representation used in the nanosecond cases (:issue:`55405`)

Timezones
^^^^^^^^^
- Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
- Bug in :class:`Timestamp` construction with an ambiguous value and a ``pytz`` timezone failing to raise ``pytz.AmbiguousTimeError`` (:issue:`55657`)
- Bug in :meth:`Timestamp.tz_localize` with ``nonexistent="shift_forward`` around UTC+0 during DST (:issue:`51501`)

Numeric
^^^^^^^
- Bug in :func:`read_csv` with ``engine="pyarrow"`` causing rounding errors for large integers (:issue:`52505`)
- Bug in :meth:`Series.__floordiv__` and :meth:`Series.__truediv__` for :class:`ArrowDtype` with integral dtypes raising for large divisors (:issue:`56706`)
- Bug in :meth:`Series.__floordiv__` for :class:`ArrowDtype` with integral dtypes raising for large values (:issue:`56645`)
- Bug in :meth:`Series.pow` not filling missing values correctly (:issue:`55512`)
- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` matching float ``0.0`` with ``False`` and vice versa (:issue:`55398`)
- Bug in :meth:`Series.round` raising for nullable boolean dtype (:issue:`55936`)

Conversion
^^^^^^^^^^
- Bug in :meth:`DataFrame.astype` when called with ``str`` on unpickled array - the array might change in-place (:issue:`54654`)
- Bug in :meth:`DataFrame.astype` where ``errors="ignore"`` had no effect for extension types (:issue:`54654`)
- Bug in :meth:`Series.convert_dtypes` not converting all NA column to ``null[pyarrow]`` (:issue:`55346`)
- Bug in :meth:``DataFrame.loc`` was not throwing "incompatible dtype warning" (see `PDEP6 <https://pandas.pydata.org/pdeps/0006-ban-upcasting.html>`_) when assigning a ``Series`` with a different dtype using a full column setter (e.g. ``df.loc[:, 'a'] = incompatible_value``) (:issue:`39584`)

Strings
^^^^^^^
- Bug in :func:`pandas.api.types.is_string_dtype` while checking object array with no elements is of the string dtype (:issue:`54661`)
- Bug in :meth:`DataFrame.apply` failing when ``engine="numba"`` and columns or index have ``StringDtype`` (:issue:`56189`)
- Bug in :meth:`DataFrame.reindex` not matching :class:`Index` with ``string[pyarrow_numpy]`` dtype (:issue:`56106`)
- Bug in :meth:`Index.str.cat` always casting result to object dtype (:issue:`56157`)
- Bug in :meth:`Series.__mul__` for :class:`ArrowDtype` with ``pyarrow.string`` dtype and ``string[pyarrow]`` for the pyarrow backend (:issue:`51970`)
- Bug in :meth:`Series.str.find` when ``start < 0`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56411`)
- Bug in :meth:`Series.str.fullmatch` when ``dtype=pandas.ArrowDtype(pyarrow.string()))`` allows partial matches when regex ends in literal //$ (:issue:`56652`)
- Bug in :meth:`Series.str.replace` when ``n < 0`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56404`)
- Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type ``tuple[str, ...]`` for :class:`ArrowDtype` with ``pyarrow.string`` dtype (:issue:`56579`)
- Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type ``tuple[str, ...]`` for ``string[pyarrow]`` (:issue:`54942`)
- Bug in comparison operations for ``dtype="string[pyarrow_numpy]"`` raising if dtypes can't be compared (:issue:`56008`)

Interval
^^^^^^^^
- Bug in :class:`Interval` ``__repr__`` not displaying UTC offsets for :class:`Timestamp` bounds. Additionally the hour, minute and second components will now be shown (:issue:`55015`)
- Bug in :meth:`IntervalIndex.factorize` and :meth:`Series.factorize` with :class:`IntervalDtype` with datetime64 or timedelta64 intervals not preserving non-nanosecond units (:issue:`56099`)
- Bug in :meth:`IntervalIndex.from_arrays` when passed ``datetime64`` or ``timedelta64`` arrays with mismatched resolutions constructing an invalid ``IntervalArray`` object (:issue:`55714`)
- Bug in :meth:`IntervalIndex.from_tuples` raising if subtype is a nullable extension dtype (:issue:`56765`)
- Bug in :meth:`IntervalIndex.get_indexer` with datetime or timedelta intervals incorrectly matching on integer targets (:issue:`47772`)
- Bug in :meth:`IntervalIndex.get_indexer` with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (:issue:`47772`)
- Bug in setting values on a :class:`Series` with an :class:`IntervalIndex` using a slice incorrectly raising (:issue:`54722`)

Indexing
^^^^^^^^
- Bug in :meth:`DataFrame.loc` mutating a boolean indexer when :class:`DataFrame` has a :class:`MultiIndex` (:issue:`56635`)
- Bug in :meth:`DataFrame.loc` when setting :class:`Series` with extension dtype into NumPy dtype (:issue:`55604`)
- Bug in :meth:`Index.difference` not returning a unique set of values when ``other`` is empty or ``other`` is considered non-comparable (:issue:`55113`)
- Bug in setting :class:`Categorical` values into a :class:`DataFrame` with numpy dtypes raising ``RecursionError`` (:issue:`52927`)
- Fixed bug when creating new column with missing values when setting a single string value (:issue:`56204`)

Missing
^^^^^^^
- Bug in :meth:`DataFrame.update` wasn't updating in-place for tz-aware datetime64 dtypes (:issue:`56227`)

MultiIndex
^^^^^^^^^^
- Bug in :meth:`MultiIndex.get_indexer` not raising ``ValueError`` when ``method`` provided and index is non-monotonic (:issue:`53452`)

I/O
^^^
- Bug in :func:`read_csv` where ``engine="python"`` did not respect ``chunksize`` arg when ``skiprows`` was specified (:issue:`56323`)
- Bug in :func:`read_csv` where ``engine="python"`` was causing a ``TypeError`` when a callable ``skiprows`` and a chunk size was specified (:issue:`55677`)
- Bug in :func:`read_csv` where ``on_bad_lines="warn"`` would write to ``stderr`` instead of raising a Python warning; this now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` where ``quotechar`` was ignored (:issue:`52266`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` where ``usecols`` wasn't working with a CSV with no headers (:issue:`54459`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when the file contains ``NaN`` or ``Inf`` (:issue:`54564`)
- Bug in :func:`read_json` not handling dtype conversion properly if ``infer_string`` is set (:issue:`56195`)
- Bug in :meth:`DataFrame.to_excel`, with ``OdsWriter`` (``ods`` files) writing Boolean/string value (:issue:`54994`)
- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``datetime64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`55622`)
- Bug in :meth:`DataFrame.to_stata` raising for extension dtypes (:issue:`54671`)
- Bug in :meth:`~pandas.read_excel` with ``engine="odf"`` (``ods`` files) when a string cell contains an annotation (:issue:`55200`)
- Bug in :meth:`~pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)
- Bug where :meth:`DataFrame.to_json` would raise an ``OverflowError`` instead of a ``TypeError`` with unsupported NumPy types (:issue:`55403`)

Period
^^^^^^
- Bug in :class:`PeriodIndex` construction when more than one of ``data``, ``ordinal`` and ``**fields`` are passed failing to raise ``ValueError`` (:issue:`55961`)
- Bug in :class:`Period` addition silently wrapping around instead of raising ``OverflowError`` (:issue:`55503`)
- Bug in casting from :class:`PeriodDtype` with ``astype`` to ``datetime64`` or :class:`DatetimeTZDtype` with non-nanosecond unit incorrectly returning with nanosecond unit (:issue:`55958`)

Plotting
^^^^^^^^
- Bug in :meth:`DataFrame.plot.box` with ``vert=False`` and a Matplotlib ``Axes`` created with ``sharey=True`` (:issue:`54941`)
- Bug in :meth:`DataFrame.plot.scatter` discarding string columns (:issue:`56142`)
- Bug in :meth:`Series.plot` when reusing an ``ax`` object failing to raise when a ``how`` keyword is passed (:issue:`55953`)

Groupby/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, and :meth:`.SeriesGroupBy.idxmax` would not retain :class:`.Categorical` dtype when the index was a :class:`.CategoricalIndex` that contained NA values (:issue:`54234`)
- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` when ``observed=False`` and ``f="idxmin"`` or ``f="idxmax"`` would incorrectly raise on unobserved categories (:issue:`54234`)
- Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` could result in incorrect sorting if the columns of the DataFrame or name of the Series are integers (:issue:`55951`)
- Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` would not respect ``sort=False`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby` (:issue:`55951`)
- Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` would sort by proportions rather than frequencies when ``sort=True`` and ``normalize=True`` (:issue:`55951`)
- Bug in :meth:`DataFrame.asfreq` and :meth:`Series.asfreq` with a :class:`DatetimeIndex` with non-nanosecond resolution incorrectly converting to nanosecond resolution (:issue:`55958`)
- Bug in :meth:`DataFrame.ewm` when passed ``times`` with non-nanosecond ``datetime64`` or :class:`DatetimeTZDtype` dtype (:issue:`56262`)
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` where grouping by a combination of ``Decimal`` and NA values would fail when ``sort=True`` (:issue:`54847`)
- Bug in :meth:`DataFrame.groupby` for DataFrame subclasses when selecting a subset of columns to apply the function to (:issue:`56761`)
- Bug in :meth:`DataFrame.resample` not respecting ``closed`` and ``label`` arguments for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55282`)
- Bug in :meth:`DataFrame.resample` when resampling on a :class:`ArrowDtype` of ``pyarrow.timestamp`` or ``pyarrow.duration`` type (:issue:`55989`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55281`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.MonthBegin` (:issue:`55271`)
- Bug in :meth:`DataFrame.rolling` and :meth:`Series.rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :meth:`DataFrame.rolling` and :meth:`Series.rolling` where either the ``index`` or ``on`` column was :class:`ArrowDtype` with ``pyarrow.timestamp`` type (:issue:`55849`)

Reshaping
^^^^^^^^^
- Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
- Bug in :func:`concat` renaming :class:`Series` when ``ignore_index=False`` (:issue:`15047`)
- Bug in :func:`merge_asof` raising ``TypeError`` when ``by`` dtype is not ``object``, ``int64``, or ``uint64`` (:issue:`22794`)
- Bug in :func:`merge_asof` raising incorrect error for string dtype (:issue:`56444`)
- Bug in :func:`merge_asof` when using a :class:`Timedelta` tolerance on a :class:`ArrowDtype` column (:issue:`56486`)
- Bug in :func:`merge` not raising when merging datetime columns with timedelta columns (:issue:`56455`)
- Bug in :func:`merge` not raising when merging string columns with numeric columns (:issue:`56441`)
- Bug in :func:`merge` not sorting for new string dtype (:issue:`56442`)
- Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
- Bug in :meth:`DataFrame.melt` where an exception was raised if ``var_name`` was not a string (:issue:`55948`)
- Bug in :meth:`DataFrame.melt` where it would not preserve the datetime (:issue:`55254`)
- Bug in :meth:`DataFrame.pivot_table` where the row margin is incorrect when the columns have numeric names (:issue:`26568`)
- Bug in :meth:`DataFrame.pivot` with numeric columns and extension dtype for data (:issue:`56528`)
- Bug in :meth:`DataFrame.stack` with ``future_stack=True`` would not preserve NA values in the index (:issue:`56573`)

Sparse
^^^^^^
- Bug in :meth:`arrays.SparseArray.take` when using a different fill value than the array's fill value (:issue:`55181`)

Other
^^^^^
- :meth:`DataFrame.__dataframe__` did not support pyarrow large strings (:issue:`56702`)
- Bug in :func:`DataFrame.describe` when formatting percentiles in the resulting percentile 99.999% is rounded to 100% (:issue:`55765`)
- Bug in :func:`api.interchange.from_dataframe` where it raised  ``NotImplementedError`` when handling empty string columns (:issue:`56703`)
- Bug in :func:`cut` and :func:`qcut` with ``datetime64`` dtype values with non-nanosecond units incorrectly returning nanosecond-unit bins (:issue:`56101`)
- Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)
- Bug in :func:`infer_freq` and :meth:`DatetimeIndex.inferred_freq` with weekly frequencies and non-nanosecond resolutions (:issue:`55609`)
- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)
- Bug in :meth:`DataFrame.from_dict` which would always sort the rows of the created :class:`DataFrame`.  (:issue:`55683`)
- Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` raising a ``ValueError`` (:issue:`56478`)
- Bug in rendering ``inf`` values inside a :class:`DataFrame` with the ``use_inf_as_na`` option enabled (:issue:`55483`)
- Bug in rendering a :class:`Series` with a :class:`MultiIndex` when one of the index level's names is 0 not having that name displayed (:issue:`55415`)
- Bug in the error message when assigning an empty :class:`DataFrame` to a column (:issue:`55956`)
- Bug when time-like strings were being cast to :class:`ArrowDtype` with ``pyarrow.time64`` type (:issue:`56463`)
- Fixed a spurious deprecation warning from ``numba`` >= 0.58.0 when passing a numpy ufunc in :class:`core.window.Rolling.apply` with ``engine="numba"`` (:issue:`55247`)

.. ---------------------------------------------------------------------------
.. _whatsnew_220.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v2.1.4..v2.2.0