File: merging.rst

package info (click to toggle)
pandas 2.2.3%2Bdfsg-9
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 66,784 kB
sloc: python: 422,228; ansic: 9,190; sh: 270; xml: 102; makefile: 83
file content (1095 lines) | stat: -rw-r--r-- 30,551 bytes
.. _merging:

{{ header }}

.. ipython:: python
   :suppress:

   from matplotlib import pyplot as plt
   import pandas.util._doctools as doctools

   p = doctools.TablePlotter()


************************************
Merge, join, concatenate and compare
************************************

pandas provides various methods for combining and comparing :class:`Series` or
:class:`DataFrame`.

* :func:`~pandas.concat`: Merge multiple :class:`Series` or :class:`DataFrame` objects along a shared index or column
* :meth:`DataFrame.join`: Merge multiple :class:`DataFrame` objects along the columns
* :meth:`DataFrame.combine_first`: Update missing values with non-missing values in the same location
* :func:`~pandas.merge`: Combine two :class:`Series` or :class:`DataFrame` objects with SQL-style joining
* :func:`~pandas.merge_ordered`: Combine two :class:`Series` or :class:`DataFrame` objects along an ordered axis
* :func:`~pandas.merge_asof`: Combine two :class:`Series` or :class:`DataFrame` objects by near instead of exact matching keys
* :meth:`Series.compare` and :meth:`DataFrame.compare`: Show differences in values between two :class:`Series` or :class:`DataFrame` objects

.. _merging.concat:

:func:`~pandas.concat`
----------------------

The :func:`~pandas.concat` function concatenates an arbitrary amount of
:class:`Series` or :class:`DataFrame` objects along an axis while
performing optional set logic (union or intersection) of the indexes on
the other axes. Like ``numpy.concatenate``, :func:`~pandas.concat`
takes a list or dict of homogeneously-typed objects and concatenates them.

.. ipython:: python

   df1 = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "C": ["C0", "C1", "C2", "C3"],
           "D": ["D0", "D1", "D2", "D3"],
       },
       index=[0, 1, 2, 3],
   )

   df2 = pd.DataFrame(
       {
           "A": ["A4", "A5", "A6", "A7"],
           "B": ["B4", "B5", "B6", "B7"],
           "C": ["C4", "C5", "C6", "C7"],
           "D": ["D4", "D5", "D6", "D7"],
       },
       index=[4, 5, 6, 7],
   )

   df3 = pd.DataFrame(
       {
           "A": ["A8", "A9", "A10", "A11"],
           "B": ["B8", "B9", "B10", "B11"],
           "C": ["C8", "C9", "C10", "C11"],
           "D": ["D8", "D9", "D10", "D11"],
       },
       index=[8, 9, 10, 11],
   )

   frames = [df1, df2, df3]
   result = pd.concat(frames)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_basic.png
   p.plot(frames, result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

.. note::

   :func:`~pandas.concat` makes a full copy of the data, and iteratively
   reusing :func:`~pandas.concat` can create unnecessary copies. Collect all
   :class:`DataFrame` or :class:`Series` objects in a list before using
   :func:`~pandas.concat`.

   .. code-block:: python

      frames = [process_your_file(f) for f in files]
      result = pd.concat(frames)

.. note::

   When concatenating :class:`DataFrame` with named axes, pandas will attempt to preserve
   these index/column names whenever possible. In the case where all inputs share a
   common name, this name will be assigned to the result. When the input names do
   not all agree, the result will be unnamed. The same is true for :class:`MultiIndex`,
   but the logic is applied separately on a level-by-level basis.


Joining logic of the resulting axis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``join`` keyword specifies how to handle axis values that don't exist in the first
:class:`DataFrame`.

``join='outer'`` takes the union of all axis values

.. ipython:: python

   df4 = pd.DataFrame(
       {
           "B": ["B2", "B3", "B6", "B7"],
           "D": ["D2", "D3", "D6", "D7"],
           "F": ["F2", "F3", "F6", "F7"],
       },
       index=[2, 3, 6, 7],
   )
   result = pd.concat([df1, df4], axis=1)
   result


.. ipython:: python
   :suppress:

   @savefig merging_concat_axis1.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=False);
   plt.close("all");

``join='inner'`` takes the intersection of the axis values

.. ipython:: python

   result = pd.concat([df1, df4], axis=1, join="inner")
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_axis1_inner.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=False);
   plt.close("all");

To perform an effective "left" join using the *exact index* from the original
:class:`DataFrame`, result can be reindexed.

.. ipython:: python

   result = pd.concat([df1, df4], axis=1).reindex(df1.index)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_axis1_join_axes.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=False);
   plt.close("all");

.. _merging.ignore_index:

Ignoring indexes on the concatenation axis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For :class:`DataFrame` objects which don't have a meaningful index, the ``ignore_index``
ignores overlapping indexes.

.. ipython:: python

   result = pd.concat([df1, df4], ignore_index=True, sort=False)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_ignore_index.png
   p.plot([df1, df4], result, labels=["df1", "df4"], vertical=True);
   plt.close("all");

.. _merging.mixed_ndims:

Concatenating :class:`Series` and :class:`DataFrame` together
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can concatenate a mix of :class:`Series` and :class:`DataFrame` objects. The
:class:`Series` will be transformed to :class:`DataFrame` with the column name as
the name of the :class:`Series`.

.. ipython:: python

   s1 = pd.Series(["X0", "X1", "X2", "X3"], name="X")
   result = pd.concat([df1, s1], axis=1)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_mixed_ndim.png
   p.plot([df1, s1], result, labels=["df1", "s1"], vertical=False);
   plt.close("all");

Unnamed :class:`Series` will be numbered consecutively.

.. ipython:: python

   s2 = pd.Series(["_0", "_1", "_2", "_3"])
   result = pd.concat([df1, s2, s2, s2], axis=1)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_unnamed_series.png
   p.plot([df1, s2], result, labels=["df1", "s2"], vertical=False);
   plt.close("all");

``ignore_index=True`` will drop all name references.

.. ipython:: python

   result = pd.concat([df1, s1], axis=1, ignore_index=True)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_series_ignore_index.png
   p.plot([df1, s1], result, labels=["df1", "s1"], vertical=False);
   plt.close("all");

Resulting ``keys``
~~~~~~~~~~~~~~~~~~

The ``keys`` argument adds another axis level to the resulting index or column (creating
a :class:`MultiIndex`) associate specific keys with each original :class:`DataFrame`.

.. ipython:: python

   result = pd.concat(frames, keys=["x", "y", "z"])
   result
   result.loc["y"]

.. ipython:: python
   :suppress:

   @savefig merging_concat_keys.png
   p.plot(frames, result, labels=["df1", "df2", "df3"], vertical=True)
   plt.close("all");

The ``keys`` argument cane override the column names
when creating a new :class:`DataFrame` based on existing :class:`Series`.

.. ipython:: python

   s3 = pd.Series([0, 1, 2, 3], name="foo")
   s4 = pd.Series([0, 1, 2, 3])
   s5 = pd.Series([0, 1, 4, 5])

   pd.concat([s3, s4, s5], axis=1)
   pd.concat([s3, s4, s5], axis=1, keys=["red", "blue", "yellow"])

You can also pass a dict to :func:`concat` in which case the dict keys will be used
for the ``keys`` argument unless other ``keys`` argument is specified:

.. ipython:: python

   pieces = {"x": df1, "y": df2, "z": df3}
   result = pd.concat(pieces)
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_dict.png
   p.plot([df1, df2, df3], result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

.. ipython:: python

   result = pd.concat(pieces, keys=["z", "y"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_dict_keys.png
   p.plot([df1, df2, df3], result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

The :class:`MultiIndex` created has levels that are constructed from the passed keys and
the index of the :class:`DataFrame` pieces:

.. ipython:: python

   result.index.levels

``levels`` argument allows specifying resulting levels associated with the ``keys``

.. ipython:: python

   result = pd.concat(
       pieces, keys=["x", "y", "z"], levels=[["z", "y", "x", "w"]], names=["group_key"]
   )
   result

.. ipython:: python
   :suppress:

   @savefig merging_concat_dict_keys_names.png
   p.plot([df1, df2, df3], result, labels=["df1", "df2", "df3"], vertical=True);
   plt.close("all");

.. ipython:: python

   result.index.levels

.. _merging.append.row:

Appending rows to a :class:`DataFrame`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have a :class:`Series` that you want to append as a single row to a :class:`DataFrame`, you can convert the row into a
:class:`DataFrame` and use :func:`concat`

.. ipython:: python

   s2 = pd.Series(["X0", "X1", "X2", "X3"], index=["A", "B", "C", "D"])
   result = pd.concat([df1, s2.to_frame().T], ignore_index=True)
   result

.. ipython:: python
   :suppress:

   @savefig merging_append_series_as_row.png
   p.plot([df1, s2], result, labels=["df1", "s2"], vertical=True);
   plt.close("all");

.. _merging.join:

:func:`~pandas.merge`
---------------------

:func:`~pandas.merge` performs join operations similar to relational databases like SQL.
Users who are familiar with SQL but new to pandas can reference a
:ref:`comparison with SQL<compare_with_sql.join>`.

Merge types
~~~~~~~~~~~

:func:`~pandas.merge` implements common SQL style joining operations.

* **one-to-one**: joining two :class:`DataFrame` objects on
  their indexes which must contain unique values.
* **many-to-one**: joining a unique index to one or
  more columns in a different :class:`DataFrame`.
* **many-to-many** : joining columns on columns.

.. note::

   When joining columns on columns, potentially a many-to-many join, any
   indexes on the passed :class:`DataFrame` objects **will be discarded**.


For a **many-to-many** join, if a key combination appears
more than once in both tables, the :class:`DataFrame` will have the **Cartesian
product** of the associated data.

.. ipython:: python

   left = pd.DataFrame(
       {
           "key": ["K0", "K1", "K2", "K3"],
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
       }
   )

   right = pd.DataFrame(
       {
           "key": ["K0", "K1", "K2", "K3"],
           "C": ["C0", "C1", "C2", "C3"],
           "D": ["D0", "D1", "D2", "D3"],
       }
   )
   result = pd.merge(left, right, on="key")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

The ``how`` argument to :func:`~pandas.merge` specifies which keys are
included in the resulting table. If a key combination **does not appear** in
either the left or right tables, the values in the joined table will be
``NA``. Here is a summary of the ``how`` options and their SQL equivalent names:

.. csv-table::
    :header: "Merge method", "SQL Join Name", "Description"
    :widths: 20, 20, 60

    ``left``, ``LEFT OUTER JOIN``, Use keys from left frame only
    ``right``, ``RIGHT OUTER JOIN``, Use keys from right frame only
    ``outer``, ``FULL OUTER JOIN``, Use union of keys from both frames
    ``inner``, ``INNER JOIN``, Use intersection of keys from both frames
    ``cross``, ``CROSS JOIN``, Create the cartesian product of rows of both frames

.. ipython:: python

   left = pd.DataFrame(
      {
         "key1": ["K0", "K0", "K1", "K2"],
         "key2": ["K0", "K1", "K0", "K1"],
         "A": ["A0", "A1", "A2", "A3"],
         "B": ["B0", "B1", "B2", "B3"],
      }
   )
   right = pd.DataFrame(
      {
         "key1": ["K0", "K1", "K1", "K2"],
         "key2": ["K0", "K0", "K0", "K0"],
         "C": ["C0", "C1", "C2", "C3"],
         "D": ["D0", "D1", "D2", "D3"],
      }
   )
   result = pd.merge(left, right, how="left", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_left.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, how="right", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_right.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);

.. ipython:: python

   result = pd.merge(left, right, how="outer", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_outer.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, how="inner", on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, how="cross")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_cross.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

You can :class:`Series` and a :class:`DataFrame` with a :class:`MultiIndex` if the names of
the :class:`MultiIndex` correspond to the columns from the :class:`DataFrame`. Transform
the :class:`Series` to a :class:`DataFrame` using :meth:`Series.reset_index` before merging

.. ipython:: python

   df = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
   df

   ser = pd.Series(
       ["a", "b", "c", "d", "e", "f"],
       index=pd.MultiIndex.from_arrays(
           [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]], names=["Let", "Num"]
       ),
   )
   ser

   pd.merge(df, ser.reset_index(), on=["Let", "Num"])


Performing an outer join with duplicate join keys in :class:`DataFrame`

.. ipython:: python

   left = pd.DataFrame({"A": [1, 2], "B": [2, 2]})

   right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})

   result = pd.merge(left, right, on="B", how="outer")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_on_key_dup.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");


.. warning::

  Merging on duplicate keys significantly increase the dimensions of the result
  and can cause a memory overflow.

.. _merging.validation:

Merge key uniqueness
~~~~~~~~~~~~~~~~~~~~

The ``validate`` argument checks whether the uniqueness of merge keys.
Key uniqueness is checked before merge operations and can protect against memory overflows
and unexpected key duplication.

.. ipython:: python
   :okexcept:

   left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
   right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
   result = pd.merge(left, right, on="B", how="outer", validate="one_to_one")

If the user is aware of the duplicates in the right :class:`DataFrame` but wants to
ensure there are no duplicates in the left :class:`DataFrame`, one can use the
``validate='one_to_many'`` argument instead, which will not raise an exception.

.. ipython:: python

   pd.merge(left, right, on="B", how="outer", validate="one_to_many")


.. _merging.indicator:

Merge result indicator
~~~~~~~~~~~~~~~~~~~~~~

:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a
Categorical-type column called ``_merge`` will be added to the output object
that takes on values:

  ===================================   ================
  Observation Origin                    ``_merge`` value
  ===================================   ================
  Merge key only in ``'left'`` frame    ``left_only``
  Merge key only in ``'right'`` frame   ``right_only``
  Merge key in both frames              ``both``
  ===================================   ================

.. ipython:: python

   df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
   df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})
   pd.merge(df1, df2, on="col1", how="outer", indicator=True)

A string argument to ``indicator`` will use the value as the name for the indicator column.

.. ipython:: python

   pd.merge(df1, df2, on="col1", how="outer", indicator="indicator_column")


Overlapping value columns
~~~~~~~~~~~~~~~~~~~~~~~~~

The merge ``suffixes`` argument takes a tuple of list of strings to append to
overlapping column names in the input :class:`DataFrame` to disambiguate the result
columns:

.. ipython:: python

   left = pd.DataFrame({"k": ["K0", "K1", "K2"], "v": [1, 2, 3]})
   right = pd.DataFrame({"k": ["K0", "K0", "K3"], "v": [4, 5, 6]})

   result = pd.merge(left, right, on="k")
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_overlapped.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(left, right, on="k", suffixes=("_l", "_r"))
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_overlapped_suffix.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

:meth:`DataFrame.join`
----------------------

:meth:`DataFrame.join` combines the columns of multiple,
potentially differently-indexed :class:`DataFrame` into a single result
:class:`DataFrame`.

.. ipython:: python

   left = pd.DataFrame(
       {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=["K0", "K1", "K2"]
   )

   right = pd.DataFrame(
       {"C": ["C0", "C2", "C3"], "D": ["D0", "D2", "D3"]}, index=["K0", "K2", "K3"]
   )

   result = left.join(right)
   result

.. ipython:: python
   :suppress:

   @savefig merging_join.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = left.join(right, how="outer")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_outer.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = left.join(right, how="inner")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

:meth:`DataFrame.join` takes an optional ``on`` argument which may be a column
or multiple column names that the passed :class:`DataFrame` is to be
aligned.

.. ipython:: python

   left = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "key": ["K0", "K1", "K0", "K1"],
       }
   )

   right = pd.DataFrame({"C": ["C0", "C1"], "D": ["D0", "D1"]}, index=["K0", "K1"])

   result = left.join(right, on="key")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_key_columns.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. ipython:: python

   result = pd.merge(
       left, right, left_on="key", right_index=True, how="left", sort=False
   )
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_key_columns.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. _merging.multikey_join:

To join on multiple keys, the passed :class:`DataFrame` must have a :class:`MultiIndex`:

.. ipython:: python

   left = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "key1": ["K0", "K0", "K1", "K2"],
           "key2": ["K0", "K1", "K0", "K1"],
       }
   )

   index = pd.MultiIndex.from_tuples(
       [("K0", "K0"), ("K1", "K0"), ("K2", "K0"), ("K2", "K1")]
   )
   right = pd.DataFrame(
       {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]}, index=index
   )
   result = left.join(right, on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_multikeys.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. _merging.df_inner_join:

The default for :class:`DataFrame.join` is to perform a left join
which uses only the keys found in the
calling :class:`DataFrame`. Other join types can be specified with ``how``.

.. ipython:: python

   result = left.join(right, on=["key1", "key2"], how="inner")
   result

.. ipython:: python
   :suppress:

   @savefig merging_join_multikeys_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. _merging.join_on_mi:

Joining a single Index to a MultiIndex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can join a :class:`DataFrame` with a :class:`Index` to a :class:`DataFrame` with a :class:`MultiIndex` on a level.
The ``name`` of the :class:`Index` with match the level name of the :class:`MultiIndex`.

..  ipython:: python

    left = pd.DataFrame(
        {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]},
        index=pd.Index(["K0", "K1", "K2"], name="key"),
    )

    index = pd.MultiIndex.from_tuples(
        [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")],
        names=["key", "Y"],
    )
    right = pd.DataFrame(
        {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]},
        index=index,
    )

    result = left.join(right, how="inner")
    result


.. ipython:: python
   :suppress:

   @savefig merging_join_multiindex_inner.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. _merging.join_with_two_multi_indexes:

Joining with two :class:`MultiIndex`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :class:`MultiIndex` of the input argument must be completely used
in the join and is a subset of the indices in the left argument.

.. ipython:: python

   leftindex = pd.MultiIndex.from_product(
       [list("abc"), list("xy"), [1, 2]], names=["abc", "xy", "num"]
   )
   left = pd.DataFrame({"v1": range(12)}, index=leftindex)
   left

   rightindex = pd.MultiIndex.from_product(
       [list("abc"), list("xy")], names=["abc", "xy"]
   )
   right = pd.DataFrame({"v2": [100 * i for i in range(1, 7)]}, index=rightindex)
   right

   left.join(right, on=["abc", "xy"], how="inner")

.. ipython:: python

   leftindex = pd.MultiIndex.from_tuples(
       [("K0", "X0"), ("K0", "X1"), ("K1", "X2")], names=["key", "X"]
   )
   left = pd.DataFrame(
       {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=leftindex
   )

   rightindex = pd.MultiIndex.from_tuples(
       [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")], names=["key", "Y"]
   )
   right = pd.DataFrame(
       {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]}, index=rightindex
   )

   result = pd.merge(
       left.reset_index(), right.reset_index(), on=["key"], how="inner"
   ).set_index(["key", "X", "Y"])
   result

.. ipython:: python
   :suppress:

   @savefig merging_merge_two_multiindex.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. _merging.merge_on_columns_and_levels:

Merging on a combination of columns and index levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Strings passed as the ``on``, ``left_on``, and ``right_on`` parameters
may refer to either column names or index level names.  This enables merging
:class:`DataFrame` instances on a combination of index levels and columns without
resetting indexes.

.. ipython:: python

   left_index = pd.Index(["K0", "K0", "K1", "K2"], name="key1")

   left = pd.DataFrame(
       {
           "A": ["A0", "A1", "A2", "A3"],
           "B": ["B0", "B1", "B2", "B3"],
           "key2": ["K0", "K1", "K0", "K1"],
       },
       index=left_index,
   )

   right_index = pd.Index(["K0", "K1", "K2", "K2"], name="key1")

   right = pd.DataFrame(
       {
           "C": ["C0", "C1", "C2", "C3"],
           "D": ["D0", "D1", "D2", "D3"],
           "key2": ["K0", "K0", "K0", "K1"],
       },
       index=right_index,
   )

   result = left.merge(right, on=["key1", "key2"])
   result

.. ipython:: python
   :suppress:

   @savefig merge_on_index_and_column.png
   p.plot([left, right], result, labels=["left", "right"], vertical=False);
   plt.close("all");

.. note::

   When :class:`DataFrame` are joined on a string that matches an index level in both
   arguments, the index level is preserved as an index level in the resulting
   :class:`DataFrame`.

.. note::

   When :class:`DataFrame` are joined using only some of the levels of a :class:`MultiIndex`,
   the extra levels will be dropped from the resulting join. To
   preserve those levels, use :meth:`DataFrame.reset_index` on those level
   names to move those levels to columns prior to the join.

.. _merging.multiple_join:

Joining multiple :class:`DataFrame`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A list or tuple of ``:class:`DataFrame``` can also be passed to :meth:`~DataFrame.join`
to join them together on their indexes.

.. ipython:: python

   right2 = pd.DataFrame({"v": [7, 8, 9]}, index=["K1", "K1", "K2"])
   result = left.join([right, right2])

.. ipython:: python
   :suppress:

   @savefig merging_join_multi_df.png
   p.plot(
       [left, right, right2],
       result,
       labels=["left", "right", "right2"],
       vertical=False,
   );
   plt.close("all");

.. _merging.combine_first.update:

:meth:`DataFrame.combine_first`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:meth:`DataFrame.combine_first` update missing values from one :class:`DataFrame`
with the non-missing values in another :class:`DataFrame` in the corresponding
location.

.. ipython:: python

   df1 = pd.DataFrame(
       [[np.nan, 3.0, 5.0], [-4.6, np.nan, np.nan], [np.nan, 7.0, np.nan]]
   )
   df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5.0, 1.6, 4]], index=[1, 2])
   result = df1.combine_first(df2)
   result

.. ipython:: python
   :suppress:

   @savefig merging_combine_first.png
   p.plot([df1, df2], result, labels=["df1", "df2"], vertical=False);
   plt.close("all");

.. _merging.merge_ordered:

:func:`merge_ordered`
---------------------

:func:`merge_ordered` combines order data such as numeric or time series data
with optional filling of missing data with ``fill_method``.

.. ipython:: python

   left = pd.DataFrame(
       {"k": ["K0", "K1", "K1", "K2"], "lv": [1, 2, 3, 4], "s": ["a", "b", "c", "d"]}
   )

   right = pd.DataFrame({"k": ["K1", "K2", "K4"], "rv": [1, 2, 3]})

   pd.merge_ordered(left, right, fill_method="ffill", left_by="s")

.. _merging.merge_asof:

:func:`merge_asof`
---------------------

:func:`merge_asof` is similar to an ordered left-join except that mactches are on the
nearest key rather than equal keys. For each row in the ``left`` :class:`DataFrame`,
the last row in the ``right`` :class:`DataFrame` are selected where the ``on`` key is less
than the left's key. Both :class:`DataFrame` must be sorted by the key.

Optionally an :func:`merge_asof` can perform a group-wise merge by matching the
``by`` key in addition to the nearest match on the ``on`` key.

.. ipython:: python

   trades = pd.DataFrame(
       {
           "time": pd.to_datetime(
               [
                   "20160525 13:30:00.023",
                   "20160525 13:30:00.038",
                   "20160525 13:30:00.048",
                   "20160525 13:30:00.048",
                   "20160525 13:30:00.048",
               ]
           ),
           "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
           "price": [51.95, 51.95, 720.77, 720.92, 98.00],
           "quantity": [75, 155, 100, 100, 100],
       },
       columns=["time", "ticker", "price", "quantity"],
   )

   quotes = pd.DataFrame(
       {
           "time": pd.to_datetime(
               [
                   "20160525 13:30:00.023",
                   "20160525 13:30:00.023",
                   "20160525 13:30:00.030",
                   "20160525 13:30:00.041",
                   "20160525 13:30:00.048",
                   "20160525 13:30:00.049",
                   "20160525 13:30:00.072",
                   "20160525 13:30:00.075",
               ]
           ),
           "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG", "AAPL", "GOOG", "MSFT"],
           "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
           "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
       },
       columns=["time", "ticker", "bid", "ask"],
   )
   trades
   quotes
   pd.merge_asof(trades, quotes, on="time", by="ticker")

:func:`merge_asof` within ``2ms`` between the quote time and the trade time.

.. ipython:: python

   pd.merge_asof(trades, quotes, on="time", by="ticker", tolerance=pd.Timedelta("2ms"))

:func:`merge_asof` within ``10ms`` between the quote time and the trade time and
exclude exact matches on time. Note that though we exclude the exact matches
(of the quotes), prior quotes **do** propagate to that point in time.

.. ipython:: python

   pd.merge_asof(
       trades,
       quotes,
       on="time",
       by="ticker",
       tolerance=pd.Timedelta("10ms"),
       allow_exact_matches=False,
   )

.. _merging.compare:

:meth:`~Series.compare`
-----------------------

The :meth:`Series.compare` and :meth:`DataFrame.compare` methods allow you to
compare two :class:`DataFrame` or :class:`Series`, respectively, and summarize their differences.

.. ipython:: python

   df = pd.DataFrame(
       {
           "col1": ["a", "a", "b", "b", "a"],
           "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
           "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
       },
       columns=["col1", "col2", "col3"],
   )
   df
   df2 = df.copy()
   df2.loc[0, "col1"] = "c"
   df2.loc[2, "col3"] = 4.0
   df2
   df.compare(df2)

By default, if two corresponding values are equal, they will be shown as ``NaN``.
Furthermore, if all values in an entire row / column, the row / column will be
omitted from the result. The remaining differences will be aligned on columns.

Stack the differences on rows.

.. ipython:: python

   df.compare(df2, align_axis=0)

Keep all original rows and columns with ``keep_shape=True``

.. ipython:: python

   df.compare(df2, keep_shape=True)

Keep all the original values even if they are equal.

.. ipython:: python

   df.compare(df2, keep_shape=True, keep_equal=True)