.. currentmodule:: geopandas
.. ipython:: python
   :suppress:

   import geopandas
   import pandas as pd
Merging data
=========================================
There are two ways to combine datasets in GeoPandas -- attribute joins and spatial joins.
In an attribute join, a :class:`GeoSeries` or :class:`GeoDataFrame` is
combined with a regular :class:`pandas.Series` or :class:`pandas.DataFrame` based on a
common variable. This is analogous to normal merging or joining in *pandas*.
In a spatial join, observations from two :class:`GeoSeries` or :class:`GeoDataFrame` objects
are combined based on their spatial relationship to one another.
In the following examples, these datasets are used:
.. ipython:: python

   import geodatasets

   chicago = geopandas.read_file(geodatasets.get_path("geoda.chicago_commpop"))
   groceries = geopandas.read_file(geodatasets.get_path("geoda.groceries"))

   # For attribute join
   chicago_shapes = chicago[['geometry', 'NID']]
   chicago_names = chicago[['community', 'NID']]

   # For spatial join
   chicago = chicago[['geometry', 'community']].to_crs(groceries.crs)
Appending
---------
Appending :class:`GeoDataFrame` and :class:`GeoSeries` objects uses the pandas :func:`~pandas.concat` function.
Keep in mind that the appended geometry columns need to have the same CRS.
.. ipython:: python

   # Appending GeoSeries
   joined = pd.concat([chicago.geometry, groceries.geometry])

   # Appending GeoDataFrames
   douglas = chicago[chicago.community == 'DOUGLAS']
   oakland = chicago[chicago.community == 'OAKLAND']
   douglas_oakland = pd.concat([douglas, oakland])
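When two layers do not share a CRS, one of them can be reprojected before appending. The following is a minimal sketch of that step; the two single-point layers and their names (``lonlat``, ``mercator``) are hypothetical and only serve as an illustration.

```python
import geopandas
import pandas as pd
from shapely.geometry import Point

# Two tiny, made-up layers stored in different coordinate reference systems.
lonlat = geopandas.GeoDataFrame(
    {"name": ["a"]}, geometry=[Point(-87.6, 41.8)], crs="EPSG:4326"
)
mercator = geopandas.GeoDataFrame(
    {"name": ["b"]}, geometry=[Point(-9750000.0, 5130000.0)], crs="EPSG:3857"
)

# Reproject one layer so that both share the same CRS, then append.
mercator_as_lonlat = mercator.to_crs(lonlat.crs)
combined = pd.concat([lonlat, mercator_as_lonlat], ignore_index=True)

print(type(combined).__name__)
print(combined.crs)
```

Reprojecting to the CRS of the first layer is an arbitrary choice here; any common CRS works as long as all appended geometry columns use it.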
Attribute joins
----------------
Attribute joins are accomplished using the :meth:`~pandas.DataFrame.merge` method. In general, it is recommended
to use the ``merge()`` method called from the spatial dataset. With that said, the stand-alone
:func:`pandas.merge` function will work if the :class:`GeoDataFrame` is in the ``left`` argument;
if a :class:`~pandas.DataFrame` is in the ``left`` argument and a :class:`GeoDataFrame`
is in the ``right`` position, the result will no longer be a :class:`GeoDataFrame`.
For example, consider the following merge, which adds full names to a :class:`GeoDataFrame`
that initially has only an area ID for each geometry by merging it with a :class:`~pandas.DataFrame`.
.. ipython:: python

   # `chicago_shapes` is a GeoDataFrame with community shapes and area IDs
   chicago_shapes.head()

   # `chicago_names` is a DataFrame with community names and area IDs
   chicago_names.head()

   # Merge with the `merge` method on the shared variable (area ID):
   chicago_shapes = chicago_shapes.merge(chicago_names, on='NID')
   chicago_shapes.head()
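The effect of argument order on the result type can be checked on a small example. This is a sketch using made-up data (the ``shapes`` and ``names`` frames are hypothetical); wrapping the plain result in the :class:`GeoDataFrame` constructor is one possible way to recover a spatial object, not the only one.

```python
import geopandas
import pandas as pd
from shapely.geometry import Point

# Made-up spatial and non-spatial tables sharing the "NID" key.
shapes = geopandas.GeoDataFrame(
    {"NID": [1, 2]}, geometry=[Point(0, 0), Point(1, 1)], crs="EPSG:4326"
)
names = pd.DataFrame({"NID": [1, 2], "community": ["DOUGLAS", "OAKLAND"]})

# Method called from the spatial dataset: the result stays a GeoDataFrame.
from_method = shapes.merge(names, on="NID")

# DataFrame in the left position: as described above, the result
# is no longer a GeoDataFrame.
plain = pd.merge(names, shapes, on="NID")

# One way to get a GeoDataFrame back is to wrap the result explicitly.
restored = geopandas.GeoDataFrame(plain, geometry="geometry", crs=shapes.crs)

print(type(from_method).__name__)
print(type(plain).__name__)
print(type(restored).__name__)
```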
.. _mergingdata.spatial-joins:
Spatial joins
----------------
In a spatial join, two geometry objects are merged based on their spatial relationship to one another.
.. ipython:: python

   # One GeoDataFrame of communities, one of grocery stores.
   # Want to merge to get each grocery's community.
   chicago.head()
   groceries.head()

   # Execute spatial join
   groceries_with_community = groceries.sjoin(chicago, how="inner", predicate='intersects')
   groceries_with_community.head()
GeoPandas provides two spatial-join functions:
- :meth:`GeoDataFrame.sjoin`: joins based on binary predicates (intersects, contains, etc.)
- :meth:`GeoDataFrame.sjoin_nearest`: joins based on proximity, with the ability to set a maximum search radius.
.. note::

   For historical reasons, both methods are also available as top-level functions :func:`sjoin` and :func:`sjoin_nearest`.
   It is recommended to use the methods, as the functions may be deprecated in the future.
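As a small illustration, the method and the top-level function can be called on the same inputs and compared. The ``points`` and ``areas`` layers below are made-up data, not part of the datasets used elsewhere on this page.

```python
import geopandas
from shapely.geometry import Point, box

# Made-up layers: two points and one square polygon, in a projected CRS.
points = geopandas.GeoDataFrame(
    {"store": ["s1", "s2"]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
    crs="EPSG:3857",
)
areas = geopandas.GeoDataFrame(
    {"area": ["A"]}, geometry=[box(0, 0, 1, 1)], crs="EPSG:3857"
)

# Recommended: the method called on the left GeoDataFrame.
via_method = points.sjoin(areas)

# Historical alternative: the top-level function with the same inputs.
via_function = geopandas.sjoin(points, areas)

print(via_method[["store", "area"]])
print(via_function[["store", "area"]])
```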
Binary predicate joins
~~~~~~~~~~~~~~~~~~~~~~
Binary predicate joins are available via :meth:`GeoDataFrame.sjoin`.
:meth:`GeoDataFrame.sjoin` has two core arguments: ``how`` and ``predicate``.
**predicate**
The ``predicate`` argument specifies how GeoPandas decides whether or not to join the attributes of one
object to another, based on their geometric relationship.
The values for ``predicate`` correspond to the names of geometric binary predicates and depend on the spatial
index implementation.
The default spatial index in GeoPandas currently supports the following values for ``predicate`` which are
defined in the
`Shapely documentation <http://shapely.readthedocs.io/en/latest/manual.html#binary-predicates>`__:
* ``intersects``
* ``contains``
* ``within``
* ``touches``
* ``crosses``
* ``overlaps``
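The predicate is evaluated from the point of view of the left geometry, so swapping the inputs or the predicate changes which rows match. The sketch below uses made-up data (``points`` and ``areas`` are hypothetical) to contrast ``within`` and ``contains``.

```python
import geopandas
from shapely.geometry import Point, box

# Made-up layers: one point inside the square, one far outside it.
points = geopandas.GeoDataFrame(
    {"store": ["s1", "s2"]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
    crs="EPSG:3857",
)
areas = geopandas.GeoDataFrame(
    {"area": ["A"]}, geometry=[box(0, 0, 1, 1)], crs="EPSG:3857"
)

# Left geometry (point) is within right geometry (polygon): s1 matches.
points_within = points.sjoin(areas, predicate="within")

# A point cannot contain a polygon, so this join has no matches.
points_contains = points.sjoin(areas, predicate="contains")

# With the polygons on the left, "contains" finds the same pair as above.
areas_contains = areas.sjoin(points, predicate="contains")

print(len(points_within), len(points_contains), len(areas_contains))
```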
**how**
The ``how`` argument specifies the type of join that will occur and which geometry is retained in the resultant
:class:`GeoDataFrame`. It accepts the following options:
* ``left``: use the index from the first (or ``left_df``) :class:`GeoDataFrame` that you provide
to :meth:`GeoDataFrame.sjoin`; retain only the ``left_df`` geometry column
* ``right``: use the index from the second (or ``right_df``) :class:`GeoDataFrame`; retain only the ``right_df`` geometry column
* ``inner``: use the intersection of index values from both :class:`GeoDataFrame` objects; retain only the ``left_df`` geometry column
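The three options can be compared on a small example. This is a sketch with made-up data: three points, of which one matches no polygon, and three polygons, of which one contains no point.

```python
import geopandas
from shapely.geometry import Point, box

# Made-up layers in a projected CRS.
points = geopandas.GeoDataFrame(
    {"store": ["s1", "s2", "s3"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
    crs="EPSG:3857",
)
areas = geopandas.GeoDataFrame(
    {"area": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(10, 10, 11, 11)],
    crs="EPSG:3857",
)

# left: every point is kept; s3 has missing values in the area columns.
left_join = points.sjoin(areas, how="left")

# inner: only matched pairs are kept; geometry comes from the points.
inner_join = points.sjoin(areas, how="inner")

# right: every polygon is kept; geometry comes from the polygons.
right_join = points.sjoin(areas, how="right")

print(len(left_join), len(inner_join), len(right_join))
print(set(inner_join.geom_type), set(right_join.geom_type))
```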
Note that more complicated spatial relationships can be studied by combining geometric operations with a spatial join.
To find all polygons within a given distance of a point, for example, one can first use the :meth:`~geopandas.GeoSeries.buffer` method to expand each
point into a circle of appropriate radius, then intersect those buffered circles with the polygons in question.
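The buffer-then-join approach described above can be sketched as follows. The data, the column names and the 100-unit radius are made up for illustration; distances are expressed in the units of the CRS, so a projected CRS is assumed.

```python
import geopandas
from shapely.geometry import Point, box

# Made-up layers in a projected CRS, so that distances are in metres.
sites = geopandas.GeoDataFrame(
    {"site": ["origin"]}, geometry=[Point(0, 0)], crs="EPSG:3857"
)
parcels = geopandas.GeoDataFrame(
    {"parcel": ["near", "far"]},
    geometry=[box(50, 0, 60, 10), box(500, 0, 510, 10)],
    crs="EPSG:3857",
)

# Expand each point into a circle of radius 100 (an arbitrary choice).
buffered = sites.copy()
buffered["geometry"] = sites.buffer(100)

# Polygons that intersect a buffer lie within 100 units of that point.
nearby = parcels.sjoin(buffered, how="inner", predicate="intersects")

print(nearby[["parcel", "site"]])
```

Because a buffer is a polygonal approximation of a circle, geometries lying almost exactly at the chosen distance may be classified either way.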
Nearest joins
~~~~~~~~~~~~~
Proximity-based joins can be done via :meth:`GeoDataFrame.sjoin_nearest`.
:meth:`GeoDataFrame.sjoin_nearest` shares the ``how`` argument with :meth:`GeoDataFrame.sjoin`, and
includes two additional arguments: ``max_distance`` and ``distance_col``.
**max_distance**
The ``max_distance`` argument specifies a maximum search radius for matching geometries. Limiting the search in this way can improve performance considerably in some cases.
If you can, it is highly recommended that you use this parameter.
**distance_col**
If set, the resultant GeoDataFrame will include a column with this name containing the computed distances between an input geometry and the nearest geometry.
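Both arguments can be seen together in a small example. This is a sketch with made-up data (``customers`` and ``shops`` are hypothetical), using a projected CRS so that distances are in CRS units; the column name ``dist`` and the radius of 50 are arbitrary choices.

```python
import geopandas
from shapely.geometry import Point

# Made-up layers in a projected CRS.
customers = geopandas.GeoDataFrame(
    {"customer": ["c1", "c2"]},
    geometry=[Point(0, 0), Point(1000, 0)],
    crs="EPSG:3857",
)
shops = geopandas.GeoDataFrame(
    {"shop": ["s1", "s2"]},
    geometry=[Point(3, 4), Point(30, 40)],
    crs="EPSG:3857",
)

# Default inner join: c2 has no shop within 50 units and is dropped.
nearest = customers.sjoin_nearest(shops, max_distance=50, distance_col="dist")

# Left join: c2 is kept, with missing values for the shop and the distance.
nearest_left = customers.sjoin_nearest(
    shops, how="left", max_distance=50, distance_col="dist"
)

print(nearest[["customer", "shop", "dist"]])
print(nearest_left[["customer", "shop", "dist"]])
```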