.. currentmodule:: geopandas

.. ipython:: python
   :suppress:

   import geopandas
   import pandas as pd


Merging data
=========================================

There are two ways to combine datasets in GeoPandas -- attribute joins and spatial joins.

In an attribute join, a :class:`GeoSeries` or :class:`GeoDataFrame` is
combined with a regular :class:`pandas.Series` or :class:`pandas.DataFrame` based on a
common variable. This is analogous to normal merging or joining in *pandas*.

In a spatial join, observations from two :class:`GeoSeries` or :class:`GeoDataFrame` objects
are combined based on their spatial relationship to one another.

In the following examples, these datasets are used:

.. ipython:: python

   import geodatasets

   chicago = geopandas.read_file(geodatasets.get_path("geoda.chicago_commpop"))
   groceries = geopandas.read_file(geodatasets.get_path("geoda.groceries"))

   # For attribute join
   chicago_shapes = chicago[['geometry', 'NID']]
   chicago_names = chicago[['community', 'NID']]

   # For spatial join
   chicago = chicago[['geometry', 'community']].to_crs(groceries.crs)


Appending
---------

Appending :class:`GeoDataFrame` and :class:`GeoSeries` objects uses the pandas :func:`~pandas.concat` function.
Keep in mind that the appended geometry columns need to have the same CRS.

.. ipython:: python

    # Appending GeoSeries
    joined = pd.concat([chicago.geometry, groceries.geometry])

    # Appending GeoDataFrames
    douglas = chicago[chicago.community == 'DOUGLAS']
    oakland = chicago[chicago.community == 'OAKLAND']
    douglas_oakland = pd.concat([douglas, oakland])
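
The CRS requirement mentioned above can be checked before concatenating. The
following is a minimal sketch, assuming two small hypothetical frames
(``gdf_a`` and ``gdf_b``, not part of the datasets used in this guide) that
start out in different coordinate reference systems:

.. code-block:: python

   import geopandas
   import pandas as pd
   from shapely.geometry import Point

   # Hypothetical toy data: one frame in WGS 84, one in UTM zone 16N
   gdf_a = geopandas.GeoDataFrame(
       {"name": ["a"]}, geometry=[Point(-87.6, 41.8)], crs="EPSG:4326"
   )
   gdf_b = geopandas.GeoDataFrame(
       {"name": ["b"]}, geometry=[Point(500000, 4600000)], crs="EPSG:32616"
   )

   # Reproject the second frame if its CRS differs from the first
   if gdf_b.crs != gdf_a.crs:
       gdf_b = gdf_b.to_crs(gdf_a.crs)

   combined = pd.concat([gdf_a, gdf_b], ignore_index=True)

Passing ``ignore_index=True`` is an optional choice here; it avoids duplicate
index labels in the combined frame.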


Attribute joins
----------------

Attribute joins are accomplished using the :meth:`~pandas.DataFrame.merge` method. In general, it is recommended
to use the ``merge()`` method called from the spatial dataset. With that said, the stand-alone
:func:`pandas.merge` function will work if the :class:`GeoDataFrame` is in the ``left`` argument;
if a :class:`~pandas.DataFrame` is in the ``left`` argument and a :class:`GeoDataFrame`
is in the ``right`` position, the result will no longer be a :class:`GeoDataFrame`.

For example, consider the following merge, which adds full names to a :class:`GeoDataFrame`
that initially has only an area ID for each geometry by merging it with a :class:`~pandas.DataFrame`.

.. ipython:: python

   # `chicago_shapes` is a GeoDataFrame with community shapes and area IDs
   chicago_shapes.head()

   # `chicago_names` is a DataFrame with community names and area IDs
   chicago_names.head()

   # Merge with `merge` method on shared variable (area ID):
   chicago_shapes = chicago_shapes.merge(chicago_names, on='NID')
   chicago_shapes.head()
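
The difference between calling ``merge()`` on the spatial dataset and passing
a plain :class:`~pandas.DataFrame` as the ``left`` argument of
:func:`pandas.merge` can be sketched as follows. The frames ``shapes`` and
``names`` are hypothetical toy data, not the Chicago datasets used above:

.. code-block:: python

   import geopandas
   import pandas as pd
   from shapely.geometry import Point

   # Hypothetical toy data sharing an "NID" key
   shapes = geopandas.GeoDataFrame(
       {"NID": [1, 2]}, geometry=[Point(0, 0), Point(1, 1)]
   )
   names = pd.DataFrame({"NID": [1, 2], "community": ["A", "B"]})

   # Method called from the spatial dataset: the result stays a GeoDataFrame
   from_method = shapes.merge(names, on="NID")

   # Plain DataFrame on the left: the result is a regular DataFrame
   from_function = pd.merge(names, shapes, on="NID")

   # The spatial type can be restored explicitly if needed
   restored = geopandas.GeoDataFrame(from_function, geometry="geometry")

Wrapping the result in :class:`GeoDataFrame` again is one way to recover, but
calling ``merge()`` from the spatial dataset avoids the extra step.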

.. _mergingdata.spatial-joins:

Spatial joins
----------------

In a spatial join, two geometry objects are merged based on their spatial relationship to one another.

.. ipython:: python


   # One GeoDataFrame of communities, one of grocery stores.
   # Want to merge to get each grocery's community.
   chicago.head()
   groceries.head()

   # Execute spatial join
   groceries_with_community = groceries.sjoin(chicago, how="inner", predicate="intersects")
   groceries_with_community.head()


GeoPandas provides two spatial-join methods:

- :meth:`GeoDataFrame.sjoin`: joins based on binary predicates (intersects, contains, etc.)
- :meth:`GeoDataFrame.sjoin_nearest`: joins based on proximity, with the ability to set a maximum search radius.

.. note::
   For historical reasons, both methods are also available as top-level functions :func:`sjoin` and :func:`sjoin_nearest`.
   Using the methods is recommended, as the functions may be deprecated in the future.

Binary predicate joins
~~~~~~~~~~~~~~~~~~~~~~

Binary predicate joins are available via :meth:`GeoDataFrame.sjoin`.

:meth:`GeoDataFrame.sjoin` has two core arguments: ``how`` and ``predicate``.

**predicate**

The ``predicate`` argument specifies how GeoPandas decides whether or not to join the attributes of one
object to another, based on their geometric relationship.

The values for ``predicate`` correspond to the names of geometric binary predicates and depend on the spatial
index implementation.

The default spatial index in GeoPandas currently supports the following values for ``predicate`` which are
defined in the
`Shapely documentation <http://shapely.readthedocs.io/en/latest/manual.html#binary-predicates>`__:

* ``intersects``
* ``contains``
* ``within``
* ``touches``
* ``crosses``
* ``overlaps``
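
The choice of predicate changes which rows are matched. As a minimal sketch,
assuming hypothetical toy geometries (a unit square and three points, one of
which lies exactly on the square's boundary), ``intersects`` and ``within``
give different results:

.. code-block:: python

   import geopandas
   from shapely.geometry import Point, box

   # Hypothetical toy data: one unit square and three points
   squares = geopandas.GeoDataFrame({"square": ["A"]}, geometry=[box(0, 0, 1, 1)])
   points = geopandas.GeoDataFrame(
       {"pt": ["inside", "edge", "outside"]},
       geometry=[Point(0.5, 0.5), Point(1, 0.5), Point(2, 2)],
   )

   # `intersects` matches points in the interior or on the boundary
   hits_intersects = points.sjoin(squares, predicate="intersects")

   # `within` requires the point to lie in the interior of the square
   hits_within = points.sjoin(squares, predicate="within")

Here the boundary point is matched by ``intersects`` but not by ``within``,
so the stricter predicate returns fewer rows.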

**how**

The ``how`` argument specifies the type of join that will occur and which geometry is retained in the resultant
:class:`GeoDataFrame`. It accepts the following options:

* ``left``: use the index from the first (or ``left_df``) :class:`GeoDataFrame` that you provide
  to :meth:`GeoDataFrame.sjoin`; retain only the ``left_df`` geometry column
* ``right``: use the index from the second (or ``right_df``) :class:`GeoDataFrame`; retain only the ``right_df`` geometry column
* ``inner``: use the intersection of index values from both :class:`GeoDataFrame` objects; retain only the ``left_df`` geometry column
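
The three options can be compared on a small example. This is a sketch under
the assumption of hypothetical toy data (two squares and three points), where
one point and one square have no match:

.. code-block:: python

   import geopandas
   from shapely.geometry import Point, box

   # Hypothetical toy data: square "B" contains no points,
   # and the point "outside" falls in no square
   squares = geopandas.GeoDataFrame(
       {"square": ["A", "B"]}, geometry=[box(0, 0, 1, 1), box(10, 10, 11, 11)]
   )
   points = geopandas.GeoDataFrame(
       {"pt": ["inside", "edge", "outside"]},
       geometry=[Point(0.5, 0.5), Point(1, 0.5), Point(5, 5)],
   )

   # inner: only matched pairs, with the point geometries
   inner = points.sjoin(squares, how="inner")

   # left: every point is kept; unmatched points get missing attributes
   left_join = points.sjoin(squares, how="left")

   # right: every square is kept, with the square geometries
   right_join = points.sjoin(squares, how="right")

Checking ``left_join["square"].isna()`` or ``right_join["pt"].isna()`` is one
way to find the rows that had no spatial match.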

Note that more complex spatial relationships can be studied by combining geometric operations with a spatial join.
To find all polygons within a given distance of a point, for example, one can first use the :meth:`~geopandas.GeoSeries.buffer` method to expand each
point into a circle of the appropriate radius, then intersect those buffered circles with the polygons in question.
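
The buffer-then-join approach described above can be sketched as follows.
The data, the 10-unit radius and the projected CRS (``EPSG:32616``) are
assumptions chosen for illustration only:

.. code-block:: python

   import geopandas
   from shapely.geometry import Point, box

   # Hypothetical toy data in a projected CRS, so distances are in metres
   points = geopandas.GeoDataFrame(
       {"pt": ["p"]}, geometry=[Point(0, 0)], crs="EPSG:32616"
   )
   polys = geopandas.GeoDataFrame(
       {"poly": ["near", "far"]},
       geometry=[box(5, -1, 7, 1), box(50, -1, 52, 1)],
       crs="EPSG:32616",
   )

   # Expand each point into a circle with a radius of 10 units
   search_areas = points.set_geometry(points.buffer(10))

   # Keep the polygons that intersect any of the buffered circles
   nearby = polys.sjoin(search_areas, predicate="intersects")

Only the polygon labelled ``near`` lies within 10 units of the point, so it
is the only row in ``nearby``.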

Nearest joins
~~~~~~~~~~~~~

Proximity-based joins can be done via :meth:`GeoDataFrame.sjoin_nearest`.

:meth:`GeoDataFrame.sjoin_nearest` shares the ``how`` argument with :meth:`GeoDataFrame.sjoin`, and
includes two additional arguments: ``max_distance`` and ``distance_col``.

**max_distance**

The ``max_distance`` argument specifies a maximum search radius for matching geometries. Setting it can considerably
improve performance in some cases, so using this parameter is highly recommended whenever possible.

**distance_col**

If set, the resultant :class:`GeoDataFrame` will include a column with this name containing the computed distance between each input geometry and its nearest geometry.
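
Both arguments can be combined in one call. The following is a minimal
sketch, assuming hypothetical toy data (``stores`` and ``stops``), a projected
CRS and an arbitrary search radius of 10 units:

.. code-block:: python

   import geopandas
   from shapely.geometry import Point

   # Hypothetical toy data in a projected CRS
   stores = geopandas.GeoDataFrame(
       {"store": ["s1", "s2"]},
       geometry=[Point(0, 0), Point(100, 0)],
       crs="EPSG:32616",
   )
   stops = geopandas.GeoDataFrame(
       {"stop": ["t1"]}, geometry=[Point(3, 4)], crs="EPSG:32616"
   )

   # Match each store to its nearest stop within 10 units and
   # record the distance in a column named "dist"
   nearest = stores.sjoin_nearest(
       stops, how="left", max_distance=10, distance_col="dist"
   )

With ``how="left"``, the store that has no stop within the search radius is
kept with missing values; with the default ``how="inner"`` it would be dropped.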