File: set_operations.rst

package info (click to toggle)
python-geopandas 1.1.1-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 14,848 kB
  • sloc: python: 26,022; makefile: 147; sh: 25
file content (218 lines) | stat: -rw-r--r-- 7,973 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
.. currentmodule:: geopandas

.. ipython:: python
   :suppress:

   import geopandas
   import matplotlib.pyplot as plt
   plt.close('all')


Set operations with overlay
============================

When working with multiple spatial datasets -- especially multiple *polygon* or
*line* datasets -- users often wish to create new shapes based on places where
those datasets overlap (or don't overlap). These manipulations are often
referred using the language of sets -- intersections, unions, and differences.
These types of operations are made available in the GeoPandas library through
the :meth:`~geopandas.GeoDataFrame.overlay` method.

The basic idea is demonstrated by the graphic below but keep in mind that
overlays operate at the DataFrame level, not on individual geometries, and the
properties from both are retained. In effect, for every shape in the left
:class:`~geopandas.GeoDataFrame`, this operation is executed against every other shape in the right
:class:`~geopandas.GeoDataFrame`:

.. image:: ../../_static/overlay_operations.png

**Source: QGIS documentation**

.. note::
   Note to users familiar with the *shapely* library: :meth:`~geopandas.GeoDataFrame.overlay` can be thought
   of as offering versions of the standard *shapely* set operations that deal with
   the complexities of applying set operations to two *GeoSeries*. The standard
   *shapely* set operations are also available as :class:`~geopandas.GeoSeries` methods.


The different overlay operations
--------------------------------

First, create some example data:

.. ipython:: python

    from shapely.geometry import Polygon
    polys1 = geopandas.GeoSeries([Polygon([(0,0), (2,0), (2,2), (0,2)]),
                                  Polygon([(2,2), (4,2), (4,4), (2,4)])])
    polys2 = geopandas.GeoSeries([Polygon([(1,1), (3,1), (3,3), (1,3)]),
                                  Polygon([(3,3), (5,3), (5,5), (3,5)])])

    df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1':[1,2]})
    df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2':[1,2]})

These two GeoDataFrames have some overlapping areas:

.. ipython:: python

    ax = df1.plot(color='red');
    @savefig overlay_example.png width=5in
    df2.plot(ax=ax, color='green', alpha=0.5);

The above example illustrates the different overlay modes.
The :meth:`~geopandas.GeoDataFrame.overlay` method will determine the set of all individual geometries
from overlaying the two input GeoDataFrames. This result covers the area covered
by the two input GeoDataFrames, and also preserves all unique regions defined by
the combined boundaries of the two GeoDataFrames.

.. note::
   For historical reasons, the overlay method is also available as a top-level function :func:`overlay`.
   It is recommended to use the method as the function may be deprecated in the future.

When using ``how='union'``, all those possible geometries are returned:

.. ipython:: python

    res_union = df1.overlay(df2, how='union')
    res_union

    ax = res_union.plot(alpha=0.5, cmap='tab10')
    df1.plot(ax=ax, facecolor='none', edgecolor='k');
    @savefig overlay_example_union.png width=5in
    df2.plot(ax=ax, facecolor='none', edgecolor='k');

The other ``how`` operations will return different subsets of those geometries.
With ``how='intersection'``, it returns only those geometries that are contained
by both GeoDataFrames:

.. ipython:: python

    res_intersection = df1.overlay(df2, how='intersection')
    res_intersection

    ax = res_intersection.plot(cmap='tab10')
    df1.plot(ax=ax, facecolor='none', edgecolor='k');
    @savefig overlay_example_intersection.png width=5in
    df2.plot(ax=ax, facecolor='none', edgecolor='k');

``how='symmetric_difference'`` is the opposite of ``'intersection'`` and returns
the geometries that are only part of one of the GeoDataFrames but not of both:

.. ipython:: python

    res_symdiff = df1.overlay(df2, how='symmetric_difference')
    res_symdiff

    ax = res_symdiff.plot(cmap='tab10')
    df1.plot(ax=ax, facecolor='none', edgecolor='k');
    @savefig overlay_example_symdiff.png width=5in
    df2.plot(ax=ax, facecolor='none', edgecolor='k');

To obtain the geometries that are part of ``df1`` but are not contained in
``df2``, you can use ``how='difference'``:

.. ipython:: python

    res_difference = df1.overlay(df2, how='difference')
    res_difference

    ax = res_difference.plot(cmap='tab10')
    df1.plot(ax=ax, facecolor='none', edgecolor='k');
    @savefig overlay_example_difference.png width=5in
    df2.plot(ax=ax, facecolor='none', edgecolor='k');

Finally, with ``how='identity'``, the result consists of the surface of ``df1``,
but with the geometries obtained from overlaying ``df1`` with ``df2``:

.. ipython:: python

    res_identity = df1.overlay(df2, how='identity')
    res_identity

    ax = res_identity.plot(cmap='tab10')
    df1.plot(ax=ax, facecolor='none', edgecolor='k');
    @savefig overlay_example_identity.png width=5in
    df2.plot(ax=ax, facecolor='none', edgecolor='k');


Overlay groceries example
-------------------------

First, load the Chicago community areas and groceries example datasets and select :

.. ipython:: python

    import geodatasets

    chicago = geopandas.read_file(geodatasets.get_path("geoda.chicago_commpop"))
    groceries = geopandas.read_file(geodatasets.get_path("geoda.groceries"))

    # Project to crs that uses meters as distance measure
    chicago = chicago.to_crs("ESRI:102003")
    groceries = groceries.to_crs("ESRI:102003")

To illustrate the :meth:`~geopandas.GeoDataFrame.overlay` method, consider the following case in which one
wishes to identify the "served" portion of each area -- defined as areas within
1km of a grocery store -- using a ``GeoDataFrame`` of community areas and a
``GeoDataFrame`` of groceries.

.. ipython:: python

    # Look at Chicago:
    @savefig chicago_basic.png width=5in
    chicago.plot();

    # Now buffer groceries to find area within 1km.
    # Check CRS -- USA Contiguous Albers Equal Area, units of meters.
    groceries.crs

    # make 1km buffer
    groceries['geometry']= groceries.buffer(1000)
    @savefig groceries_buffers.png width=5in
    groceries.plot();


To select only the portion of community areas within 1km of a grocery, specify the ``how`` option to be "intersect", which creates a new set of polygons where these two layers overlap:

.. ipython:: python

   chicago_cores = chicago.overlay(groceries, how='intersection')
   @savefig chicago_cores.png width=5in
   chicago_cores.plot(alpha=0.5, edgecolor='k', cmap='tab10');

Changing the ``how`` option allows for different types of overlay operations. For example, if you were interested in the portions of Chicago *far* from groceries (the peripheries), you would compute the difference of the two.

.. ipython:: python

   chicago_peripheries = chicago.overlay(groceries, how='difference')
   @savefig chicago_peripheries.png width=5in
   chicago_peripheries.plot(alpha=0.5, edgecolor='k', cmap='tab10');


.. ipython:: python
    :suppress:

    import matplotlib.pyplot as plt
    plt.close('all')


keep_geom_type keyword
----------------------

In default settings, :meth:`~geopandas.GeoDataFrame.overlay` returns only geometries of the same geometry type as GeoDataFrame
(left one) has, where Polygon and MultiPolygon is considered as a same type (other types likewise).
You can control this behavior using ``keep_geom_type`` option, which is set to
True by default. Once set to False, ``overlay`` will return all geometry types resulting from
selected set-operation. Different types can result for example from intersection of touching geometries,
where two polygons intersects in a line or a point.


More examples
-------------

A larger set of examples of the use of :meth:`~geopandas.GeoDataFrame.overlay` can be found `here <https://nbviewer.jupyter.org/github/geopandas/geopandas/blob/main/doc/source/gallery/overlays.ipynb>`_



.. toctree::
   :maxdepth: 2