.. _how_to:

How to...
=========

Drop duplicate geometry in all situations
-----------------------------------------

Using the standard Pandas :meth:`~pandas.DataFrame.drop_duplicates` method on a geometry column can leave some duplicate
geometries in place. When applied to a geometry column, the Pandas method compares the WKB representation of each
geometry object. WKB is sensitive to the order of a geometry's components. For example, a line
with co-ordinates in the order left-to-right should be equal to a line with the same co-ordinates in the order right-to-left,
but their WKB representations differ. The same applies to the order of rings in polygons and of parts in multipart
geometries.
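
The mismatch can be checked directly on two Shapely geometries. This is a minimal sketch, assuming Shapely 2.0 or later
(for the top-level ``shapely.normalize`` function); the variable names are chosen for illustration only.

```python
import shapely

# The same line, digitised in opposite directions.
forward = shapely.LineString([(0, 0), (1, 0), (2, 0)])
backward = shapely.LineString([(2, 0), (1, 0), (0, 0)])

# The raw WKB bytes differ because the co-ordinate order differs...
wkb_differs = forward.wkb != backward.wkb

# ...although the two lines are topologically equal.
topologically_equal = forward.equals(backward)

# After normalisation both lines share one canonical co-ordinate order,
# so their WKB bytes match.
same_after_normalize = (
    shapely.normalize(forward).wkb == shapely.normalize(backward).wkb
)

print(wkb_differs, topologically_equal, same_after_normalize)
```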

To deal with this problem, first use the :meth:`~geopandas.GeoSeries.normalize` method to put the co-ordinates into a canonical order,
and then use the standard :meth:`~pandas.DataFrame.drop_duplicates` method::

    gdf["geometry"] = gdf.normalize()
    gdf = gdf.drop_duplicates()

The effect of the :meth:`~geopandas.GeoSeries.normalize` method can be seen in the following example::

    >>> import geopandas
    >>> import shapely
    >>> geopandas.GeoSeries([
    ...     shapely.LineString([(0, 0), (1, 0), (2, 0)]),
    ...     shapely.LineString([(2, 0), (1, 0), (0, 0)]),
    ... ]).normalize().to_wkt()
    0    LINESTRING (0 0, 1 0, 2 0)
    1    LINESTRING (0 0, 1 0, 2 0)
    dtype: object
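
The whole workflow can be put together as follows. This is a sketch rather than a definitive recipe: it assumes a
GeoPandas version that provides ``normalize`` on a ``GeoDataFrame``, and the ``name`` column and sample lines are
made up for illustration.

```python
import geopandas
import shapely

# Two rows holding the same line, stored in opposite directions.
gdf = geopandas.GeoDataFrame(
    {"name": ["a", "a"]},  # hypothetical attribute column
    geometry=[
        shapely.LineString([(0, 0), (1, 0), (2, 0)]),
        shapely.LineString([(2, 0), (1, 0), (0, 0)]),
    ],
)

# Without normalisation both rows are kept, because their WKB differs.
rows_without_normalize = len(gdf.drop_duplicates())

# Normalise a copy so the original geometries stay untouched,
# then drop duplicates on the normalised frame.
normalized = gdf.copy()
normalized["geometry"] = normalized.normalize()
deduplicated = normalized.drop_duplicates()
rows_with_normalize = len(deduplicated)

print(rows_without_normalize, rows_with_normalize)
```

Working on a copy is a design choice: normalising rewrites the co-ordinate order, so keep the original frame if the
stored direction of lines or rings matters elsewhere.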