.. currentmodule:: geopandas

.. ipython:: python
   :suppress:

   import geopandas
   import matplotlib
   orig = matplotlib.rcParams['figure.figsize']
   matplotlib.rcParams['figure.figsize'] = [orig[0] * 1.5, orig[1] * 1.5]


Data structures
=========================================

GeoPandas implements two main data structures, a :class:`GeoSeries` and a
:class:`GeoDataFrame`.  These are subclasses of :class:`pandas.Series` and
:class:`pandas.DataFrame`, respectively.

GeoSeries
---------

A :class:`GeoSeries` is essentially a vector where each entry in the vector
is a set of shapes corresponding to one observation. An entry may consist
of only one shape (like a single polygon) or multiple shapes that are
meant to be thought of as one observation (like the many polygons that
make up the State of Hawaii or a country like Indonesia).

GeoPandas has three basic classes of geometric objects (which are actually
`Shapely <https://shapely.readthedocs.io/en/stable/manual.html>`__ objects):

* Points / Multi-Points
* Lines / Multi-Lines
* Polygons / Multi-Polygons

Note that the entries in a :class:`GeoSeries` do not all need to be of the same geometric type, although certain export operations will fail if the types are mixed.
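
As a minimal sketch of the points above, the following builds a :class:`GeoSeries` that mixes geometry types and includes one multi-part entry. The variable names and coordinates are made up for illustration only:

.. code-block:: python

    import geopandas
    from shapely.geometry import LineString, MultiPolygon, Point, Polygon

    # Two separate squares that together form one observation
    # (illustrative coordinates, not real data).
    square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
    far_square = Polygon([(3, 3), (4, 3), (4, 4), (3, 4)])

    # A single GeoSeries may hold different geometry types side by side.
    mixed = geopandas.GeoSeries(
        [
            Point(0, 0),
            LineString([(0, 0), (1, 1)]),
            MultiPolygon([square, far_square]),
        ]
    )

    # geom_type reports the type of each entry as a string.
    geom_types = mixed.geom_type.tolist()

    # The multi-part entry still counts as one row with two member polygons.
    n_parts = len(mixed.iloc[2].geoms)

Here the third entry is a single observation made of two polygons, in the same way a group of islands can be stored as one row.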

Overview of attributes and methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :class:`GeoSeries` class implements nearly all of the attributes and
methods of Shapely objects.  When applied to a :class:`GeoSeries`, they
will apply elementwise to all geometries in the series.  Binary
operations can be applied between two :class:`GeoSeries`, in which case the
operation is carried out elementwise.  The two series will be aligned
by matching indices.  Binary operations can also be applied to a
single geometry, in which case the operation is carried out for each
element of the series with that geometry.  In either case, a
:class:`~pandas.Series` or a :class:`GeoSeries` will be returned, as appropriate.
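
The behaviour described above can be sketched with a small, made-up example. The names ``squares`` and ``points`` are illustrative, and ``align=True`` is passed explicitly here only to make the index matching visible:

.. code-block:: python

    import geopandas
    from shapely.geometry import Point, Polygon

    squares = geopandas.GeoSeries(
        [
            Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            Polygon([(10, 10), (12, 10), (12, 12), (10, 12)]),
        ],
        index=["a", "b"],
    )
    # Same labels as squares, but stored in the opposite order.
    points = geopandas.GeoSeries([Point(11, 11), Point(1, 1)], index=["b", "a"])

    # Unary attribute: evaluated per geometry, returns a plain pandas Series.
    areas = squares.area

    # Binary operation against one geometry: each square is tested against it.
    contains_probe = squares.contains(Point(1, 1))

    # Binary operation between two GeoSeries: rows are matched by index label.
    contains_aligned = squares.contains(points, align=True)

    # For comparison, align=False pairs rows by position instead of by label.
    contains_positional = squares.contains(points, align=False)

    # Operations that produce geometries return a GeoSeries.
    centers = squares.centroid

With label alignment each square is paired with the point that shares its label, so both tests succeed; with positional pairing neither does.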

A short summary of a few attributes and methods for GeoSeries is
presented here, and a full list can be found in the :doc:`GeoSeries API reference <../reference/geoseries>`.
There is also a family of methods for creating new shapes by expanding
existing shapes or applying set-theoretic operations like "union" described
in :doc:`Geometric manipulations <geometric_manipulations>`.

Attributes
^^^^^^^^^^^^^^^
* :attr:`~GeoSeries.area`: shape area (units of projection -- see :doc:`projections <projections>`)
* :attr:`~GeoSeries.bounds`: :class:`~pandas.DataFrame` of min and max coordinates on each axis (``minx``, ``miny``, ``maxx``, ``maxy``) for each shape
* :attr:`~GeoSeries.total_bounds`: array of min and max coordinates on each axis (``minx``, ``miny``, ``maxx``, ``maxy``) for the entire :class:`GeoSeries`
* :attr:`~GeoSeries.geom_type`: type of geometry.
* :attr:`~GeoSeries.is_valid`: tests whether the coordinates make a shape that is a valid geometric shape according to the `Simple Feature Access <http://www.opengeospatial.org/standards/sfa>`_ standard.
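
A short sketch of these attributes on a hand-made :class:`GeoSeries` follows. The shapes are illustrative and carry no CRS, so areas are in plain coordinate units:

.. code-block:: python

    import geopandas
    from shapely.geometry import Point, Polygon

    shapes = geopandas.GeoSeries(
        [
            Polygon([(0, 0), (2, 0), (2, 1), (0, 1)]),  # 2 x 1 rectangle
            Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]),  # self-intersecting "bow-tie"
            Point(5, 5),
        ]
    )

    # Area of each entry; points have zero area.
    areas = shapes.area

    # Per-row bounding boxes, with columns minx, miny, maxx, maxy.
    row_bounds = shapes.bounds

    # One bounding box for the whole series, as (minx, miny, maxx, maxy).
    overall_bounds = shapes.total_bounds

    # Geometry type and validity of each entry.
    kinds = shapes.geom_type
    valid = shapes.is_valid

The bow-tie polygon crosses itself, so ``is_valid`` flags it as invalid while the rectangle and the point pass.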

Basic methods
^^^^^^^^^^^^^^

* :meth:`~GeoSeries.distance`: returns :class:`~pandas.Series` with minimum distance from each entry to ``other``
* :attr:`~GeoSeries.centroid`: returns :class:`GeoSeries` of centroids
* :meth:`~GeoSeries.representative_point`:  returns :class:`GeoSeries` of points that are guaranteed to be within each geometry. It does **NOT** return centroids.
* :meth:`~GeoSeries.to_crs`: change coordinate reference system. See :doc:`projections <projections>`
* :meth:`~GeoSeries.plot`: plot :class:`GeoSeries`. See :doc:`mapping <mapping>`.
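
To illustrate why ``representative_point`` and ``centroid`` differ, here is a minimal sketch using a made-up U-shaped polygon whose centroid falls in the notch, outside the shape itself:

.. code-block:: python

    import geopandas
    from shapely.geometry import Point, Polygon

    # U-shaped polygon: a 3 x 3 square with a notch cut out of the top middle.
    u_shape = Polygon(
        [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
    )
    shapes = geopandas.GeoSeries([u_shape])

    # Centroid: the centre of mass, which here lies inside the notch.
    centroids = shapes.centroid
    centroid_inside = shapes.contains(centroids)

    # Representative point: guaranteed to lie within the geometry.
    rep_points = shapes.representative_point()
    rep_inside = shapes.contains(rep_points)

    # Minimum distance from each entry to a single other geometry.
    dist = shapes.distance(Point(5, 0))

``to_crs`` and ``plot`` are left out of this sketch because they need a coordinate reference system and a plotting backend, respectively.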

Relationship tests
^^^^^^^^^^^^^^^^^^^

* :meth:`~GeoSeries.geom_equals_exact`: is shape the same as ``other`` (up to a specified coordinate tolerance)
* :meth:`~GeoSeries.contains`: does shape contain ``other``
* :meth:`~GeoSeries.intersects`: does shape intersect ``other``
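
The relationship tests can be sketched as follows, using illustrative squares. :meth:`~GeoSeries.within` is included only to contrast it with :meth:`~GeoSeries.contains`, which tests the opposite direction:

.. code-block:: python

    import geopandas
    from shapely.geometry import Point, Polygon

    square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
    shifted = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])
    far_away = Polygon([(10, 10), (11, 10), (11, 11), (10, 11)])
    shapes = geopandas.GeoSeries([square, shifted, far_away])

    # contains: does each shape contain the other geometry?
    has_probe = shapes.contains(Point(0.5, 0.5))

    # within: is each shape inside the other geometry? (the reverse test)
    frame = Polygon([(-1, -1), (4, -1), (4, 4), (-1, 4)])
    inside_frame = shapes.within(frame)

    # intersects: does each shape share any point with the other geometry?
    touches_square = shapes.intersects(square)

    # geom_equals_exact: equality up to a coordinate tolerance.
    nudged = Polygon([(0, 0), (2, 0), (2, 2 + 1e-9), (0, 2)])
    nearly_equal = shapes.geom_equals_exact(nudged, tolerance=1e-6)

Each call returns a boolean :class:`~pandas.Series` with one value per row of ``shapes``.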


GeoDataFrame
------------

A :class:`GeoDataFrame` is a tabular data structure that contains a :class:`GeoSeries`.

The most important property of a :class:`GeoDataFrame` is that it always has one :class:`GeoSeries` column that
holds a special status - the "active geometry column". When a spatial method is applied to a
:class:`GeoDataFrame` (or a spatial attribute like ``area`` is called), these operations will always act on the
active geometry column.

The active geometry column -- no matter the name of the corresponding :class:`GeoSeries` --
can be accessed through the :attr:`~GeoDataFrame.geometry` attribute (``gdf.geometry``),
and the name of the active geometry column can be found by typing ``gdf.geometry.name`` or ``gdf.active_geometry_name``.

A :class:`GeoDataFrame` may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the :meth:`GeoDataFrame.set_geometry` method.

An example using the ``geoda.malaria`` dataset from ``geodatasets`` containing the counties of Colombia:

.. ipython:: python

    import geodatasets

    colombia = geopandas.read_file(geodatasets.get_path('geoda.malaria'))

    colombia.head()
    # Plot counties
    @savefig colombia_borders.png
    colombia.plot(markersize=.5);

Currently, the column named "geometry" with county borders is the active
geometry column:

.. ipython:: python

    colombia.geometry.name

You can also rename this column to "borders":

.. ipython:: python

    colombia = colombia.rename_geometry('borders')
    colombia.geometry.name

Now, you create centroids and make them the active geometry:

.. ipython:: python
   :okwarning:

    colombia['centroid_column'] = colombia.centroid
    colombia = colombia.set_geometry('centroid_column')

    @savefig colombia_centroids.png
    colombia.plot();


**Note:** A :class:`GeoDataFrame` keeps track of the active column by name, so if you rename the active geometry column, you must also reset the geometry::

    gdf = gdf.rename(columns={'old_name': 'new_name'}).set_geometry('new_name')

**Note 2:** Somewhat confusingly, when you use the :func:`~geopandas.read_file` function, the column containing spatial objects from the file is named "geometry" by default and is set as the active geometry column. However, although the same term is used for the name of the column and for the special attribute that keeps track of the active column, the two are distinct. You can easily shift the active geometry column to a different :class:`GeoSeries` with the :meth:`~GeoDataFrame.set_geometry` method. Further, ``gdf.geometry`` will always return the active geometry column, *not* the column named ``geometry``. If you wish to access a column named "geometry" while a different column is the active geometry column, use ``gdf['geometry']``, not ``gdf.geometry``.
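
The distinction between the column named "geometry" and the active geometry column can be sketched without any data file. The frame below is made up for illustration, and the column names ``centers`` and ``midpoints`` are arbitrary:

.. code-block:: python

    import geopandas
    from shapely.geometry import Polygon

    gdf = geopandas.GeoDataFrame(
        {"name": ["a", "b"]},
        geometry=[
            Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
        ],
    )
    # Initially the column named "geometry" is also the active geometry.
    name_before = gdf.geometry.name

    # Add a second geometry column and make it the active one.
    gdf["centers"] = gdf.centroid
    gdf = gdf.set_geometry("centers")
    name_after = gdf.active_geometry_name

    # gdf.geometry now returns the points, not the column named "geometry".
    active_types = gdf.geometry.geom_type.tolist()
    named_types = gdf["geometry"].geom_type.tolist()

    # Spatial attributes act on the active column: points have zero area.
    active_area = gdf.area.tolist()
    polygon_area = gdf["geometry"].area.tolist()

    # rename_geometry renames the active column and keeps it active.
    gdf = gdf.rename_geometry("midpoints")
    name_renamed = gdf.geometry.name

After ``set_geometry``, the polygons remain available under ``gdf['geometry']`` as an ordinary (non-active) geometry column.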

Attributes and methods
~~~~~~~~~~~~~~~~~~~~~~

Any of the attribute calls or methods described for a :class:`GeoSeries` will work on a :class:`GeoDataFrame` -- they are just applied to the active geometry column :class:`GeoSeries`.

However, :class:`GeoDataFrames <GeoDataFrame>` also have a few extra methods for:

* :doc:`Reading and writing files <io>`
* :ref:`Spatial joins <mergingdata.spatial-joins>`
* :doc:`Spatial aggregations <aggregation_with_dissolve>`
* :doc:`Geocoding <geocoding>`
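
As a rough sketch of two of these, the example below uses :meth:`~GeoDataFrame.dissolve` for a spatial aggregation and :meth:`~GeoDataFrame.sjoin` for a spatial join. The frames, column names, and coordinates are invented for illustration; file I/O and geocoding are not shown because they depend on external files or services:

.. code-block:: python

    import geopandas
    from shapely.geometry import Point, Polygon

    regions = geopandas.GeoDataFrame(
        {"region": ["west", "west", "east"]},
        geometry=[
            Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
            Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
            Polygon([(5, 0), (6, 0), (6, 1), (5, 1)]),
        ],
    )

    # Spatial aggregation: merge the geometries that share a "region" value.
    merged = regions.dissolve(by="region")
    west_area = merged.area["west"]

    points = geopandas.GeoDataFrame(
        {"label": ["p1", "p2"]},
        geometry=[Point(0.5, 0.5), Point(5.5, 0.5)],
    )

    # Spatial join: attach to each point the attributes of the region it is within.
    joined = points.sjoin(regions, predicate="within")
    region_of = dict(zip(joined["label"], joined["region"]))

See the linked pages for the full set of options for each operation.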



.. ipython:: python
    :suppress:

    matplotlib.rcParams['figure.figsize'] = orig


Display options
---------------

GeoPandas has an ``options`` attribute with global configuration attributes:

.. ipython:: python

    import geopandas
    geopandas.options

The ``geopandas.options.display_precision`` option can control the number of
decimals to show in the display of coordinates in the geometry column.
In the ``colombia`` example above, the default is to show 5 decimals for
geographic coordinates:

.. ipython:: python

    colombia['centroid_column'].head()

If you want to change this, for example to see more decimals, you can do:

.. ipython:: python

    geopandas.options.display_precision = 9
    colombia['centroid_column'].head()
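
Because ``display_precision`` is a global option, changing it affects every later display. One possible pattern, shown here as a sketch with a made-up point, is to remember the previous value and restore it afterwards:

.. code-block:: python

    import geopandas
    from shapely.geometry import Point

    points = geopandas.GeoSeries([Point(1.123456789, 2.987654321)])

    # Remember the current setting so it can be restored afterwards.
    previous = geopandas.options.display_precision
    try:
        # Show only two decimals while building this representation.
        geopandas.options.display_precision = 2
        short_repr = repr(points)
    finally:
        # Put the global option back, even if something above failed.
        geopandas.options.display_precision = previous

The option only changes how coordinates are displayed; the stored coordinates keep their full precision.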