<?xml version="1.0" encoding="UTF-8"?>
  <sect1 id="Clustering_Functions">
    <sect1info>
    <abstract>
    <para>These functions implement clustering algorithms for sets of geometries.</para>
    </abstract>
    </sect1info>
	<title>Clustering Functions</title>

    <refentry id="ST_ClusterDBSCAN">
	  <refnamediv>
		<refname>ST_ClusterDBSCAN</refname>

        <refpurpose>Window function that returns a cluster id for each input geometry using the DBSCAN algorithm.</refpurpose>
    </refnamediv>

	  <refsynopsisdiv>
		<funcsynopsis>
		  <funcprototype>
			<funcdef>integer <function>ST_ClusterDBSCAN</function></funcdef>

			<paramdef><type>geometry winset </type>
			<parameter>geom</parameter></paramdef>

			<paramdef><type>float8 </type>
			<parameter>eps</parameter></paramdef>

			<paramdef><type>integer </type>
			<parameter>minpoints</parameter></paramdef>
		  </funcprototype>
		</funcsynopsis>
	  </refsynopsisdiv>

	  <refsection>
      <title>Description</title>

	  <para>
		  Returns a cluster number for each input geometry, based on a 2D implementation of the
          <ulink url="https://en.wikipedia.org/wiki/DBSCAN">Density-based spatial clustering of applications with noise (DBSCAN)</ulink>
		  algorithm.  Unlike <xref linkend="ST_ClusterKMeans" />, it does not require the number of clusters to be specified, but instead
		  uses the desired <link linkend="ST_Distance">distance</link> (<varname>eps</varname>) and density (<varname>minpoints</varname>) parameters to construct each cluster.
	  </para>

	  <para>
		  An input geometry will be added to a cluster if it is either:
		  <itemizedlist>
              <listitem>
                  <para>
                      A "core" geometry, that is within <varname>eps</varname> <link linkend="ST_Distance">distance</link> of at least <varname>minpoints</varname> input geometries (including itself) or
                  </para>
			  </listitem>
			  <listitem>
                  <para>
                      A "border" geometry, that is within <varname>eps</varname> <link linkend="ST_Distance">distance</link> of a core geometry.
                  </para>
			  </listitem>
		  </itemizedlist>
		</para>

		<para>
		  Note that border geometries may be within <varname>eps</varname> distance of core geometries in more than one cluster; in this
		  case, either assignment would be correct, and the border geometry will be arbitrarily assigned to one of the available clusters.
		  In these cases, it is possible for a correct cluster to be generated with fewer than <varname>minpoints</varname> geometries.
		  When assignment of a border geometry is ambiguous, repeated calls to ST_ClusterDBSCAN will produce identical results if an ORDER BY
		  clause is included in the window definition, but cluster assignments may differ from other implementations of the same algorithm.
	  </para>

	  <note><para>
		  Input geometries that do not meet the criteria to join any cluster will be assigned a cluster number of NULL.
	  </para></note>

      <para>Availability: 2.3.0</para>
    </refsection>

    <refsection>
      <title>Examples</title>
      <para>
          Assign a cluster number to each polygon, grouping polygons that lie within 50 meters of each other and requiring at least 2 polygons per cluster.
      </para>
	<informaltable>
				  <tgroup cols="2">
					<tbody>
				  <row>
						<entry><para><informalfigure>
							<mediaobject>
							  <imageobject>
								<imagedata fileref="images/st_clusterdbscan01.png" />
							  </imageobject>
							  <caption><para>Polygons within 50 meters of each other, at least 2 per cluster. Singletons have NULL for cid.</para></caption>
							</mediaobject>
						  </informalfigure>
  <programlisting>SELECT name, ST_ClusterDBSCAN(geom, eps := 50, minpoints := 2) over () AS cid
FROM boston_polys
WHERE name > '' AND building > ''
	AND ST_DWithin(geom,
        ST_Transform(
            ST_GeomFromText('POINT(-71.04054 42.35141)', 4326), 26986),
           500);</programlisting>
						  </para></entry>

						<entry><para><screen><![CDATA[                name                 |  cid
-------------------------------------+--------
 Manulife Tower                      |      0
 Park Lane Seaport I                 |      0
 Park Lane Seaport II                |      0
 Renaissance Boston Waterfront Hotel |      0
 Seaport Boston Hotel                |      0
 Seaport Hotel & World Trade Center  |      0
 Waterside Place                     |      0
 World Trade Center East             |      0
 100 Northern Avenue                 |      1
 100 Pier 4                          |      1
 The Institute of Contemporary Art   |      1
 101 Seaport                         |      2
 District Hall                       |      2
 One Marina Park Drive               |      2
 Twenty Two Liberty                  |      2
 Vertex                              |      2
 Vertex                              |      2
 Watermark Seaport                   |      2
 Blue Hills Bank Pavilion            |   NULL
 World Trade Center West             |   NULL
(20 rows)]]></screen></para>
				</entry>
					  </row>
				</tbody>
				</tgroup>
			</informaltable>


        <para>
            Combining parcels with the same cluster number into a single geometry. The query uses named-argument notation.
        </para>
		    <programlisting>
SELECT cid, ST_Collect(geom) AS cluster_geom, array_agg(parcel_id) AS ids_in_cluster FROM (
    SELECT parcel_id, ST_ClusterDBSCAN(geom, eps := 0.5, minpoints := 5) over () AS cid, geom
    FROM parcels) sq
GROUP BY cid;
    </programlisting>
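        <para>
            As a minimal, self-contained sketch of the core, border and noise rules described above, the following query clusters six made-up points. The <varname>pts</varname> CTE and its coordinates are illustrative assumptions, not part of any sample dataset. With an <varname>eps</varname> of 1.5 and <varname>minpoints</varname> of 2, points 1 to 3 should share one cluster number, points 4 and 5 another, and the isolated point 6 should receive NULL. The ORDER BY in the window definition is there so that repeated runs return the same assignments.
        </para>
		    <programlisting>
-- Hypothetical test data: two tight groups of points and one isolated point.
WITH pts(id, geom) AS (
    VALUES
        (1, 'POINT(0 0)'::geometry),
        (2, 'POINT(0 1)'::geometry),
        (3, 'POINT(1 0)'::geometry),
        (4, 'POINT(10 10)'::geometry),
        (5, 'POINT(10 11)'::geometry),
        (6, 'POINT(50 50)'::geometry)   -- no neighbour within eps
)
SELECT id,
       ST_ClusterDBSCAN(geom, eps := 1.5, minpoints := 2) OVER (ORDER BY id) AS cid
FROM pts
ORDER BY id;
    </programlisting>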
    </refsection>

    <refsection>
		  <title>See Also</title>
          <para><xref linkend="ST_DWithin"/>,
              <xref linkend="ST_ClusterKMeans"/>,
              <xref linkend="ST_ClusterIntersecting"/>,
              <xref linkend="ST_ClusterWithin"/>
          </para>
	  </refsection>

    </refentry>

    <refentry id="ST_ClusterIntersecting">
      <refnamediv>
        <refname>ST_ClusterIntersecting</refname>

        <refpurpose>Aggregate function that clusters the input geometries into connected sets.</refpurpose>
      </refnamediv>

      <refsynopsisdiv>
        <funcsynopsis>
          <funcprototype>
            <funcdef>geometry[] <function>ST_ClusterIntersecting</function></funcdef>
            <paramdef><type>geometry set</type> <parameter>g</parameter></paramdef>
          </funcprototype>
        </funcsynopsis>
      </refsynopsisdiv>

      <refsection>
        <title>Description</title>

        <para>ST_ClusterIntersecting is an aggregate function that returns an array of GeometryCollections, where each GeometryCollection represents an interconnected set of geometries.</para>

        <para>Availability: 2.2.0</para>
      </refsection>

      <refsection>
        <title>Examples</title>
        <programlisting>
WITH testdata AS
  (SELECT unnest(ARRAY['LINESTRING (0 0, 1 1)'::geometry,
		       'LINESTRING (5 5, 4 4)'::geometry,
		       'LINESTRING (6 6, 7 7)'::geometry,
		       'LINESTRING (0 0, -1 -1)'::geometry,
		       'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'::geometry]) AS geom)

SELECT ST_AsText(unnest(ST_ClusterIntersecting(geom))) FROM testdata;

--result

st_astext
---------
GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))
GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))
        </programlisting>
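        <para>
            Because the function returns an array of GeometryCollections, it can be convenient to turn that array into one row per cluster. The following is a sketch of one way to do so, using the standard PostgreSQL <varname>WITH ORDINALITY</varname> clause on the same test data. The <varname>cid</varname> and <varname>n_members</varname> column names are illustrative, and the numbering simply follows the position in the returned array, which should not be assumed to be stable between runs.
        </para>
        <programlisting>
WITH testdata AS
  (SELECT unnest(ARRAY['LINESTRING (0 0, 1 1)'::geometry,
		       'LINESTRING (5 5, 4 4)'::geometry,
		       'LINESTRING (6 6, 7 7)'::geometry,
		       'LINESTRING (0 0, -1 -1)'::geometry,
		       'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'::geometry]) AS geom)

-- One row per cluster: array position, member count, and the collection itself
SELECT c.cid,
       ST_NumGeometries(c.geom) AS n_members,
       ST_AsText(c.geom) AS members
FROM (SELECT ST_ClusterIntersecting(geom) AS clusters FROM testdata) AS s,
     unnest(s.clusters) WITH ORDINALITY AS c(geom, cid);
        </programlisting>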
      </refsection>
      <refsection>
        <title>See Also</title>
        <para>
            <xref linkend="ST_ClusterDBSCAN" />,
            <xref linkend="ST_ClusterKMeans" />,
            <xref linkend="ST_ClusterWithin" />
        </para>
      </refsection>

    </refentry>


	<refentry id="ST_ClusterKMeans">
	  <refnamediv>
		<refname>ST_ClusterKMeans</refname>

		<refpurpose>Window function that returns a cluster id for each input geometry using the K-means algorithm.</refpurpose>
	  </refnamediv>

	  <refsynopsisdiv>
		<funcsynopsis>
		  <funcprototype>
			<funcdef>integer <function>ST_ClusterKMeans</function></funcdef>

			<paramdef><type>geometry winset </type>
			<parameter>geom</parameter></paramdef>

			<paramdef><type>integer </type>
			<parameter>number_of_clusters</parameter></paramdef>

            <paramdef><type>float </type>
			<parameter>max_radius</parameter></paramdef>
		  </funcprototype>
		</funcsynopsis>
	  </refsynopsisdiv>

	  <refsection>
      <title>Description</title>

      <para>Returns the <ulink url="https://en.wikipedia.org/wiki/K-means_clustering">K-means</ulink>
        cluster number for each input geometry. The distance used for clustering is the
        distance between centroids for 2D geometries, and the distance between bounding box centers for 3D geometries.
        For POINT inputs, the M coordinate is treated as the weight of the input and must be greater than 0.
      </para>
      <para><varname>max_radius</varname>, if set, will cause ST_ClusterKMeans to generate more clusters than
        <varname>number_of_clusters</varname>, ensuring that no cluster in the output has a radius larger than <varname>max_radius</varname>.
        This is useful in reachability analysis.</para>
      <para>Enhanced: 3.2.0 Support for <varname>max_radius</varname></para>
      <para>Enhanced: 3.1.0 Support for 3D geometries and weights</para>
      <para>Availability: 2.3.0</para>
    </refsection>

    <refsection>
      <title>Examples</title>
		<para>Generate a dummy set of parcels for the examples:</para>
		<programlisting>CREATE TABLE parcels AS
SELECT lpad((row_number() over())::text,3,'0') As parcel_id, geom,
('{residential, commercial}'::text[])[1 + mod(row_number()OVER(),2)] As type
FROM
    ST_Subdivide(ST_Buffer('SRID=3857;LINESTRING(40 100, 98 100, 100 150, 60 90)'::geometry,
    40, 'endcap=square'),12) As geom;
</programlisting>

        <para><informalfigure>
            <mediaobject>
                <imageobject>
                <imagedata fileref="images/st_clusterkmeans02.png" />
                </imageobject>
                <caption><para>Parcels color-coded by cluster number (cid)</para></caption>
            </mediaobject>
            </informalfigure>
<programlisting>
SELECT ST_ClusterKMeans(geom, 3) OVER() AS cid, parcel_id, geom
    FROM parcels;</programlisting>
<screen> cid | parcel_id |   geom
-----+-----------+---------------
   0 | 001       | 0103000000...
   0 | 002       | 0103000000...
   1 | 003       | 0103000000...
   0 | 004       | 0103000000...
   1 | 005       | 0103000000...
   2 | 006       | 0103000000...
   2 | 007       | 0103000000...
</screen>
        </para>

        <para>Partitioning parcel clusters by type:</para>
<programlisting>
SELECT ST_ClusterKMeans(geom, 3) over (PARTITION BY type) AS cid, parcel_id, type
    FROM parcels;</programlisting>
<screen> cid | parcel_id |    type
-----+-----------+-------------
   1 | 005       | commercial
   1 | 003       | commercial
   2 | 007       | commercial
   0 | 001       | commercial
   1 | 004       | residential
   0 | 002       | residential
   2 | 006       | residential
</screen>
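<para>The effect of <varname>max_radius</varname> described above can be checked on the dummy parcels table. The following is a sketch only: the radius of 30 (in the units of the parcels' SRID) is an arbitrary illustrative value, and the column aliases are made up for this example. The second count is expected to be at least as large as the first, since additional clusters are generated when a cluster would otherwise exceed the radius.</para>
<programlisting>
-- Compare the number of clusters produced with and without a radius limit
SELECT count(DISTINCT cid_plain)   AS clusters_without_limit,
       count(DISTINCT cid_limited) AS clusters_with_limit
FROM (
    SELECT ST_ClusterKMeans(geom, 2) OVER () AS cid_plain,
           ST_ClusterKMeans(geom, 2, max_radius := 30) OVER () AS cid_limited
    FROM parcels
) sq;</programlisting>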

<para>Example: Clustering a pre-aggregated, planetary-scale population dataset
using 3D clustering and weighting.
Identify at least 20 regions based on
<ulink url="https://data.humdata.org/dataset/kontur-population-dataset">Kontur Population Data</ulink>
that do not span more than 3000 km from their center:</para>
<programlisting>create table kontur_population_3000km_clusters as
select
    geom,
    ST_ClusterKMeans(
        ST_Force4D(
            ST_Transform(ST_Force3D(geom), 4978), -- cluster in 3D XYZ CRS
            mvalue := population -- set clustering to be weighted by population
        ),
        20,                      -- aim to generate at least 20 clusters
        max_radius := 3000000    -- but generate more to make each under 3000 km radius
    ) over () as cid
from
    kontur_population;
    </programlisting>
    <para><informalfigure>
    <mediaobject>
        <imageobject>
        <imagedata fileref="images/st_clusterkmeans03.png" />
        </imageobject>
        <caption><para>Clustering the world population to the above specs produces 46 clusters.
        Clusters are centered at well-populated regions (New York, Moscow).
        Greenland is one cluster.
        There are island clusters that span across the antimeridian.
        Cluster edges follow Earth's curvature.</para></caption>
    </mediaobject>
    </informalfigure>
    </para>

    </refsection>

    <refsection>
		  <title>See Also</title>
          <para>
              <xref linkend="ST_ClusterDBSCAN"/>,
              <xref linkend="ST_ClusterIntersecting" />,
              <xref linkend="ST_ClusterWithin" />,
              <xref linkend="ST_Subdivide" />,
              <xref linkend="ST_Force_3D" />,
              <xref linkend="ST_Force_4D" />
          </para>
	  </refsection>
	</refentry>

	<refentry id="ST_ClusterWithin">
      <refnamediv>
        <refname>ST_ClusterWithin</refname>

        <refpurpose>Aggregate function that clusters the input geometries by separation distance.</refpurpose>
      </refnamediv>

      <refsynopsisdiv>
        <funcsynopsis>
          <funcprototype>
            <funcdef>geometry[] <function>ST_ClusterWithin</function></funcdef>
            <paramdef><type>geometry set </type> <parameter>g</parameter></paramdef>
            <paramdef><type>float8 </type> <parameter>distance</parameter></paramdef>
          </funcprototype>
        </funcsynopsis>
      </refsynopsisdiv>

      <refsection>
        <title>Description</title>

        <para>ST_ClusterWithin is an aggregate function that returns an array of GeometryCollections, where each GeometryCollection represents a set of geometries separated by no more than the specified distance.  (Distances are Cartesian distances in the units of the SRID.)</para>

        <para>Availability: 2.2.0</para>
      </refsection>

      <refsection>
        <title>Examples</title>
        <programlisting>
WITH testdata AS
  (SELECT unnest(ARRAY['LINESTRING (0 0, 1 1)'::geometry,
		       'LINESTRING (5 5, 4 4)'::geometry,
		       'LINESTRING (6 6, 7 7)'::geometry,
		       'LINESTRING (0 0, -1 -1)'::geometry,
		       'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'::geometry]) AS geom)

SELECT ST_AsText(unnest(ST_ClusterWithin(geom, 1.4))) FROM testdata;

--result

st_astext
---------
GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))
GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))
        </programlisting>
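        <para>
            The result depends on how the distance compares with the gaps between geometries. In the test data above, the closest approach between LINESTRING (5 5, 4 4) and LINESTRING (6 6, 7 7) is the square root of 2 (about 1.414), so a distance of 1.4 keeps them apart while 1.5 should merge everything into a single cluster. The following sketch counts the clusters for both values. The column aliases are illustrative.
        </para>
        <programlisting>
WITH testdata AS
  (SELECT unnest(ARRAY['LINESTRING (0 0, 1 1)'::geometry,
		       'LINESTRING (5 5, 4 4)'::geometry,
		       'LINESTRING (6 6, 7 7)'::geometry,
		       'LINESTRING (0 0, -1 -1)'::geometry,
		       'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'::geometry]) AS geom)

-- Number of clusters just below and just above the sqrt(2) gap
SELECT
  (SELECT array_length(ST_ClusterWithin(geom, 1.4), 1) FROM testdata) AS n_clusters_at_1_4,
  (SELECT array_length(ST_ClusterWithin(geom, 1.5), 1) FROM testdata) AS n_clusters_at_1_5;
        </programlisting>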
      </refsection>
      <refsection>
        <title>See Also</title>
        <para>
          <xref linkend="ST_ClusterDBSCAN" />,
          <xref linkend="ST_ClusterKMeans" />,
          <xref linkend="ST_ClusterIntersecting" />
        </para>
      </refsection>

    </refentry>

</sect1>