File: FORMATS.md

package info (click to toggle)
spatial4j 0.8-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, sid, trixie
  • size: 1,112 kB
  • sloc: java: 10,905; xml: 401; makefile: 2
file content (196 lines) | stat: -rw-r--r-- 7,415 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
# Spatial4j Supported Formats

Spatial4j supports reading and writing Shapes to and from strings in the following formats:

 * Well Known Text (WKT)
 * GeoJSON
 * Polyshape

## Reader/Writer Api

The classes `ShapeReader` and `ShapeWriter` are the interfaces for encoding 
and decoding shapes respectively. They can be obtained from a `SpatialContext` instance.

    SpatialContext ctx = ...;
    ShapeReader shpReader = ctx.getFormats().getReader(ShapeIO.WKT);
    ShapeWriter shpWriter = ctx.getFormats().getWriter(ShapeIO.WKT);

Reading and writing polygons requires a reader/writer obtained from `JtsSpatialContext`.

## Well Known Text

Well-Known-Text (WKT) is a simple to read text based format for representing spatial objects. It is
defined by the [OGC Simple Feature Specification](http://www.opengeospatial.org/standards/sfa).  [Wikipedia's page](https://en.wikipedia.org/wiki/Well-known_text) on it is pretty decent.

The following table shows the various shape types encoded as WKT:

| Shape           | WKT                                                   |
| ----------------|-------------------------------------------------------|
| Point           | `POINT(1 2)`                                          |
| Rectangle       | `ENVELOPE(1, 2, 4, 3)`      _(minX, maxX, maxY, minY)_|
| Circle          | `BUFFER(POINT(-10 30), 5.2)`                          |
| LineString      | `LINESTRING(1 2, 3 4)`                                |
| Buffered L/S    | `BUFFER(LINESTRING(1 2, 3 4), 0.5)`                   |
| Polygon         | `POLYGON ((1 1, 2 1, 2 2, 1 2, 1 1))`                 |
| ShapeCollection | `GEOMETRYCOLLECTION(POINT(1 2),LINESTRING(1 2, 3 4))` |

The `ENVELOPE` keyword/syntax was borrowed from OGC's [Common Query Language (CQL)](http://docs.geoserver.org/stable/en/user/tutorials/cql/cql_tutorial.html) standard, a superset of WKT.  Note the odd argument order.  It's not widely used.  Alternatively a rectangular polygon can be used, which will be recognized as-such when
it is read and turned into a Rectangle instance.

The `BUFFER` keyword is a Spatial4j extension to the WKT spec.  It's used to produce a circle by buffering a point, and to buffer shapes generally.  The 2nd argument is the buffer distance in degrees.

## GeoJSON

GeoJSON is a format for representing geographic objects using JavaScript Object Notation. It is 
defined by the open standard available at http://geojson.org.

The following list shows the various shape types encoded as GeoJSON:

* Point

        {
          "type": "Point",
          "coordinates": [1, 2]
        }

* Rectangle

        {
          "type": "Polygon",
          "coordinates": [
            [[1,3], [1,4], [2,4], [2,3], [1,3]]
          ]
        }

* Circle

        {
          "type": "Circle", 
          "coordinates": [1, 2], 
          "radius": 111.19508, 
          "properties": {
            "radius_units": "km"
          }
        }

* LineString

        {
          "type": "LineString",
          "coordinates": [[1, 2], [3, 4]]
        }

* Buffered LineString

        {
          "type": "LineString",
          "coordinates": [[1, 2], [3, 4]],
          "buffer": 10
        }

* Polygon

        {
          "type": "Polygon", 
          "coordinates": [
            [[1, 1], [2, 1], [2, 2], [1, 2], [1, 1]]
          ]
        }

* ShapeCollection

        {
          "type": "GeometryCollection", 
          "geometries": [
            {
              "type": "Point", 
              "coordinates": [1, 2]
            }, 
            {
              "type": "LineString", 
              "coordinates": [
                [1, 2],
                [3, 4]
              ]
            }
          ]
        }

## Polyshape

The Polyshape format is an extension to the [Polyline](https://developers.google.com/maps/documentation/utilities/polylinealgorithm) format from Google to include a wider diversity of 
shapes. The standard "Encoded Polyline Algorithm Format" algorithm offers a compact way to encode a
list of lat/lon pairs into a simple ASCII string.

The following table shows the various shape types encoded as Polyshape:

| Shape           | Polyshape                                             |
| ----------------|-------------------------------------------------------|
| Point           | `0_ibE_seK`                                           |
| Rectangle       | `5_ibE_}hQ_ibE_ibE`                                   |
| Circle          | `4(_ibE)_ibE_seK`                                     |
| LineString      | `1_ibE_seK_seK_seK`                                   |
| Buffered L/S    | ``1(_c`|@)_ibE_seK_seK_seK``                          |
| Polygon         | `2_ibE_ibE_ibE??_ibE~hbE??~hbE`                       |
| ShapeCollection | `0_ibE_seK 1_ibE_seK_seK_seK`                         |


### Prefix Key

The Polyshape format uses a prefix key to denote the shape type:

| Shape      | Key |
|------------|:---:|
| Point      | 0   |
| LineString | 1   |
| Polygon    | 2   |
| MultiPoint | 3   |
| Circle     | 4   |
| Rectangle  | 5   |


### Collections

Shape collections are represented as each of the individual shape encodings concatenated with 
a space.


### Arguments

The Polyshape format supports optional arguments to handle things like a Circle (with a radius) and 
a BufferedLineString (with a buffer size). If the character immediately after the prefix is a `'('`, 
everything up to the next `')'` is considered an argument to the Shape.


### Polygons with Holes

A Polygon with interior rings is represented by appending the encoding of each ring prefixed with 
a `')'`:

    '2' + encode(exteriorRing) + ['(' + encode(interiorRing)]*

### Known Limitations

- The format is optimized to store lat/lon points, very big or very small values may get lost in the rounding
- All values are rounded to: Math.round(value * 1e5)
- In the JTS version, a homogeneous ShapeCollection will be read as a MultPoint, MultiLineString, or MultiPolygon

## Benchmarks

The following table shows a comparison among the encoded formats in terms of number of bytes in the
final encoding. Note that this comparison may be somewhat misleading since the other encoding 
formats do not implicitly limit precision as Polyshape does. Percentages are calculated relative to
the largest encoding.

| GeoJSON    | WKT      | Binary    | Polyshape  | Shape |
|:----------:|:--------:|:---------:|:----------:|-------|
| 100% (42)  | 38% (16) |  40% (17) |   20% (9)  | `POINT(100.1 0.1)` |
| 100% (61)  | 52% (33) |  67% (41) |   28% (17) | `LINESTRING (100.1 0.1, 101.1 1.1)` |
| 100% (96)  | 33% (32) |  34% (33) |   18% (17) | `ENVELOPE(100.1, 101.1, 1.1, 0.1)` |
| 90% (158)  | 69% (122)| %100 (177)|   33% (59) | `POLYGON ((100.1 0.1, 101.1 0.1, 101.1 1.1, 100.1 1.1, 100.1 0.1), (100.2 0.2, 100.8 0.2, 100.8 0.8, 100.2 0.8, 100.2 0.2))` |
| 100% (169) | 51% (87) |  %52 (88) |   21% (36) | `GEOMETRYCOLLECTION(LINESTRING (100.1 0.1, 101.1 1.1),LINESTRING (102.1 2.1, 103.1 3.1))` |
| 100% (131) | 40% (53) |  %31 (40) |   15% (20) | `GEOMETRYCOLLECTION(POINT(100.1 0.1),POINT(101.1 1.1))` |

With this limited dataset it looks like the Polyshape format is on average ~25% as big as GeoJSON, 
~50% as big as WKT, and ~40% as big as the binary encoding. THe data used for this benchmark 
contains data with low precision numbers (100.1 vs 100.123456) so that may be a low estimate.