<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

# Geospatial Test Files


These test files cover the main functionality and the corner cases of the
[Parquet Geospatial Types](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
GEOMETRY and GEOGRAPHY.

- `geospatial.parquet`: Contains row groups with specific combinations of
  geometry types to test statistics generation and geometry type coverage.
  The file contains columns `group` (string identifier of the group name),
  `wkt` (the human-readable well-known text representation of the geometry)
  and `geometry` (a Parquet GEOMETRY column). A human-readable version of
  the file is available in `geospatial.yaml`.

- `geospatial-with-nan.parquet`: Contains a single row group with a GEOMETRY
  column containing two valid geometries and one invalid LINESTRING whose
  coordinates are `NaN` in all dimensions. The behaviour of such a geometry is
  undefined. However, implementations should not generate statistics that would
  cause the column chunk to be skipped during predicate pushdown when its other
  (valid) geometries match the predicate. Notably, implementations should *not*
  generate statistics that contain `NaN` for this case.

  Note that, by convention, POINT EMPTY is represented in well-known binary
  (WKB) as a POINT whose coordinates are all `NaN`. It should be treated as a
  valid (but empty) geometry.

- `crs-default.parquet`: Contains a GEOMETRY column with the crs
  omitted. This should be interpreted as OGC:CRS84 (i.e., longitude/latitude).

- `crs-geography.parquet`: Contains a GEOGRAPHY column with the crs
  omitted. This should be interpreted as OGC:CRS84 (i.e., longitude/latitude).

- `crs-projjson.parquet`: Contains a GEOMETRY column with the crs parameter
  set to `projjson:projjson_epsg_5070`, plus a file metadata field with the key
  `projjson_epsg_5070` whose value is the PROJJSON representation of
  EPSG:5070.

- `crs-srid.parquet`: Contains a GEOMETRY column with the crs parameter set
  to `srid:5070`. The Parquet format does not mention the EPSG database, but in
  practice SRID values without further context are commonly interpreted as the
  corresponding EPSG code (here, EPSG:5070). Producers whose SRIDs are not
  meant as EPSG codes may wish to avoid values that coincide with valid EPSG
  codes, to minimize the chance of misinterpretation by consumers that make
  this assumption.

- `crs-arbitrary-value.parquet`: Contains a GEOMETRY column with the crs
  parameter set to an arbitrary string value. The Parquet format does not
  restrict the value of the crs parameter, so implementations may either
  attempt to interpret the value or raise an error.
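The statistics requirement for `geospatial-with-nan.parquet` can be illustrated with a small Python sketch that uses only the standard library. It assumes 2D WKB limited to POINT (type code 1) and LINESTRING (type code 2), and ignores Z/M variants. The helper names (`wkb_point`, `wkb_linestring`, `xy_coords`, `xy_bounds`) and the sample values are hypothetical: they are not part of any Parquet library and do not reproduce the file's actual contents. This is one possible approach, not the reference behaviour. It computes an X/Y bounding box while skipping `NaN` ordinates, so the invalid LINESTRING contributes nothing and a POINT EMPTY (an all-`NaN` point) is ignored instead of corrupting the bounds.

```python
from __future__ import annotations

import math
import struct


def wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D POINT as little-endian WKB (byte order 1, type code 1)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_linestring(coords: list[tuple[float, float]]) -> bytes:
    """Encode a 2D LINESTRING as little-endian WKB (type code 2)."""
    out = struct.pack("<BII", 1, 2, len(coords))
    for x, y in coords:
        out += struct.pack("<dd", x, y)
    return out


def xy_coords(wkb: bytes) -> list[tuple[float, float]]:
    """Decode the X/Y coordinates of a 2D POINT or LINESTRING."""
    fmt = "<" if wkb[0] == 1 else ">"  # byte 0: 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(fmt + "I", wkb, 1)
    if geom_type == 1:  # POINT: one coordinate pair after the 5-byte header
        return [struct.unpack_from(fmt + "dd", wkb, 5)]
    if geom_type == 2:  # LINESTRING: a point count, then that many pairs
        (n,) = struct.unpack_from(fmt + "I", wkb, 5)
        return [struct.unpack_from(fmt + "dd", wkb, 9 + 16 * i) for i in range(n)]
    raise ValueError(f"unsupported WKB geometry type {geom_type}")


def xy_bounds(values: list[bytes]) -> tuple[float, float, float, float] | None:
    """Return (xmin, ymin, xmax, ymax), ignoring NaN ordinates, or None."""
    xmin = ymin = math.inf
    xmax = ymax = -math.inf
    for wkb in values:
        for x, y in xy_coords(wkb):
            # Skip NaN explicitly: min()/max() with NaN depend on argument order.
            if not math.isnan(x):
                xmin, xmax = min(xmin, x), max(xmax, x)
            if not math.isnan(y):
                ymin, ymax = min(ymin, y), max(ymax, y)
    if xmin > xmax or ymin > ymax:
        return None  # no finite X or Y seen: omit the bounding box entirely
    return (xmin, ymin, xmax, ymax)


nan = float("nan")
column = [
    wkb_point(1.0, 2.0),                       # hypothetical valid geometry
    wkb_point(3.0, 4.0),                       # hypothetical valid geometry
    wkb_linestring([(nan, nan), (nan, nan)]),  # invalid: NaN in all dimensions
]
print(xy_bounds(column))                 # (1.0, 2.0, 3.0, 4.0)
print(xy_bounds([wkb_point(nan, nan)]))  # None (only a POINT EMPTY)
```

Skipping `NaN` explicitly matters because comparison-based reductions such as Python's `min` and `max` give order-dependent results when `NaN` is present. In this sketch, returning `None` when no finite coordinate was seen lets a writer omit the bounding box, so it never writes one that contains `NaN`.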
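The crs cases exercised by the `crs-*.parquet` files can be summarized in a single consumer-side function. The following is a minimal Python sketch, assuming the file's key-value metadata is already available as a plain `dict`. The function name `resolve_crs`, its `strict` flag, and the `"identifier"`/`"projjson"`/`"unknown"` result tags are hypothetical. The SRID-to-EPSG mapping follows the common convention described for `crs-srid.parquet`. It is not required by the Parquet format.

```python
from __future__ import annotations

import json

DEFAULT_CRS = "OGC:CRS84"  # interpretation of an omitted crs (longitude/latitude)


def resolve_crs(crs: str | None, key_value_metadata: dict[str, str],
                strict: bool = False) -> tuple[str, object]:
    """Classify a GEOMETRY/GEOGRAPHY crs parameter as (kind, value).

    key_value_metadata is assumed to be the file's key-value metadata as a dict.
    """
    if crs is None:
        # crs-default.parquet / crs-geography.parquet: omitted means OGC:CRS84
        return ("identifier", DEFAULT_CRS)
    if crs.startswith("projjson:"):
        # crs-projjson.parquet: the suffix names a metadata key holding PROJJSON
        key = crs[len("projjson:"):]
        if key not in key_value_metadata:
            raise KeyError(f"crs refers to missing metadata key {key!r}")
        return ("projjson", json.loads(key_value_metadata[key]))
    if crs.startswith("srid:"):
        # crs-srid.parquet: reading an SRID as an EPSG code is a convention only;
        # non-numeric SRIDs fall through to the arbitrary-value case below
        srid = crs[len("srid:"):]
        if srid.isdecimal():
            return ("identifier", f"EPSG:{int(srid)}")
    # crs-arbitrary-value.parquet: the format does not restrict the value
    if strict:
        raise ValueError(f"cannot interpret crs {crs!r}")
    return ("unknown", crs)
```

With `strict=True`, a consumer fails on unrecognized values. With the default, it returns the raw string so that the caller can attempt its own interpretation. These correspond to the two options described for `crs-arbitrary-value.parquet`.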