1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
|
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Geospatial Test Files
These test files cover the main and corner case functionality of the
[Parquet Geospatial Types](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
GEOMETRY and GEOGRAPHY.
- `geospatial.parquet`: Contains row groups with specific combinations of
geometry types to test statistics generation and geometry type coverage.
The file contains columns `group` (string identifier of the group name),
`wkt` (the human-readable well-known text representation of the geometry)
and `geometry` (a Parquet GEOMETRY column). A human-readable version of
the file is available in `geospatial.yaml`.
- `geospatial-with-nan.parquet`: Contains a single row group with a GEOMETRY
column whose contents contains two valid geometries and one invalid LINESTRING
whose coordinates contain a `NaN` value in all dimensions. Such a geometry is
not valid and the behaviour of it is not defined; however, implementations should
not generate statistics that would prevent the other (valid) geometries in the
column chunk from appearing in the case of predicate pushdown. Notably,
implementations should *not* generate statistics that contain `NaN` for this case.
Note that POINT EMPTY is represented by convention in well-known binary as
a POINT whose coordinates are all `NaN`, which should be treated as a valid
(but empty) geometry.
- `crs-default.parquet`: Contains a GEOMETRY column with the crs
omitted. This should be interpreted as OGC:CRS84 (i.e., longitude/latitude).
- `crs-geography.parquet`: Contains a GEOGRAPHY column with the crs
omitted. This should be interpreted as OGC:CRS84 (i.e., longitude/latitude).
- `crs-projjson.parquet`: Contains a GEOMETRY column with the crs parameter
set to `projjson:projjson_epsg_5070` and a metadata field with the key
`projjson_epsg_5070` and a value consisting of the appropriate PROJJSON
value for EPSG:5070.
- `crs-srid.parquet`: Contains a GEOMETRY column with the crs parameter set
to `srid:5070`. The Parquet format does not mention the EPSG database in
any way, but otherwise out-of-context SRID values are commonly interpreted
as the corresponding EPSG:xxxx value. Producers of SRIDs may wish to
avoid valid EPSG:xxxx values where this is not the intended usage to minimize
the chances they will be misinterpreted by consumers who make this assumption.
- `crs-arbitrary-value.parquet`: Contains a GEOMETRY column with the crs
parameter set to an arbitrary string value. The Parquet format does not
restrict the value of the crs parameter and implementations may choose to
attempt interpreting the value or error.
|