.. _tree-in-depth:
The tree in-depth
=================
The ASDF tree, being encoded in YAML, is built out of the basic
structures common to most dynamic languages: mappings (dictionaries),
sequences (lists), and scalars (strings, integers, floating-point
numbers, booleans, etc.). All of this comes "for free" by using `YAML
<http://yaml.org/spec/1.1/>`__.
Since these core data structures on their own are so flexible, the
ASDF standard includes a number of schemas that define the structure of
higher-level content. For instance, there is a schema that defines
how :ref:`n-dimensional array data <core/ndarray-1.0.0>` should be
described. These schemas are written in a language called
:ref:`yaml-schema`, which is a thin extension of `JSON Schema,
Draft 4
<http://json-schema.org/latest/json-schema-validation.html>`__. (Such
extensions are allowed and even encouraged by the JSON Schema
standard, which defines the ``$schema`` attribute as a place to
specify which extension is being used.) `asdf-schemas` contains an overview of
how schemas are defined and used by ASDF. :ref:`schema` describes in detail
all of the schemas provided by the ASDF Standard.
.. _yaml_subset:
YAML subset
-----------
For reasons of portability, some features of YAML 1.1 are not
permitted in an ASDF tree.
Restricted mapping keys
^^^^^^^^^^^^^^^^^^^^^^^
YAML itself places no restrictions on the object type used as a mapping key;
floats, sequences, even mappings themselves can serve as a key. For example,
the following is a perfectly valid YAML document::

    %YAML 1.1
    ---
    {foo: bar}:
      3.14159: baz
      [1, 2, 3]: qux
    ...

However, such a file may not be easily parsed in all languages. Python,
for example, does not include a hashable mapping type, so the two major
Python YAML libraries both fail to construct the object described by this
document. Floating-point keys are described as "not recommended" in the
YAML 1.1 spec because YAML does not specify an accuracy for floats.
For these reasons, mapping keys in ASDF trees are restricted to
the following scalar types:
- bool
- int
- str
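
The restriction above can be checked on an already-parsed tree. The following is a minimal sketch, not part of the ASDF standard or any library: the function name ``find_invalid_keys`` and the path format are illustrative assumptions, and a tuple stands in for a YAML sequence key because a Python list cannot be a dictionary key.

```python
ALLOWED_KEY_TYPES = (bool, int, str)


def find_invalid_keys(node, path=""):
    """Return (path, key) pairs for mapping keys that are not bool, int, or str."""
    invalid = []
    if isinstance(node, dict):
        for key, value in node.items():
            if not isinstance(key, ALLOWED_KEY_TYPES):
                invalid.append((path or "/", key))
            invalid.extend(find_invalid_keys(value, f"{path}/{key}"))
    elif isinstance(node, (list, tuple)):
        for index, item in enumerate(node):
            invalid.extend(find_invalid_keys(item, f"{path}/{index}"))
    return invalid


# Hypothetical tree mixing permitted and non-permitted key types.
tree = {
    "name": "example",
    2: "int key",
    True: "bool key",
    3.14159: "float key",                      # not permitted
    "nested": [{(1, 2, 3): "sequence key"}],   # not permitted
}

print(find_invalid_keys(tree))
```

The function only reports offending keys; whether to reject the file or merely warn is left to the caller.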
.. _tags:
Tags
----
YAML includes the ability to assign :ref:`tags` (or types) to any
object in the tree. This is an important feature that sets it apart
from other data representation languages, such as JSON. ASDF defines
a number of custom tags, each of which has a corresponding schema.
For example, the tag of the root element of the tree must always be
``tag:stsci.edu:asdf/core/asdf-1.1.0``, which corresponds to the
:ref:`asdf schema <core/asdf-1.1.0>`, that is, the top-level schema for
ASDF trees. A validating ASDF reader would encounter the tag when reading in
the file, load the corresponding schema, and validate the content against it.
An ASDF library may also use this information to convert to a native data type
that presents a more convenient interface to the user than the structure of
basic types stored in the YAML content.
For example::

    %YAML 1.1
    --- !<tag:stsci.edu:asdf/core/asdf-1.1.0>
    data: !<tag:stsci.edu:asdf/core/ndarray-1.0.0>
      source: 0
      datatype: float64
      shape: [1024, 1024]
      byteorder: little
    ...

All tags defined in the ASDF standard itself begin with the prefix
``tag:stsci.edu:asdf/``. This can be broken down as:
- ``tag:`` The standard prefix used for all YAML tags.
- ``stsci.edu`` The owner of the tag.
- ``asdf`` The name of the standard.
Following that is the "module" containing the schema (see
:ref:`schema` for a list of the available modules). Lastly is the tag
name itself, for example, ``asdf`` or ``ndarray``. Since it is
cumbersome to type out these long prefixes for every tag, it is
recommended that ASDF files declare a prefix at the top of the YAML
file and use it throughout. (Most standard YAML writing libraries
have facilities to do this automatically.) The following example is
equivalent to the one above, but is more user-friendly.
The ``%TAG`` declaration declares that the exclamation point (``!``)
will be replaced with the prefix ``tag:stsci.edu:asdf/``::

    %YAML 1.1
    %TAG ! tag:stsci.edu:asdf/
    --- !core/asdf-1.1.0
    data: !core/ndarray-1.0.0
      source: 0
      datatype: float64
      shape: [1024, 1024]
      byteorder: little

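
The tag structure described above (prefix, owner, standard name, module, tag name) can be illustrated with a small Python sketch. The helper names ``expand_tag`` and ``parse_asdf_tag`` and the ``ParsedTag`` type are hypothetical, and the sketch assumes a single ``!`` handle and a trailing ``-<version>`` suffix as in the examples; a real YAML library handles ``%TAG`` expansion itself.

```python
from typing import NamedTuple

ASDF_TAG_PREFIX = "tag:stsci.edu:asdf/"


class ParsedTag(NamedTuple):
    owner: str
    standard: str
    module: str
    name: str
    version: str


def expand_tag(tag, handle_prefix=ASDF_TAG_PREFIX):
    """Expand a '!'-shorthand or verbatim '!<...>' tag into its full URI."""
    if tag.startswith("!<") and tag.endswith(">"):
        return tag[2:-1]
    if tag.startswith("!"):
        return handle_prefix + tag[1:]
    return tag


def parse_asdf_tag(tag_uri):
    """Split a full tag URI such as tag:stsci.edu:asdf/core/ndarray-1.0.0."""
    scheme, owner, rest = tag_uri.split(":", 2)
    if scheme != "tag":
        raise ValueError(f"not a tag URI: {tag_uri!r}")
    standard, module, name_version = rest.split("/", 2)
    name, sep, version = name_version.rpartition("-")
    if not sep:  # no version suffix present
        name, version = name_version, ""
    return ParsedTag(owner, standard, module, name, version)


print(parse_asdf_tag(expand_tag("!core/ndarray-1.0.0")))
```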
An ASDF parser may use the tag to look up the corresponding schema in
the ASDF standard and validate the element. The schema definitions
ship as part of the ASDF standard.
An ASDF parser may also use the tag information to convert the element
to a native data type. For example, in Python, an ASDF parser may
convert a :ref:`ndarray <core/ndarray-1.0.0>` tag to a `Numpy
<http://www.numpy.org>`__ array instance, providing a convenient and familiar
interface to the user to access *n*-dimensional data.
The ASDF standard does not require parser implementations to validate
or perform native type conversion, however. A parser may simply leave
the tree represented in the low-level basic data structures. When
writing an ASDF file, however, the elements in the tree must be
appropriately tagged for other tools to make use of them.
ASDF parsers must not fail when encountering an unknown tag, but must
simply retain the low-level data structure and the presence of the
tag. This is important, as end users will likely want to store their
own custom tags in ASDF files alongside the tags defined in the ASDF
standard itself, and the file must still be readable by ASDF parsers
that do not understand those tags.
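
One way a library might satisfy this requirement is to fall back to a generic wrapper whenever no converter is registered for a tag. The sketch below is illustrative only: the ``Tagged`` class, the ``CONVERTERS`` registry, the ``shape_of`` stand-in converter, and the custom tag ``tag:example.org:custom/widget-1.0.0`` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Tagged:
    """Fallback wrapper: keeps the tag and the untouched basic data."""
    tag: str
    data: Any


def shape_of(node):
    """Stand-in converter for ndarray nodes; a real library might build an array."""
    return tuple(node["shape"])


# Hypothetical registry mapping known tags to converter functions.
CONVERTERS = {"tag:stsci.edu:asdf/core/ndarray-1.0.0": shape_of}


def convert(tag, data):
    """Convert a tagged node if the tag is known, otherwise retain tag and data."""
    converter = CONVERTERS.get(tag)
    if converter is None:
        return Tagged(tag, data)
    return converter(data)


known = convert("tag:stsci.edu:asdf/core/ndarray-1.0.0", {"shape": [1024, 1024]})
unknown = convert("tag:example.org:custom/widget-1.0.0", {"size": 3})
print(known)
print(unknown)
```

Because the wrapper keeps both the tag and the original data, the node can be written back out unchanged even though the reader did not understand it.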
.. _references:
References
----------
It is possible to directly reference other items within the same tree
or within the tree of another ASDF file. This functionality is based
on two IETF standards: `JSON Pointer (IETF RFC 6901)
<http://tools.ietf.org/html/rfc6901>`__ and `JSON Reference (Draft 3)
<http://tools.ietf.org/html/draft-pbryan-zyp-json-ref-03>`__.
A reference is represented as a mapping (dictionary) with a single
key/value pair. The key is always the special keyword ``$ref`` and the
value is a URI. The URI may contain a fragment (the part following
the ``#`` character) in JSON Pointer syntax that references a specific
element within the target file. This is a ``/``-delimited path
where each element is a mapping key or an array index. If no fragment
is present, the reference refers to the top of the tree.
.. note::

    JSON Pointer is a very simple convention. The only wrinkle is that
    because the characters ``'~'`` (0x7E) and ``'/'`` (0x2F) have
    special meanings, ``'~'`` needs to be encoded as ``'~0'`` and
    ``'/'`` needs to be encoded as ``'~1'`` when these characters
    appear in a reference token.

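
The escaping rules and the ``/``-delimited path lookup can be sketched in a few lines of Python. The function names are illustrative, and the sketch assumes a tree of plain ``dict`` and ``list`` objects with string mapping keys only, which is a simplification of what an ASDF tree may contain.

```python
def escape_token(key):
    # '~' is encoded first so the '~' introduced by '~1' is not re-encoded.
    return key.replace("~", "~0").replace("/", "~1")


def unescape_token(token):
    # '~1' is decoded before '~0' so that '~01' becomes '~1', not '/'.
    return token.replace("~1", "/").replace("~0", "~")


def resolve_pointer(tree, pointer):
    """Follow a JSON Pointer such as '/wavelengths/0' through dicts and lists."""
    if pointer == "":
        return tree  # an empty pointer refers to the top of the tree
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = tree
    for raw in pointer.split("/")[1:]:
        token = unescape_token(raw)
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


tree = {"a/b": {"m~n": [10, 20]}}
print(resolve_pointer(tree, "/a~1b/m~0n/1"))  # 20
```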
When these references are resolved, this mapping should be treated as
having the same logical content as the target of the URI, though the
exact details of how this is performed depend on the
implementation. For example, a library may copy the target data into the
source tree, or it may insert a proxy object that is lazily loaded at
a later time.
For example, suppose we had a given ASDF file containing some shared
reference data, available on a public webserver at the URI
``http://www.nowhere.com/reference.asdf``::

    wavelengths:
      - !core/ndarray-1.0.0
        source: 0
        shape: [256, 256]
        datatype: float64
        byteorder: little

Another file may reference this data directly::

    reference_data:
      $ref: "http://www.nowhere.com/reference.asdf#/wavelengths/0"

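
A reader first has to separate the document URI from the JSON Pointer fragment. A minimal sketch using the Python standard library follows; the wrapper name ``split_reference`` is a hypothetical helper, not an ASDF API.

```python
from urllib.parse import urldefrag


def split_reference(ref):
    """Split a ``$ref`` value into (document URL, JSON Pointer fragment).

    An empty URL means the reference points into the current file; an empty
    fragment means it points at the top of the target tree.
    """
    url, fragment = urldefrag(ref)
    return url, fragment


print(split_reference("http://www.nowhere.com/reference.asdf#/wavelengths/0"))
print(split_reference("#/my_mask"))
```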
It is also possible to use references within the same file::

    data: !core/ndarray-1.0.0
      source: 0
      shape: [256, 256]
      datatype: float64
      byteorder: little
      mask:
        $ref: "#/my_mask"

    my_mask: !core/ndarray-1.0.0
      source: 0
      shape: [256, 256]
      datatype: uint8
      byteorder: little

Reference resolution should be performed *after* the entire tree is
read; forward references within the same file are therefore explicitly
allowed.
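
The following sketch shows one possible way to resolve same-file references once the whole tree is in memory, which is what makes forward references work. It takes the "copy the target" approach, handles only references beginning with ``#``, and does not guard against cyclic references; the function names are illustrative assumptions.

```python
def resolve_pointer(tree, pointer):
    """Minimal JSON Pointer lookup through dicts and lists."""
    node = tree
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def is_internal_ref(node):
    """True for a mapping of the form {'$ref': '#...'}."""
    return (
        isinstance(node, dict)
        and set(node) == {"$ref"}
        and isinstance(node["$ref"], str)
        and node["$ref"].startswith("#")
    )


def resolve_internal_refs(tree):
    """Return a copy of the tree with same-file references replaced by targets."""
    def walk(node):
        if is_internal_ref(node):
            return walk(resolve_pointer(tree, node["$ref"][1:]))
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(tree)


# 'mask' refers to 'my_mask', which appears later in the tree.
tree = {
    "data": {"source": 0, "datatype": "float64", "mask": {"$ref": "#/my_mask"}},
    "my_mask": {"source": 0, "datatype": "uint8"},
}
resolved = resolve_internal_refs(tree)
print(resolved["data"]["mask"])
```

A lazily loaded proxy object would be an equally valid design; copying is used here only because it is the shortest to show.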
.. note::

    The YAML 1.1 standard itself also provides a method for internal
    references called "anchors" and "aliases". It does not, however,
    support external references. While ASDF does not explicitly
    disallow YAML anchors and aliases, since it explicitly supports
    all of YAML 1.1, their use is discouraged in favor of the more
    flexible JSON Pointer/JSON Reference standard described above.

.. _numeric-literals:
Numeric literals
----------------
Integers written as literals in the ASDF tree must be representable in no
more than 64 bits. Due to :class:`~numpy.ndarray` types in
:ref:`Numpy <numpy:numpy_docs_mainpage>`, this is further restricted to
the range of signed 64-bit integers (int64); values that fit only in an
unsigned 64-bit integer (uint64) are not permitted.
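
As a worked illustration of that range, the check below accepts exactly the values from -2\ :sup:`63` to 2\ :sup:`63` - 1. The function name ``fits_in_int64`` is an assumption for this sketch.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_in_int64(value):
    """Return True if an integer lies within the signed 64-bit range."""
    return INT64_MIN <= value <= INT64_MAX


print(fits_in_int64(2**63 - 1))  # True: largest int64
print(fits_in_int64(2**63))      # False: needs uint64
print(fits_in_int64(-(2**63)))   # True: smallest int64
```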
.. _tree-comments:
Comments
--------
It is quite common in FITS files to see comments that describe the
purpose of the key/value pair. For example::

    DATE = '2015-02-12T23:08:51.191614' / Date this file was created (UTC)
    TACID = 'NOAO ' / Time granting institution

Bringing this convention over to ASDF, one could imagine::

    # Date this file was created (UTC)
    creation_date: !time/utc
      2015-02-12T23:08:51.191614
    # Time granting institution
    time_granting_institution: NOAO

It should be obvious from the examples that these kinds of comments,
describing the global meaning of a key, are much less necessary in
ASDF. Since ASDF is not limited to 8-character keywords, the keywords
themselves can be much more descriptive. But more importantly, the
schema for a given key/value pair describes its purpose in detail.
(It would be quite straightforward to build a tool that, given an
entry in a YAML tree, looks up the schema's description associated
with that entry.) Therefore, the use of comments to describe the
global meaning of a value is strongly discouraged.
However, there still may be cases where a comment may be desired in
ASDF, such as when a particular value is unusual or unexpected. The
YAML standard includes a convention for comments, providing a handy
way to include annotations in the ASDF file::

    # We set this to filter B here, even though C is the more obvious
    # choice, because B is handled with more accuracy by our software.
    filter:
      type: B

Unfortunately, most YAML parsers will simply throw these comments out
and do not provide any mechanism to retain them, so reading in an ASDF
file, making some changes, and writing it out will remove all
comments. Even if the YAML parser could be improved or extended to
retain comments, the YAML standard does not define which values the
comments are associated with. In the above example, it is only by
standard reading conventions that we assume the comment is associated
with the content following it. If we were to move the content, where
should the comment go?
To provide a mechanism to add user comments without swimming upstream
against the YAML standard, we recommend a convention for associating
comments with objects (mappings) by using the reserved key name
``//``. In this case, the above example would be rewritten as::

    filter:
      //: |
        We set this to filter B here, even though C is the more obvious
        choice, because B is handled with more accuracy by our software.
      type: B

ASDF parsers must not interpret or react programmatically to these
comment values: they are for human reference only. No schema may
use ``//`` as a meaningful key.
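
A tool that wants to show these comments to a person, while keeping them out of any program logic, might separate them from the data as in the sketch below. The helper names ``get_comment`` and ``without_comments`` are hypothetical and not part of any ASDF library.

```python
COMMENT_KEY = "//"


def get_comment(mapping):
    """Return the human-readable comment attached to a mapping, if any."""
    return mapping.get(COMMENT_KEY)


def without_comments(node):
    """Return a copy of the tree with every '//' entry left out."""
    if isinstance(node, dict):
        return {
            key: without_comments(value)
            for key, value in node.items()
            if key != COMMENT_KEY
        }
    if isinstance(node, list):
        return [without_comments(item) for item in node]
    return node


tree = {
    "filter": {
        "//": "We set this to filter B here, even though C is the more obvious\n"
              "choice, because B is handled with more accuracy by our software.\n",
        "type": "B",
    }
}

print(get_comment(tree["filter"]))
print(without_comments(tree))
```

The original tree is left untouched, so the comments are still present when the file is written back out.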
Null values
-----------
YAML permits serialization of null values using the ``null`` literal::

    some_key: null

Previous versions of the ASDF Standard were vague as to how nulls should
be handled, and the Python reference implementation did not distinguish
between keys with null values and keys that were missing altogether (and
in fact, removed any keys assigned ``None`` from the tree on read or
write). Beginning with ASDF Standard 1.6.0, ASDF implementations
are required to preserve keys even if assigned null values. This
requirement does not extend back into previous versions, and users
of the Python implementation should be aware that the YAML portion
of an ASDF file written against a Standard version earlier than 1.6.0
and containing null values may be modified in unexpected ways when read
or written.
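
The distinction between a null-valued key and a missing key can be illustrated in plain Python. The helper ``strip_nulls`` below is hypothetical: it only imitates the older key-dropping behavior described above and is not the reference implementation's code.

```python
def strip_nulls(node):
    """Imitate the older behavior: drop mapping keys whose value is None."""
    if isinstance(node, dict):
        return {
            key: strip_nulls(value)
            for key, value in node.items()
            if value is not None
        }
    if isinstance(node, list):
        return [strip_nulls(item) for item in node]
    return node


tree = {"some_key": None, "other_key": 1}

# With nulls preserved, the key is present and its value is None.
preserved = dict(tree)
# With nulls stripped, the key looks the same as one that was never written.
stripped = strip_nulls(tree)

print("some_key" in preserved)  # True
print("some_key" in stripped)   # False
```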