File: intro.rst

package info (click to toggle)
asdf-standard 1.1.1-1
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 1,020 kB
  • sloc: python: 1,050; makefile: 16
file content (120 lines) | stat: -rw-r--r-- 5,136 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
Introduction
============

The Flexible Image Transport System (FITS) has been the de facto
standard for storing and exchanging astronomical data for decades, but
it is beginning to show its age.  Developed in the late 1970s, the
FITS authors made a number of implementation choices that, while
common at the time, are now seen to limit its utility for the needs of
modern science.  As astronomy moves into a more varied set of data
product types (data models) with richer and more complex metadata,
FITS is being pushed to its breaking point.  The issues with FITS are
outlined in great detail in [Thomas2015]_.

Newer formats, such as `VOTable
<http://www.ivoa.net/documents/VOTable/>`__ have partially addressed
the problem of richer, more structured metadata, by using tree
structures rather than flat key/value pairs.  However, those
text-based formats are unsuitable for storing large amounts of binary
data.  On the other end of the spectrum, formats such as `HDF5
<http://www.hdfgroup.org/HDF5/>`__ and `BLZ
<http://blaze.pydata.org/>`__ address problems
with large data sets and distributed computing, but don't really
address the metadata needs of an interchange format.  ASDF aims to
exist in the same middle ground that made FITS so successful, by being
a hybrid text and binary format: containing human editable metadata
for interchange, and raw binary data that is fast to load and use.
Unlike FITS, the metadata is highly structured and is designed
up-front for extensibility.

ASDF has the following explicit goals:

- It has a hierarchical metadata structure, made up of basic dynamic
  data types such as strings, numbers, lists and mappings.

- It has human-readable metadata that can be edited directly in place
  in the file.

- The structure of the data can be automatically validated using
  schema.

- It's designed for extensibility: new conventions may be used without
  breaking backward compatibility with tools that do not understand
  those conventions.  Versioning systems are used to prevent
  conflicting with alternative conventions.

- The binary array data (when compression is not used) is a raw memory
  dump, and techniques such as memory mapping can be used to
  efficiently access it.

- It is possible to read and write the file in as a stream, without
  requiring random access.

- It's built on top of industry standards, such as `YAML
  <http://www.yaml.org>`__ and `JSON Schema
  <http://www.json-schema.org>`__ to take advantage of a larger
  community working on the core problems of data representation.  This
  also makes it easier to support ASDF in new programming languages
  and environments by building on top of existing libraries.

- Since every ASDF file has the version of the specification to which
  it is written, it will be possible, through careful planning, to
  evolve the ASDF format over time, allowing for files that use new
  features while retaining backward compatibility with older tools.

ASDF is primarily intended as an interchange format for delivering
products from instruments to scientists or between scientists.  While
it is reasonably efficient to work with and transfer, it may not be
optimal for direct use on large data sets in distributed and high
performance computing environments.  That is explicitly not a goal of
the ASDF standard, as those requirements can sometimes be at odds with
the needs of an interchange format.  ASDF still has a place in those
environments as a delivery mechanism, even if it ultimately is not the
actual format on which the computing is performed.

Implementations
---------------

The ASDF standard is being developed concurrently with a `reference
implementation written in Python
<http://github.com/spacetelescope/asdf>`__.

There are two prototype implementations for C++: `asdf-cpp
<https://github.com/spacetelescope/asdf-cpp>`__ and `asdf-cxx
<https://github.com/eschnett/asdf-cxx>`__. Neither is currently feature
complete, but both provide enough functionality to read and write ASDF files.

There is also a `work-in-progress wrapper
<https://github.com/eschnett/asdf.jl>`__ of the Python implementation
for `Julia <https://julialang.org>`__.

Incorporated standards
----------------------

The ASDF format is built on top of a number of existing standards:

- `YAML 1.1 <http://yaml.org/spec/1.1/>`__

- JSON Schema Draft 4:

  - `Core <http://tools.ietf.org/html/draft-zyp-json-schema-04>`__

  - `Validation
    <http://tools.ietf.org/html/draft-fge-json-schema-validation-00>`__

  - `Hyper-Schema
    <http://tools.ietf.org/html/draft-luff-json-hyper-schema-00>`__

- `JSON Pointer <http://tools.ietf.org/html/rfc6901>`__

- `Semantic Versioning 2.0.0 <http://semver.org/spec/v2.0.0.html>`__

- `VOUnits (Units in the VO)
  <http://www.ivoa.net/documents/VOUnits/index.html>`__

- `Zlib Deflate compression <http://www.zlib.net/feldspar.html>`__

.. [Thomas2015] Thomas, B., Jenness. T. et al. (2015).
   Learning from FITS: Limitations in use in modern astronomical research.
   *Astronomy and Computing*, 12: 133-145.
   `doi:10.1016/j.ascom.2015.01.009 <https://doi.org/10.1016/j.ascom.2015.01.009>`__