File: resources.rst

package info (click to toggle)
python-asdf 4.3.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 7,032 kB
  • sloc: python: 24,068; makefile: 123
file content (261 lines) | stat: -rw-r--r-- 9,183 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
.. currentmodule:: asdf.resource

.. _extending_resources:

===============================
Resources and resource mappings
===============================

In the terminology of this library, a "resource" is a sequence of bytes associated
with a URI.  Currently the two types of resources recognized by `asdf` are schemas
and extension manifests.  Both of these are YAML documents whose associated URI
is expected to match the ``id`` property of the document.

A "resource mapping" is an `asdf` plugin that provides access to
the content for a URI.  These plugins must implement the
`~collections.abc.Mapping` interface (a simple `dict` qualifies) and map
`str` URI keys to `bytes` values.  Resource mappings are installed into
the `asdf` library via one of two routes: the `AsdfConfig.add_resource_mapping <asdf.config.AsdfConfig.add_resource_mapping>`
method or the ``asdf.resource_mappings`` entry point.

Installing resources via AsdfConfig
===================================

The simplest way to install a resource into `asdf` is to add it at runtime using the
`AsdfConfig.add_resource_mapping <asdf.config.AsdfConfig.add_resource_mapping>` method.
For example, the following code installs a schema for use with the `asdf.AsdfFile`
custom_schema argument:

.. code-block:: python

    import asdf

    content = b"""
    %YAML 1.1
    ---
    $schema: http://stsci.edu/schemas/yaml-schema/draft-01
    id: asdf://example.com/example-project/schemas/foo-1.0.0
    type: object
    properties:
      foo:
        type: string
    required: [foo]
    ...
    """

    asdf.get_config().add_resource_mapping(
        {"asdf://example.com/example-project/schemas/foo-1.0.0": content}
    )

The schema will now be available for validating files:

.. code-block:: python

    af = asdf.AsdfFile(custom_schema="asdf://example.com/example-project/schemas/foo-1.0.0")
    af.validate()  # Error, "foo" is missing

The DirectoryResourceMapping class
==================================

But what if we don't want to store our schemas in variables in the code?
Storing resources in a directory tree is a common use case, so `asdf` provides
a `~collections.abc.Mapping` implementation that reads schema content from
a filesystem.  This is the `DirectoryResourceMapping` class.

Consider these three schemas:

.. code-block:: yaml

    # foo-1.0.0.yaml
    id: asdf://example.com/example-project/schemas/foo-1.0.0
    # ...

    # bar-2.3.4.yaml
    id: asdf://example.com/example-project/nested/bar-2.3.4
    # ...

    # baz-8.1.1.yaml
    id: asdf://example.com/example-project/nested/baz-8.1.1
    # ...

which are arranged in the following directory structure::

    schemas
    ├─ foo-1.0.0.yaml
    ├─ README
    └─ nested
        ├─ bar-2.3.4.yaml
        └─ baz-8.1.1.yaml

Our goal is to install all schemas in the directory tree so that they
are available for use with `asdf`.  The `DirectoryResourceMapping` class
can do that for us, but we need to show it how to construct the schema URIs
from the file paths *without reading the id property from the files*.  This
requirement is a performance consideration; not all resources are used
in every session, and if `asdf` were to read and parse all available files
when plugins are loaded, the first call to `asdf.open` would be
intolerably slow.

We should configure `DirectoryResourceMapping` like this:

.. code-block:: python

    import asdf
    from asdf.resource import DirectoryResourceMapping

    mapping = DirectoryResourceMapping(
        "/path/to/schemas",
        "asdf://example.com/example-project/schemas/",
        recursive=True,
        filename_pattern="*.yaml",
        stem_filename=True,
    )

    asdf.get_config().add_resource_mapping(mapping)

The first argument is the path to the schemas directory on the filesystem.  The
second argument is the prefix that should be prepended to file paths
relative to that root when constructing the schema URIs.  The ``recursive``
argument tells the class to descend into the ``nested`` directory when
searching for schemas, ``filename_pattern`` is a glob pattern chosen to exclude
our README file, and ``stem_filename`` causes the class to drop the ``.yaml`` suffix
when constructing URIs.

We can test that our configuration is correct by asking `asdf` to read
and parse one of the schemas:

.. code-block:: python

    from asdf.schema import load_schema

    uri = "asdf://example.com/example-project/schemas/nested/bar-2.3.4.yaml"
    schema = load_schema(uri)
    assert schema["id"] == uri

.. _extending_resources_entry_points:

Installing resources via entry points
=====================================

The `asdf` package also offers an entry point for installing resource
mapping plugins.  This installs a package's resources automatically
without requiring calls to the AsdfConfig method.  The entry point is
called ``asdf.resource_mappings`` and expects to receive
a method that returns a list of `~collections.abc.Mapping` instances.

For example, let's say we're creating a package named ``asdf-foo-schemas``
that provides the same schemas described in the previous section.  Our
directory structure might look something like this::

    asdf-foo-schemas
    ├─ pyproject.toml
    └─ src
       └─ asdf_foo_schemas
          ├─ __init__.py
          ├─ integration.py
          └─ schemas
             ├─ __init__.py
             ├─ foo-1.0.0.yaml
             ├─ README
             └─ nested
                ├─ __init__.py
                ├─ bar-2.3.4.yaml
                └─ baz-8.1.1.yaml

``pyproject.toml`` is the preferred central configuration file for Python build and development systems.
However, it is also possible to write configuration to a ``setup.cfg`` file (used by
`setuptools <https://setuptools.pypa.io/en/latest/index.html>`_) placed in the root
directory of the project. This documentation will cover both options.

In ``integration.py``, we'll define the entry point method
and have it return a list with a single element, our `DirectoryResourceMapping`
instance:

.. code-block:: python

    # integration.py
    from pathlib import Path

    from asdf.resource import DirectoryResourceMapping


    def get_resource_mappings():
        # Get path to schemas directory relative to this file
        schemas_path = Path(__file__).parent / "schemas"
        mapping = DirectoryResourceMapping(
            schemas_path,
            "asdf://example.com/example-project/schemas/",
            recursive=True,
            filename_pattern="*.yaml",
            stem_filename=True,
        )
        return [mapping]

Then in ``pyproject.toml``, define an ``[project.entry-points]`` section (or ``[options.entry_points]`` in ``setup.cfg``)
that identifies the method as an ``asdf.resource_mappings`` entry point:

.. tab:: pyproject.toml

    .. code-block:: toml

        [project.entry-points]
        'asdf.resource_mappings' = { asdf_foo_schemas = 'asdf_foo_schemas.integration:get_resource_mappings' }

.. tab:: setup.cfg

    .. code-block:: ini

        [options.entry_points]
        asdf.resource_mappings =
            asdf_foo_schemas = asdf_foo_schemas.integration:get_resource_mappings

After installing the package, it should be possible to load one of our schemas
in a new session without any additional setup:

.. code-block:: python

    from asdf.schema import load_schema

    uri = "asdf://example.com/example-project/schemas/nested/bar-2.3.4.yaml"
    schema = load_schema(uri)
    assert schema["id"] == uri

Note that the package will need to be configured to include the
YAML files.  There are multiple ways to accomplish this, but one easy option
is to add ``[tool.setuptools.package-data]`` and ``[tool.setuptools.package-dir]`` sections to ``pyproject.toml``
(or ``[options.package_data]`` in ``setup.cfg``) requesting that all files with a ``.yaml`` extension be installed:

.. tab:: pyproject.toml

    .. code-block:: toml

        [tool.setuptools]
        packages = ["asdf_foo_schemas", "asdf_foo_schemas.resources"]

        [tool.setuptools.package-data]
        "asdf_foo_schemas.resources" = ["resources/**/*.yaml"]

        [tool.setuptools.package-dir]
        "" = "src"
        "asdf_foo_schemas.resources" = "resources"

.. tab:: setup.cfg

    .. code-block:: ini

        [options.package_data]
        * = *.yaml

Entry point performance considerations
--------------------------------------

For the good of `asdf` users everywhere, it's important that entry point
methods load as quickly as possible.  All resource URIs must be loaded
before reading an ASDF file, so any entry point method that lingers
will introduce a delay to the initial call to `asdf.open`.  For that reason,
we recommend to minimize the number of imports that occur in the module
containing the entry point method, particularly imports of modules outside
of the Python standard library or `asdf` itself.  When resources are stored
in a filesystem, it's also helpful to delay reading a file until its URI
is actually requested, which may not occur in a given session.  The
DirectoryResourceMapping class is implemented with this behavior.