======================================================
 Satpy internal workings: having a look under the hood
======================================================

Querying and identifying data arrays
====================================

DataQuery
---------

Data is usually loaded in Satpy by giving the name or the wavelength of the data arrays we are interested
in. This way, the highest-resolution, most calibrated data array is usually returned.

However, in some cases, we need more control over the loading of the data arrays. The way to accomplish this is to load
data arrays using queries, eg::

  scn.load([DataQuery(name='channel1', resolution=400)])

Here a data array with name `channel1` and of resolution `400` will be loaded if available.

Note that None is not a valid value, and keys having a value set to None will simply be ignored.

If one wants to use wildcards to query data, just provide `'*'`, eg::

  scn.load([DataQuery(name='channel1', resolution=400, calibration='*')])

Alternatively, one can provide a list as parameter to query data, like this::

  scn.load([DataQuery(name='channel1', resolution=[400, 800])])



DataID
------

Satpy stores loaded data arrays in a special dictionary (`DatasetDict`) inside scene objects.
To identify each data array uniquely, Satpy assigns an ID to each data array, which is then used as the key in
the scene object. These IDs are of type `DataID` and are immutable. Regular users are not expected to create them; they should only be
created in special circumstances, since Satpy takes care of creating and assigning them automatically. They are also stored in the
`attrs` of each data array as `_satpy_id`.

Default and custom metadata keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One thing the user does have control over, however, is which metadata keys are relevant to which datasets. Satpy provides two default sets
of metadata keys (or ID keys), one for regular imager bands, and the other for composites.
The first one contains: name, wavelength, resolution, calibration, modifiers.
The second one contains: name, resolution.

As an example here is the definition of the first one in yaml:

  .. code-block:: yaml

    data_identification_keys:
      name:
        required: true
      wavelength:
        type: !!python/name:satpy.dataset.dataid.WavelengthRange
      resolution:
      calibration:
        enum:
            - reflectance
            - brightness_temperature
            - radiance
            - counts
        transitive: true
      modifiers:
        required: true
        default: []
        type: !!python/name:satpy.dataset.ModifierTuple

To create a new set, the user can provide indications in the relevant yaml file.
It has to be provided in the header of the reader configuration file, under the `reader`
section, as `data_identification_keys`. Each key under this is the name of a relevant
metadata key that will be used to find relevant information in the attributes of the data
arrays. Under each of these, a few options are available:

 - `required`: if the item is required, False by default
 - `type`: the type to use. More on this further down.
 - `enum`: if the item has to be limited to a finite number of options, an enum can be used.
   Be sure to place the options in the order of preference, with the most desirable option on top.
 - `default`: the default value to assign to the item if nothing (or None) is provided. If this
   option isn't provided, the key will simply be omitted if it is not present in the attrs or if it
   is None. It will be passed to the type's `convert` method if available.
 - `transitive`: whether the key is to be passed when looking for dependencies of composites/modifiers.
   Here for example, a composite that has a given calibration type will pass this calibration
   type requirement to its dependencies.
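
The `required`, `default` and `enum` options described above can be illustrated with a small, self-contained sketch. This is not Satpy's implementation: `build_id` is a hypothetical helper working on plain dictionaries, the `type` and `transitive` options are left out, and raising `ValueError` for a missing required key or an invalid enum value is an assumption made for the example.

```python
def build_id(attrs, id_keys_config):
    """Build a plain dict of ID values from array attributes (illustrative only)."""
    result = {}
    for key, options in id_keys_config.items():
        options = options or {}  # a key such as 'resolution' may carry no options
        value = attrs.get(key)
        if value is None:
            # Nothing (or None) was provided: fall back to the default, if any.
            value = options.get("default")
        if value is None:
            if options.get("required", False):
                # Assumption: a missing required key is reported as an error.
                raise ValueError(f"Missing required ID key: {key!r}")
            continue  # optional key without a value: simply omit it
        allowed = options.get("enum")
        if allowed is not None and value not in allowed:
            # Assumption: values outside the enum are rejected.
            raise ValueError(f"{value!r} is not a valid value for {key!r}")
        result[key] = value
    return result


config = {
    "name": {"required": True},
    "resolution": None,
    "calibration": {
        "enum": ["reflectance", "brightness_temperature", "radiance", "counts"],
    },
    "modifiers": {"required": True, "default": ()},
}

attrs = {"name": "channel1", "calibration": "radiance", "resolution": None}
example = build_id(attrs, config)
print(example)
```

In this sketch `resolution` is dropped because its value is None and no default is configured, while `modifiers` is filled in from its default.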


If the definition of the metadata keys needs to be done in python rather than in a yaml file, it will
be a dictionary very similar to the yaml code. Here is the same example as above in python:

  .. code-block:: python

    from satpy.dataset.dataid import WavelengthRange, ModifierTuple

    id_keys_config = {'name': {
                          'required': True,
                      },
                      'wavelength': {
                          'type': WavelengthRange,
                      },
                      'resolution': None,
                      'calibration': {
                          'enum': [
                              'reflectance',
                              'brightness_temperature',
                              'radiance',
                              'counts'
                              ],
                          'transitive': True,
                      },
                      'modifiers': {
                          'required': True,
                          'default': ModifierTuple(),
                          'type': ModifierTuple,
                      },
                      }

Types
~~~~~
Types are classes that implement a type to be used as value for metadata in the `DataID`. They have
to implement a few methods:

 - a `convert` class method that returns its argument as an instance of the class
 - `__hash__`, `__eq__` and `__ne__` methods
 - a `distance` method that tells how "far" an instance of this class is from its argument.

An example of such a class is the :class:`WavelengthRange <satpy.dataset.dataid.WavelengthRange>` class.
Through its implementation, it allows us to use a wavelength in a query to find, for example, which
`DataID` in a list has its central wavelength closest to that query.
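
To make the interface listed above concrete, here is a minimal sketch of a class providing `convert`, `__hash__`, `__eq__`, `__ne__` and `distance`. `SimpleRange` is a hypothetical class written for this illustration and is not Satpy's `WavelengthRange`; in particular, returning infinity for values outside the range is an assumption of the sketch.

```python
import math


class SimpleRange:
    """Hypothetical (min, central, max) range usable as a metadata value."""

    def __init__(self, min_value, central, max_value):
        self.min_value = min_value
        self.central = central
        self.max_value = max_value

    @classmethod
    def convert(cls, value):
        """Return ``value`` as an instance of this class."""
        if isinstance(value, cls):
            return value
        return cls(*value)  # assumes a (min, central, max) sequence

    def _key(self):
        return (self.min_value, self.central, self.max_value)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        if isinstance(other, SimpleRange):
            return self._key() == other._key()
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def distance(self, value):
        """Return how far ``value`` is from the central value.

        Values outside the range are treated as infinitely far away.
        """
        if self.min_value <= value <= self.max_value:
            return abs(value - self.central)
        return math.inf


# Pick the band whose range is closest to a queried value of 0.6.
bands = {
    "vis06": SimpleRange(0.56, 0.635, 0.71),
    "vis08": SimpleRange(0.74, 0.81, 0.88),
}
closest = min(bands, key=lambda name: bands[name].distance(0.6))
print(closest)
```

Because the class defines both `__hash__` and `__eq__`, equal instances can be used interchangeably as parts of dictionary keys, which is what an immutable ID needs.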


DataID and DataQuery interactions
=================================

Different `DataID` and `DataQuery` instances can have different metadata items defined. As such,
we define equality between different instances of these classes, and across the classes,
as equality between the sorted key/value pairs shared between the instances.
If a `DataQuery` has one or more values set to `'*'`, the corresponding key/value pair will be omitted from the comparison.
Instances sharing no keys will not be equal.
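
The comparison rule described above can be sketched with plain dictionaries standing in for `DataQuery` and `DataID` instances. `shared_items_equal` is a hypothetical helper, not part of Satpy, and it assumes that wildcard pairs are dropped before looking for shared keys, so a query whose only shared key is a wildcard compares unequal.

```python
def shared_items_equal(query, data_id):
    """Compare two mappings on the key/value pairs they share (illustrative only)."""
    # Wildcard values are omitted from the comparison altogether.
    query_items = {key: value for key, value in query.items() if value != "*"}
    shared = sorted(set(query_items) & set(data_id))
    if not shared:
        return False  # instances sharing no keys are not equal
    left = [(key, query_items[key]) for key in shared]
    right = [(key, data_id[key]) for key in shared]
    return left == right


data_id = {"name": "channel1", "resolution": 400, "calibration": "reflectance"}

wildcard_match = shared_items_equal({"name": "channel1", "calibration": "*"}, data_id)
resolution_mismatch = shared_items_equal({"name": "channel1", "resolution": 800}, data_id)
no_shared_keys = shared_items_equal({"level": 2}, data_id)
print(wildcard_match, resolution_mismatch, no_shared_keys)
```

Only the keys present on both sides take part in the comparison, which is why a query naming just `name` can match an ID that also carries `resolution` and `calibration`.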


Breaking changes from DatasetIDs
================================

 - The way to access values from the DataID and DataQuery is through getitem: `my_dataid['resolution']`
 - For checking if a dataset is loaded, use `'mydataset' in scene`, as `'mydataset' in scene.keys()` will always return `False`:
   the `DatasetDict` instance only supports `DataID` as key type.

Creating DataID for tests
=========================

Sometimes, it is useful to create `DataID` instances for testing purposes. For these cases, the `satpy.tests.utils` module
has a `make_dataid` function that can be used just for this::

  from satpy.tests.utils import make_dataid
  did = make_dataid(name='camembert', modifiers=('runny',))