====================
Reading remote files
====================

Using a single reader
=====================

Some of the readers in Satpy can read data directly over various transfer protocols. This is done
using `fsspec <https://filesystem-spec.readthedocs.io/en/latest/index.html>`_ and the protocol-specific
packages it relies on.

As an example, reading ABI data from public AWS S3 storage can be done in the following way::

    from satpy import Scene

    storage_options = {'anon': True}
    filenames = ['s3://noaa-goes16/ABI-L1b-RadC/2019/001/17/*_G16_s20190011702186*']
    scn = Scene(reader='abi_l1b', filenames=filenames, reader_kwargs={'storage_options': storage_options})
    scn.load(['true_color_raw'])

Reading from S3 as above requires the `s3fs` library to be installed in addition to `fsspec`.
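
Because the S3 example depends on two separate packages, it can help to check for them before creating the `Scene`. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of Satpy or `fsspec`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names from ``names`` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Packages needed for the S3 example above.
missing = missing_packages(["fsspec", "s3fs"])
if missing:
    print("Install before reading from S3:", ", ".join(missing))
else:
    print("fsspec and s3fs are available")
```

Other protocols need their own backend packages, so the list passed to the helper would change accordingly.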

As an alternative, the storage options can be given using
`fsspec configuration <https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration>`_.
For the above example, the configuration could be saved to `s3.json` in the `fsspec` configuration directory
(by default `~/.config/fsspec/` on Linux)::

    {
        "s3": {
            "anon": true
        }
    }

.. note::

    Options given in `reader_kwargs` override only the matching options given in the configuration file; everything else
    is left as-is. If data access fails, remove the configuration file to see if that solves the issue.
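
The override behaviour described in the note can be pictured as a dictionary update. The sketch below is only an illustration with plain dictionaries, assuming the merge is a shallow, key-by-key update; it is not the code `fsspec` runs, and the function name `effective_options` and the example URL are hypothetical.

```python
def effective_options(config_options, reader_storage_options):
    """Combine options from a configuration file with those from reader_kwargs.

    Keys given in ``reader_storage_options`` replace the matching keys from
    the configuration; all other configuration keys are kept unchanged.
    """
    merged = dict(config_options)          # start from the configuration file
    merged.update(reader_storage_options)  # explicit options win
    return merged


# Hypothetical configuration file content for the "s3" protocol.
config = {"anon": True, "client_kwargs": {"endpoint_url": "https://example.invalid"}}
# Options passed via reader_kwargs={'storage_options': ...}.
override = {"anon": False}

merged = effective_options(config, override)
print(merged)
```

Under this assumption a stale `client_kwargs` entry left in the configuration file would still be applied, which is why removing the file is a useful debugging step.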


For reference, reading SEVIRI HRIT data from a local S3 storage works the same way::

    filenames = [
        's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3*202204260855*',
    ]
    storage_options = {
        "client_kwargs": {"endpoint_url": "https://PLACE-YOUR-SERVER-URL-HERE"},
        "secret": "VERYBIGSECRET",
        "key": "ACCESSKEY"
    }
    scn = Scene(reader='seviri_l1b_hrit', filenames=filenames, reader_kwargs={'storage_options': storage_options})
    scn.load(['WV_073'])

With the `fsspec` configuration file `s3.json`, the same options would look like this::

    {
        "s3": {
            "client_kwargs": {"endpoint_url": "https://PLACE-YOUR-SERVER-URL-HERE"},
            "secret": "VERYBIGSECRET",
            "key": "ACCESSKEY"
        }
    }
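
The configuration file can also be generated from Python instead of written by hand. This is a minimal sketch using only the standard library; the function name `write_s3_config` is hypothetical, and the demonstration writes to a temporary directory rather than the real `fsspec` configuration directory.

```python
import json
import tempfile
from pathlib import Path


def write_s3_config(config_dir, endpoint_url, key, secret):
    """Write an ``s3.json`` file with the given credentials into ``config_dir``."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    config = {
        "s3": {
            "client_kwargs": {"endpoint_url": endpoint_url},
            "secret": secret,
            "key": key,
        }
    }
    path = config_dir / "s3.json"
    path.write_text(json.dumps(config, indent=4))
    path.chmod(0o600)  # the file holds credentials, so restrict access to the owner
    return path


# Demonstration in a temporary directory with the placeholder values from above.
with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_s3_config(
        tmp_dir, "https://PLACE-YOUR-SERVER-URL-HERE", "ACCESSKEY", "VERYBIGSECRET"
    )
    loaded = json.loads(written.read_text())

print(sorted(loaded["s3"]))
```

For real use, `config_dir` would be the `fsspec` configuration directory, and the credentials would typically come from environment variables or a secrets store rather than from the source code.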


Using multiple readers
======================

If multiple readers are used and the required credentials differ, the storage options are passed per reader like this::

    reader1_filenames = [...]
    reader2_filenames = [...]
    filenames = {
        'reader1': reader1_filenames,
        'reader2': reader2_filenames,
    }
    reader1_storage_options = {...}
    reader2_storage_options = {...}
    reader_kwargs = {
        'reader1': {
            'option1': 'foo',
            'storage_options': reader1_storage_options,
        },
        'reader2': {
            'option1': 'foo',
            'storage_options': reader2_storage_options,
        }
    }
    scn = Scene(filenames=filenames, reader_kwargs=reader_kwargs)
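
Building the two dictionaries by hand makes it easy to pair a reader with the wrong storage options. The sketch below derives both from a single per-reader specification. The helper `build_scene_args`, the bucket names and the `kwargs` key are hypothetical; only the resulting `filenames` and `reader_kwargs` layout follows the example above.

```python
def build_scene_args(reader_specs):
    """Split a per-reader specification into ``filenames`` and ``reader_kwargs``."""
    filenames = {}
    reader_kwargs = {}
    for reader, spec in reader_specs.items():
        filenames[reader] = list(spec["filenames"])
        kwargs = dict(spec.get("kwargs", {}))  # other reader options, if any
        kwargs["storage_options"] = dict(spec["storage_options"])
        reader_kwargs[reader] = kwargs
    return filenames, reader_kwargs


# Hypothetical specification: each reader keeps its own credentials.
specs = {
    "reader1": {
        "filenames": ["s3://bucket-a/file1"],
        "storage_options": {"anon": True},
        "kwargs": {"option1": "foo"},
    },
    "reader2": {
        "filenames": ["s3://bucket-b/file2"],
        "storage_options": {"key": "ACCESSKEY", "secret": "VERYBIGSECRET"},
        "kwargs": {"option1": "foo"},
    },
}

filenames, reader_kwargs = build_scene_args(specs)
# The results are then passed on unchanged:
# scn = Scene(filenames=filenames, reader_kwargs=reader_kwargs)
```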


Caching the remote files
========================

Caching the remote files locally can speed up the overall processing significantly, especially if the data are re-used,
for example when testing. The caching can be done by taking advantage of the `fsspec caching mechanism
<https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally>`_::

    reader_kwargs = {
        'storage_options': {
            's3': {'anon': True},
            'simplecache': {
                'cache_storage': '/tmp/s3_cache',
            }
        }
    }

    filenames = ['simplecache::s3://noaa-goes16/ABI-L1b-RadC/2019/001/17/*_G16_s20190011702186*']
    scn = Scene(reader='abi_l1b', filenames=filenames, reader_kwargs=reader_kwargs)
    scn.load(['true_color_raw'])
    scn2 = scn.resample(scn.coarsest_area(), resampler='native')
    scn2.save_datasets(base_dir='/tmp/', tiled=True, blockxsize=512, blockysize=512, driver='COG', overviews=[])
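
The cached variant differs from the uncached one in two places: the URL gets a `simplecache::` prefix, and the storage options are keyed by protocol name. A small sketch of helpers that derive both from a plain URL is shown below. The helper names `cached_url` and `cached_reader_kwargs` are hypothetical; the resulting structures mirror the example above.

```python
def cached_url(url, cache_protocol="simplecache"):
    """Prefix ``url`` with the caching protocol unless it is already there."""
    prefix = cache_protocol + "::"
    return url if url.startswith(prefix) else prefix + url


def cached_reader_kwargs(url, remote_options, cache_dir, cache_protocol="simplecache"):
    """Build ``reader_kwargs`` with storage options keyed by protocol name."""
    # "s3://bucket/key" -> "s3"; "simplecache::s3://bucket/key" -> "s3"
    protocol = url.split("://", 1)[0].split("::")[-1]
    return {
        "storage_options": {
            protocol: dict(remote_options),
            cache_protocol: {"cache_storage": cache_dir},
        }
    }


url = "s3://noaa-goes16/ABI-L1b-RadC/2019/001/17/*_G16_s20190011702186*"
filenames = [cached_url(url)]
reader_kwargs = cached_reader_kwargs(url, {"anon": True}, "/tmp/s3_cache")
# scn = Scene(reader='abi_l1b', filenames=filenames, reader_kwargs=reader_kwargs)
```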


The following table shows the timings for running the above code with different cache statuses:

.. _cache_timing_table:

.. list-table:: Processing times without and with caching
    :header-rows: 1
    :widths: 40 30 30

    * - Caching
      - Elapsed time
      - Notes
    * - No caching
      - 650 s
      - remove `reader_kwargs` and `simplecache::` from the code
    * - File cache
      - 66 s
      - Initial run
    * - File cache
      - 13 s
      - Second run
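
Comparable timings can be collected by wrapping the processing in a timer. The sketch below uses a small context manager built on `time.perf_counter`. The name `timed` is hypothetical, and the timed body is a stand-in for the `Scene` creation, loading, resampling and saving steps. It does not reproduce the numbers in the table, which depend on the network and the machine.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the ``with`` body in ``results[label]``."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("demo", timings):
    # Stand-in workload; replace with the Scene processing to be measured.
    sum(range(1000))

print(f"demo: {timings['demo']:.3f} s")
```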

.. note::

    Neither Satpy nor fsspec cleans the cache, so the user should remove excess files from `cache_storage`.
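
One way to keep the cache directory from growing is to delete files that have not been modified for some time. This is a minimal sketch under two assumptions: that the cached files sit directly inside the `cache_storage` directory, and that the modification time is a good enough measure of age. The function name `clean_cache` is hypothetical, and the demonstration runs in a temporary directory.

```python
import os
import tempfile
import time
from pathlib import Path


def clean_cache(cache_dir, max_age_seconds, now=None):
    """Delete regular files in ``cache_dir`` older than ``max_age_seconds``."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(cache_dir).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demonstration: one file aged two days, one fresh file, one-day limit.
with tempfile.TemporaryDirectory() as tmp_dir:
    old_file = Path(tmp_dir) / "old"
    new_file = Path(tmp_dir) / "new"
    old_file.write_bytes(b"x")
    new_file.write_bytes(b"x")
    two_days_ago = time.time() - 2 * 86400
    os.utime(old_file, (two_days_ago, two_days_ago))
    removed = clean_cache(tmp_dir, max_age_seconds=86400)
    remaining = sorted(p.name for p in Path(tmp_dir).iterdir())

print(removed, remaining)
```

Cleaning is best done while no processing is using the cache, since a running job may still need the files.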


.. note::

    Only `simplecache` is considered thread-safe, so using the other caching mechanisms may or may not work depending
    on the reader, Dask scheduler or the phase of the moon.


Resources
=========

See :class:`~satpy.readers.core.remote.FSFile` for direct usage of `fsspec` with Satpy, and the
`fsspec documentation <https://filesystem-spec.readthedocs.io/en/latest/index.html>`_ for more details on connection
options.