Case studies
============

Here we list completed datasets, together with the reproducible code that generated them, links to the
created references and, where available, notebook or benchmark examples. This page is a work in progress.
All datasets listed here will also be listed in the repository's Intake catalogue.

.. note::

   This page needs to be cleaned up and the cases standardized.

Sentinel Global coherence
-------------------------

Native data format: GeoTIFF.

Effective in-memory size: 400TB.

Documentation: http://sentinel-1-global-coherence-earthbigdata.s3-website-us-west-2.amazonaws.com

Discussion: https://github.com/fsspec/kerchunk/issues/78

Generator script: https://github.com/cgohlke/tifffile/blob/v2021.10.10/examples/earthbigdata.py

Notebook: https://nbviewer.org/github/fsspec/kerchunk/blob/main/examples/earthbigdata.ipynb

Solar Dynamics Observatory
--------------------------

Native data format: FITS.

Effective in-memory size: 400GB.

Notes: each wavelength filter is presented as a separate variable. For the other filters, the DATE-OBS of the
nearest preceding 94A image is used, so that all variables share a single time axis.
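
The "nearest preceding 94A image" rule can be sketched with a binary search over sorted timestamps. This is
a minimal illustration, not the notebook's code: it assumes the DATE-OBS values have already been read from
the FITS headers and parsed into ``datetime`` objects, and ``align_to_reference`` is a hypothetical helper name.

.. code-block:: python

   from bisect import bisect_right
   from datetime import datetime, timedelta


   def align_to_reference(reference_times, other_times):
       """For each time in other_times, return the nearest preceding (or equal)
       time in reference_times, or None if no such time exists.

       reference_times must be sorted in ascending order.
       """
       aligned = []
       for t in other_times:
           # Index of the first reference time strictly later than t.
           i = bisect_right(reference_times, t)
           aligned.append(reference_times[i - 1] if i > 0 else None)
       return aligned


   # Made-up example: 94A images every 12 s, another filter offset by a few seconds.
   start = datetime(2020, 1, 1)
   times_94 = [start + timedelta(seconds=12 * k) for k in range(3)]
   times_other = [start + timedelta(seconds=s) for s in (5, 17, 30)]
   print(align_to_reference(times_94, times_other))

Using ``bisect_right`` means an image taken at exactly the same instant as a 94A image is mapped to that image
rather than to the one before it.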

Notebook: https://nbviewer.org/github/fsspec/kerchunk/blob/main/examples/SDO.ipynb

National Water Model
--------------------

Native data format: NetCDF4/HDF5.

Effective in-memory size: 80TB.

Notes: there are so many files that dask and a tree reduction were required to aggregate the
metadata.
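
A tree reduction of this kind can be sketched in plain Python: combine the per-file references in small
batches, then combine the batch results, and repeat until a single reference set remains. This is an
illustration under stated assumptions, not the generator notebook's code. In the real workflow each combine
step would be a kerchunk multi-file combine (for example ``kerchunk.combine.MultiZarrToZarr``) run as a dask
task; here ``merge_refs`` is a hypothetical stand-in that merely merges dictionaries, and the file names are
made up.

.. code-block:: python

   from typing import Callable, Dict, List, TypeVar

   T = TypeVar("T")


   def tree_reduce(items: List[T], combine: Callable[[List[T]], T], batch_size: int = 2) -> T:
       """Reduce items to one value by combining fixed-size batches, level by level."""
       if not items:
           raise ValueError("need at least one item")
       if batch_size < 2:
           raise ValueError("batch_size must be at least 2")
       level = list(items)
       while len(level) > 1:
           # Batches within one level are independent, so in a real run each
           # combine() call could be submitted as a separate dask task.
           level = [combine(level[i:i + batch_size])
                    for i in range(0, len(level), batch_size)]
       return level[0]


   def merge_refs(batch: List[Dict[str, str]]) -> Dict[str, str]:
       """Hypothetical stand-in for a kerchunk combine step: merge reference dicts."""
       out: Dict[str, str] = {}
       for refs in batch:
           out.update(refs)
       return out


   # Made-up per-file references, one chunk key per file.
   per_file = [{f"streamflow/{i}.0": f"file{i}.nc"} for i in range(10)]
   combined = tree_reduce(per_file, merge_refs, batch_size=3)
   print(len(combined))

No single combine call ever sees more than ``batch_size`` inputs, which keeps the memory needed per task
bounded even when the number of files is very large.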

Generator notebook: https://nbviewer.org/gist/rsignell-usgs/ef435a53ac530a2843ce7e1d59f96e22

Notebook: https://nbviewer.org/gist/rsignell-usgs/02da7d9257b4b26d84d053be1af2ceeb

MUR SST
-------

Native data format: NetCDF4/HDF5.

Effective in-memory size: 66TB.

On-disk size: 16TB.

Documentation: https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1

Notebook: https://nbviewer.org/github/cgentemann/cloud_science/blob/master/zarr_meta/cloud_mur_v41_benchmark.ipynb

Notes: global sea surface temperature data. The notebook includes benchmarks.
See the notebook for how to set up the NASA Earthdata credentials needed to access the data.

HRRR
----

Native data format: GRIB2.

Effective in-memory size: 1.5GB (11-file subset).

Documentation: https://rapidrefresh.noaa.gov/hrrr/

Notebook (generation and use): https://nbviewer.org/gist/peterm790/92eb1df3d58ba41d3411f8a840be2452

Notes: High-Resolution Rapid Refresh (HRRR) is a real-time, 3-km resolution, hourly updated, cloud-resolving,
convection-allowing atmospheric model from NOAA. The notebook extracts only the GRIB messages matching the filter
``heightAboveGround=2``.
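
A filter like ``heightAboveGround=2`` selects GRIB messages by their metadata: the level type and the level
value. The sketch below only illustrates that matching logic on made-up message attribute dictionaries; it is
not kerchunk's code, and ``matches`` is a hypothetical helper. In kerchunk itself, ``kerchunk.grib2.scan_grib``
accepts a ``filter`` dictionary for this purpose; see the notebook and the kerchunk API documentation for the
exact call used.

.. code-block:: python

   from typing import Any, Dict, List


   def matches(attrs: Dict[str, Any], flt: Dict[str, Any]) -> bool:
       """Return True if attrs has every key in flt with an allowed value.

       A filter value may be a single value or a list/tuple/set of allowed values.
       """
       for key, wanted in flt.items():
           if key not in attrs:
               return False
           allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
           if attrs[key] not in allowed:
               return False
       return True


   # Made-up GRIB message attributes, for illustration only.
   messages: List[Dict[str, Any]] = [
       {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
       {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
       {"shortName": "sp", "typeOfLevel": "surface", "level": 0},
   ]
   flt = {"typeOfLevel": "heightAboveGround", "level": 2}
   selected = [m["shortName"] for m in messages if matches(m, flt)]
   print(selected)

Because every filter key must match, messages at other heights or on other level types are skipped, and their
byte ranges never end up in the generated references.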