File: resampling.rst

package info (click to toggle)
pyresample 1.35.0-1
links: PTS, VCS
area: main
in suites: forky, sid
size: 34,800 kB
sloc: python: 20,340; cpp: 463; makefile: 105
file content (188 lines) | stat: -rw-r--r-- 9,531 bytes
parent folder | download | duplicates (2)
Resampling
==========

In the general sense, resampling is the process of creating new data points
from data we already have. In Pyresample, we use the term "resample" when we
use the data from one :doc:`geometry <geometries>` to create data points in
another geometry. With
the available geometries in mind we typically use resampling in a few common
use cases:

* swath to area: Or in another way, resampling ungridded data to a grid.
  This is useful for many reasons. Most areas are easy to visualize and
  provide a view of the Earth that is somewhat recognizable. This isn't
  always true of swaths. This resampling has all the other benefits of
  areas too such as being easier to describe the geolocation and easier
  to compare with other data. This type of resampling also allows getting
  a subset of the swath or changing the spatial pixel resolution to something
  more suited for your use case.
* area/swath to swath: When combining or comparing data from multiple sources
  and both of them happen to be ungridded swaths, resampling can be used
  to bring them on to the same coordinate system.
* area to area: Similar to resampling a swath to an area, we may want to get
  a subset of the input data or change the resolution so that it suits
  our needs. There is also the opportunity to change projections or to match
  the area for another dataset for easier comparison.

In all of these cases we need some way to determine what value to set for
each output pixel given one or more input pixels. For this there are many
algorithms to choose from. The algorithms implemented in Pyresample are
described below.

For more information on how to do resampling with
Pyresample see some of the :doc:`/howtos/index` such as :doc:`/howtos/swath`
for swath data or :doc:`/howtos/grid` for areas.

Algorithms
----------

When resampling we have a lot of options for mapping input pixels to an output
pixels. Depending on the algorithm we choose we may get better looking results
than others, but at the cost of more processing time or larger memory
requirements. Below is a basic description of some of the algorithms
implemented in Pyresample, but for a more detailed description see the
:doc:`API documentation </api/pyresample>` where the low-level resampler classes
and functions are documented.

Note that unless otherwise stated, the algorithms implemented in Pyresample
consider all pixels as individual points located at their centers. This means
they may generally ignore the shape of the real world footprint and possible
overlap of these footprints.

.. warning::

   Pyresample's resampling interfaces are currently undergoing changes as part
   of the version 2.0 redesign. As such, the existing algorithms described
   below may only be available from specific interfaces and may or may not be
   consistent with the interfaces of other algorithms. Some algorithms, at the
   time of writing, may only support data types needed to work with
   Pyresample's sibling project "Satpy". For example, some may expect Xarray
   ``DataArray`` classes with dask arrays underneath and will fail in ugly ways
   if given regular numpy arrays.

Nearest Neighbor
^^^^^^^^^^^^^^^^

Nearest neighbor is one of the simplest ways of mapping input pixels to output
pixels. It
sets every output pixel to the value of the input pixel that is geographically
closest to the output pixel's location. In Pyresample this is implemented using
a `k-d tree <https://en.wikipedia.org/wiki/K-d_tree>`_ via the
`pykdtree package <https://github.com/storpipfugl/pykdtree>`_. This structure
is created using the input geolocation and allows for quick querying for the
index of the nearest input pixel. Using these indexes we can create an output
array the size of the target geometry using a basic numpy indexing operation.
This is normally a very fast operation.

Any pixel in the output array that doesn't have a neighboring input pixel will
be assigned some fill value, typically the special floating point NaN. The
algorithm decides how far to look by a "radius of influence". Regardless of
the radius of influence, the result will always be the nearest neighboring
pixel if there is one within the radius, otherwise the fill value will be
used.

One good part about this algorithm and its simplicity is that it works with
any pair of input and output geometries (swath, area, etc) whether the pixels
are topology preserving or not. This is not
necessarily true for all algorithms.
This is the oldest algorithm implemented in pyresample and has been adapted in
different interfaces to support numpy, dask, and xarray DataArrays.

Caching
*******

Due to the way this algorithm uses the k-d tree and the indexes generated by
querying it, the indexes can be easily cached. Since the kd-tree only uses
input geolocation for creation and output geolocation for querying, we can
reuse the generated indexes for any input data sharing the same geolocation.
This can increase performance greatly by not having to regenerate or requery
the k-d tree. If our swath or area-based input is the same between executions
we can also write these indexes to disk and use them in the next execution
without having to generate or query a k-d tree.

Gradient Search
^^^^^^^^^^^^^^^

The gradient search algorithm is based on work by Alexander Trishchenko
(see :doi:`10.1109/TGRS.2008.916633`). The algorithm benefits from its
assumption that an array representation preserves the topology of the observed
data. By only looking
at nearby pixels the algorithm can efficiently compute a nearest neighbor
or perform a bilinear interpolation.

For the interfaces that exist (see warning below) and with the data that meet
the limitations of this algorithm (topology preserving, etc), this algorithm
should be
faster and more memory efficient for nearest neighbor than the k-d tree
approach (see above) and the other bilinear algorithm implemented in
Pyresample (see below). The algorithm does not currently support caching, but
also doesn't need to due to its speed.

.. warning::

   This resampling algorithm is still considered experimental. At the time of
   writing it only supports area to area resampling and requires xarray
   DataArray objects backed by dask arrays.

Bilinear
^^^^^^^^

Pyresample also offers a standalone bilinear algorithm that existed before
gradient search. It is based on the same k-d tree as the nearest neighbor
algorithm described above. Due to its use of the k-d tree it is able to handle
arrays that do not preserve the geographic topology of the data. It is
currently limited to xarray DataArray with
dask arrays as inputs. The current implementation currently requires getting
multiple nearby neighbors for every output pixel and then doing a bilinear
interpolation between the four nearest surrounding pixels. This typically
uses a lot of CPU and memory.
For topology preserving data, it is recommended to use the gradient search
algorithm.

Bucket
^^^^^^

The bucket resampling algorithm is actually multiple algorithms following a
similar structure. Bucket resampling is used to compute various types of
statistics about the input data falling within an output pixel (the "bucket").
These statistics include sum, min, max, count, average, and fraction of each
category for integer category data. Due to the possible differences
between geometries (ex. projections, pixel resolution, etc), an output pixel
might overlap with zero or more input pixels. Instead of only getting the
nearest input pixel (nearest neighbor), bucket resampling allows us to get the
maximum input value, or the minimum, or the average, or any other implementated
calculation. This allows for more control over the final output that may be more
useful or accurate depending on the type of data being worked with.

The current implementation is limited to xarray DataArrays and dask arrays.

Elliptical Weighted Averaging
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pyresample makes it possible to resample swath data to a uniform grid
using an Elliptical Weighted Averaging algorithm or EWA for short.
This algorithm behaves differently than the KDTree based resampling
algorithms. The KDTree-based algorithms
process each output grid pixel by searching for all "nearby" input
pixels and applying a certain interpolation (nearest neighbor, gaussian, etc).
The EWA algorithm processes each input pixel mapping it to one or more output
pixels. Once each input pixel has been analyzed, the intermediate results are
averaged to produce the final gridded result.

The EWA algorithm also has limitations on how the input data are structured
compared to the generic KDTree algorithms. EWA assumes that data in the array
is organized geographically; adjacent data in the array is adjacent data
geographically. The algorithm uses this to configure parameters based on the
size and location of the swath pixels. It also assumes that data are
scan-based, recorded by a orbiting satellite scan by scan, and the user must
provide scan size with the ``rows_per_scan`` option.

The EWA algorithm consists of two
steps: ll2cr and fornav. The algorithm was originally part of the
MODIS Swath to Grid Toolbox (ms2gt) created by the
NASA National Snow & Ice Data Center (NSIDC). Its default parameters
work best with MODIS L1B data, but it has been proven to produce high
quality images from VIIRS and AVHRR data with the right parameters.

There are multiple high-level interfaces to this algorithm in order to support
for numpy arrays or xarray DataArrays backed by dask arrays.