.. _nexus-format:
NeXus data format
-----------------
.. note::
To read this format, the optional dependency ``h5py`` is required.
Background
^^^^^^^^^^
`NeXus <https://www.nexusformat.org>`_ is a common data format originally
developed by the neutron and x-ray science communities. It is still being
developed as an international standard by scientists and programmers
representing major scientific facilities in order to facilitate greater
cooperation in the analysis and visualization of data.
NeXus uses a variety of classes to record data, values,
units and other experimental metadata associated with an experiment.
For specific types of experiments an Application Definition may exist, which
defines an agreed common layout that facilities can adhere to.
NeXus metadata and data are stored in Hierarchical Data Format (HDF5) files with
a ``.nxs`` extension, although standard HDF5 extensions are sometimes used.
.. note::
In `HyperSpy <https://hyperspy.org>`_, files must use the ``.nxs`` file
extension in order to default to the NeXus loader. If your file has
an HDF5 extension, you can also explicitly set the NeXus file reader:
.. code-block:: python
# Load a NeXus file with a .h5 extension
>>> import hyperspy.api as hs
>>> s = hs.load("filename.h5", reader="nxs")
The loader will follow version 3 of the
`NeXus data rules <https://manual.nexusformat.org/datarules.html#version-3>`_.
The signal type, Signal1D or Signal2D, will be inferred from the ``interpretation``
attribute, if this is set to ``spectrum`` or ``image``, in the ``NXdata``
description. If the `interpretation
<https://manual.nexusformat.org/design.html#design-attributes>`_ attribute is
not set, the loader will return a ``BaseSignal``, which must then be converted
to the appropriate signal type. Following the NeXus data rules, if a ``default``
dataset is not defined, the loader will load NXdata
and HDF datasets according to the keyword options in the reader.
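The mapping from the ``interpretation`` attribute to a signal type can be summarised in a few lines. This is an illustrative sketch only, not the loader's code: the function name ``signal_type_from_interpretation`` is hypothetical, and the returned strings simply name the signal classes mentioned above.

```python
def signal_type_from_interpretation(interpretation=None):
    """Name the signal class implied by an NXdata ``interpretation`` attribute.

    Hypothetical helper for illustration; it is not part of RosettaSciIO.
    """
    mapping = {"spectrum": "Signal1D", "image": "Signal2D"}
    # Any other value, including a missing attribute, falls back to BaseSignal
    return mapping.get(interpretation, "BaseSignal")


for value in ("spectrum", "image", None):
    print(value, "->", signal_type_from_interpretation(value))
```

A ``BaseSignal`` result is the case in which the signal must be converted to the appropriate type after loading.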
A number of the `NeXus examples <https://github.com/nexusformat/exampledata>`_
from large facilities do not use NXdata or use older versions of the NeXus
implementation. Data can still be loaded from these files but information or
associations may be missing. However, this missing information can be recovered
from within the ``original_metadata`` which contains the overall structure of
the entry.
As the NeXus format uses the HDF5 format and needs to read both data and
metadata structured in different ways, the loader is written to be quite
flexible and can also be used to inspect any HDF5-based file.
Differences with respect to HSpy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :external+hyperspy:ref:`HyperSpy metadata structure <metadata_structure>`
stores arrays as HDF5 datasets without attributes
and stores floats, ints and strings as attributes.
The NeXus format uses HDF5 dataset attributes to store additional
information such as an indication of the units for an axis or the ``NX_class`` which
the dataset structure follows. The metadata, HyperSpy or ``original_metadata``,
therefore needs to be able to indicate the values and attributes of a dataset.
To implement this structure the ``value`` and ``attrs`` of a dataset can also be
defined. The value of a dataset is set using a ``value`` key.
The attributes of a dataset are defined by an ``attrs`` key.
For example, to store an array called ``axis_x``, with a units attribute within
``original_metadata``, the following structure would be used:
::
├──original_metadata
│ ├── axis_x
│ │ ├── value : array([1.0,2.0,3.0,4.0,5.0])
│ │ ├── attrs
│ │ │ ├── units : mm
.. code-block:: python
>>> # Item paths are passed as dot-separated strings
>>> original_metadata.set_item("axis_x.value", [1.0, 2.0, 3.0, 4.0, 5.0])
>>> original_metadata.set_item("axis_x.attrs.units", "mm")
To access the axis information:
.. code-block:: python
>>> original_metadata.axis_x.value
>>> original_metadata.axis_x.attrs.units
To modify the axis information:
.. code-block:: python
>>> original_metadata.axis_x.value = [2.0,3.0,4.0,5.0,6.0]
>>> original_metadata.axis_x.attrs.units = "um"
To store data in a NeXus monochromator format, ``value``
and ``attrs`` keys can define additional attributes:
::
├── monochromator
│ ├── energy
│ │ ├── value : 12.0
│ │ ├── attrs
│ │ │ ├── units : keV
│ │ │ ├── NX_class : NXmonochromator
The ``attrs`` key can also be used to define NeXus structures that describe
the relationships between data:
::
├── mydata
│ ├── attrs
│ │ ├── NX_class : "NXdata"
│ │ ├── axes : ["x","."]
│ ├── data
│ │ ├──value : [[30,23...110]
│ ├── x
│ │ ├──value : [1,2.....100]
│ │ ├── attrs
│ │ │ ├── units : "mm"
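To make the role of the ``axes`` attribute concrete, the sketch below rebuilds a shortened version of this tree as plain nested dictionaries and resolves which dataset describes each dimension. The dictionaries are stand-ins for illustration and do not use the HyperSpy metadata classes; the helper ``describe_axes`` is hypothetical, and the NeXus ``units`` attribute name is assumed for the axis. In NeXus, a ``"."`` entry in ``axes`` marks a dimension with no associated axis dataset.

```python
# Plain-dict stand-in for the tree above (values shortened for illustration)
mydata = {
    "attrs": {"NX_class": "NXdata", "axes": ["x", "."]},
    "data": {"value": [[30, 23], [45, 110]]},
    "x": {"value": [1, 2], "attrs": {"units": "mm"}},
}


def describe_axes(group):
    """Return (dimension, axis name, units) for each entry of ``axes``.

    Hypothetical helper: "." means the dimension has no axis dataset.
    """
    described = []
    for dim, name in enumerate(group["attrs"]["axes"]):
        if name == ".":
            described.append((dim, None, None))
        else:
            units = group[name].get("attrs", {}).get("units")
            described.append((dim, name, units))
    return described


axes = describe_axes(mydata)
print(axes)
```

Here the first dimension of ``data`` is described by ``x`` in millimetres, while the second dimension has no axis dataset attached.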
The use of ``attrs`` or ``value`` to set values within the metadata is optional
and metadata values can also be set, read or modified in the normal way.
.. code-block:: python
>>> original_metadata.monochromator.energy = 12.5
HyperSpy metadata is stored within the NeXus file and should be automatically
restored when a signal is loaded from a previously saved NeXus file.
.. note::
Altering the standard metadata structure of a signal
using ``attrs`` or ``value`` keywords is not recommended.
Also see the :ref:`hdf5-utils` for inspecting HDF5 files.
API functions
^^^^^^^^^^^^^
.. automodule:: rsciio.nexus
:members:
Reading examples
^^^^^^^^^^^^^^^^
NeXus files can contain multiple datasets within the same file, but the
ordering of datasets can vary depending on the setup of an experiment or
processing step when the data was collected.
For example, in one experiment Fe, Ca, P and Pb were collected, but in the next experiment
Ca, P, K, Fe and Pb were collected. RosettaSciIO supports reading in one or more datasets
and returns a list of signals, but in this example the same list index refers to a
different element in each file.
To control which data or metadata is loaded and in what order
some additional loading arguments are provided.
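Independently of the loading arguments, downstream code can be made robust to ordering by looking signals up by name rather than by position. The sketch below uses hand-written dictionaries as stand-ins for loaded signals and assumes each one carries a title under ``metadata`` → ``General`` → ``title``; inspect the structure returned for your own files before relying on this. The helper ``by_title`` is hypothetical.

```python
# Stand-ins for two loads whose datasets come back in a different order
first_run = [
    {"metadata": {"General": {"title": t}}} for t in ("Fe", "Ca", "P", "Pb")
]
second_run = [
    {"metadata": {"General": {"title": t}}} for t in ("Ca", "P", "K", "Fe", "Pb")
]


def by_title(signals):
    """Index a list of signal dictionaries by title (hypothetical helper)."""
    return {s["metadata"]["General"]["title"]: s for s in signals}


# Position 0 holds a different element in each run ...
print(first_run[0]["metadata"]["General"]["title"])
print(second_run[0]["metadata"]["General"]["title"])

# ... whereas a lookup by title finds the same element in both
fe_first = by_title(first_run)["Fe"]
fe_second = by_title(second_run)["Fe"]
```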
.. note::
Given that HDF5 files can accommodate very large datasets, setting ``lazy=True``
is strongly recommended if the content of the HDF5 file is not known a priori.
This prevents issues with regard to loading datasets far larger than memory.
Also note that setting ``lazy=True`` leaves the file handle to the HDF5 file
open. In HyperSpy, it can be closed with ``_signals.lazy.LazySignal.close_file``
or by calling ``_signals.lazy.LazySignal.compute`` with ``close_file=True``.
We can load a specific dataset using the ``dataset_path`` keyword argument.
Setting it to the absolute path of the desired dataset will cause
the single dataset to be loaded:
.. code-block:: python
>>> from rsciio.nexus import file_reader
>>> # Loading a specific dataset
>>> file_reader("sample.nxs", dataset_path="/entry/experiment/EDS/data")
We can also choose to load datasets based on a search key using the
``dataset_key`` keyword argument. This can also be used to load NXdata that
falls outside of the ``default`` version 3 rules. Instead of an absolute
path, a search string is provided, and every dataset whose path contains this
string will be returned. The previous example could also be written as:
.. code-block:: python
>>> # Loading datasets containing the string "EDS"
>>> file_reader("sample.nxs", dataset_key="EDS")
The difference between ``dataset_path`` and ``dataset_key`` is illustrated
here:
.. code-block:: python
>>> # Only the dataset /entry/experiment/EDS/data will be loaded
>>> file_reader("sample.nxs", dataset_path="/entry/experiment/EDS/data")
>>> # All datasets whose path contains the entire string "/entry/experiment/EDS/data" will be loaded
>>> file_reader("sample.nxs", dataset_key="/entry/experiment/EDS/data")
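The distinction can be sketched without opening a file. The snippet below is not the reader's implementation; it only mimics the matching behaviour described here on a hypothetical list of dataset paths, and it assumes that a list of keys matches when any one of them is found in the path.

```python
# Hypothetical dataset paths, for illustration only
paths = [
    "/entry/experiment/EDS/data",
    "/entry/experiment/EDS/data_normalised",
    "/entry/experiment/XRD/data",
]


def match_path(paths, dataset_path):
    """Exact match, as described for ``dataset_path``."""
    return [p for p in paths if p == dataset_path]


def match_key(paths, dataset_key):
    """Substring match, as described for ``dataset_key``."""
    keys = [dataset_key] if isinstance(dataset_key, str) else list(dataset_key)
    # Assumption: a path is selected if it contains any of the keys
    return [p for p in paths if any(k in p for k in keys)]


exact = match_path(paths, "/entry/experiment/EDS/data")
by_key = match_key(paths, "/entry/experiment/EDS/data")
print(len(exact), len(by_key))
```

With these paths the exact match selects one dataset, while the same string used as a key also selects ``data_normalised`` because its path contains the key.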
Multiple datasets can be loaded by providing a number of keys:
.. code-block:: python
>>> # Loading datasets that match the given keys
>>> file_reader("sample.nxs", dataset_key=["EDS", "Fe", "Ca"])
Metadata can also be filtered in the same way using ``metadata_key``:
.. code-block:: python
>>> # Load data with metadata matching metadata_key
>>> file_reader("sample.nxs", metadata_key="entry/instrument")
.. note::
The NeXus loader removes any NXdata blocks from the metadata.
Metadata that are arrays can be skipped by using ``skip_array_metadata``:
.. code-block:: python
>>> # Load data while skipping metadata that are arrays
>>> file_reader("sample.nxs", skip_array_metadata=True)
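The effect on the metadata tree can be pictured with a small sketch. This is not how the reader is implemented; it is a hypothetical helper, ``drop_array_values``, that prunes list-like leaves from a nested dictionary standing in for ``original_metadata``, with plain lists standing in for arrays. Whether the reader keeps groups that end up empty is not covered here.

```python
def drop_array_values(tree):
    """Return a copy of a nested dict without list or tuple leaves.

    Hypothetical helper for illustration; scalars and strings are kept.
    """
    pruned = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            pruned[key] = drop_array_values(value)
        elif not isinstance(value, (list, tuple)):
            pruned[key] = value
    return pruned


# Stand-in metadata: one scalar entry and one array-like entry
metadata = {
    "instrument": {
        "energy": 12.0,
        "detector_map": [0.1, 0.2, 0.3],
    }
}
print(drop_array_values(metadata))
```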
.. note::
NeXus files also support parameters or dimensions that have been varied
non-linearly. Since the reading of non-uniform axes is not yet implemented for the
NeXus plugin, such non-linear information would be lost in the axes manager and
replaced with indices.
.. note::
NeXus and HDF5 files can contain large metadata structures, with large datasets held within the
loaded ``original_metadata``. If lazy loading is used this may not be a concern, but care must be
taken when saving the data. To control whether large datasets are loaded or saved,
use the ``metadata_key`` to load only the most relevant information. Alternatively,
set ``skip_array_metadata`` to ``True`` to avoid loading those large datasets in ``original_metadata``.
Writing examples
^^^^^^^^^^^^^^^^
Using the ``file_writer`` function will store the NeXus file with the following structure:
::
├── entry1
│ ├── signal_name
│ │ ├── auxiliary
│ │ │ ├── original_metadata
│ │ │ ├── hyperspy_metadata
│ │ │ ├── learning_results
│ │ ├── signal_data
│ │ │ ├── data and axes (NXdata format)
To save multiple signals, the ``file_writer`` can be called directly passing a
list of signals.
.. code-block:: python
>>> from rsciio.nexus import file_writer
>>> file_writer("test.nxs", [signal1, signal2])
When saving multiple signals, a default signal can be defined. This can be used when storing
associated data or processing steps along with a final result. All signals can be saved but
a single signal can be marked as the default for easier loading using RosettaSciIO
or plotting with NeXus tools.
The default signal is selected as the first signal in the list:
.. code-block:: python
>>> from rsciio.nexus import file_writer
>>> file_writer("test.nxs", [signal1, signal2], use_default=True)
The output will be arranged by signal name:
::
├── entry1 (NXentry)
│ ├── signal_name (NXentry)
│ │ ├── auxiliary (NXentry)
│ │ │ ├── original_metadata (NXcollection)
│ │ │ ├── hyperspy_metadata (NXcollection)
│ │ │ ├── learning_results (NXcollection)
│ │ ├── signal_data (NXdata format)
│ │ │ ├── data and axes
├── entry2 (NXentry)
│ ├── signal_name (NXentry)
│ │ ├── auxiliary (NXentry)
│ │ │ ├── original_metadata (NXcollection)
│ │ │ ├── hyperspy_metadata (NXcollection)
│ │ │ ├── learning_results (NXcollection)
│ │ ├── signal_data (NXdata)
│ │ │ ├── data and axes
.. note::
Signals saved as ``.nxs`` by this plugin can be loaded normally in HyperSpy
and the original_metadata, signal data, axes, metadata and learning_results
will be restored. Model information is not currently stored.
NeXus does not store how the data should be displayed.
To preserve the signal details an additional navigation attribute
is added to each axis to indicate if it is a navigation axis.