\documentclass{scrartcl}
\usepackage{a4wide}
\usepackage{amsmath}
\usepackage{graphics}
\usepackage[disable]{todonotes}


\title{RDA Working Group Proposal:\\ HDF5 External Filter Plugin Working Group}

\begin{document}
\maketitle
\listoftodos

\section{Charter}

  One of the key features of HDF5 is the ability to apply compression (a
``compression filter'' in HDF5 terminology) to individual data objects such as
datasets stored in an HDF5 file. Four general compression algorithms are
included in the HDF5 core library.
With the increasing amounts of data recorded during experiments or generated by
computer simulations, compression algorithms become increasingly important,
both to reduce the required storage space and to reduce the bandwidth needed
when transferring files.
As there is no universal compression algorithm that satisfies the demands of
every application, HDF5 provides the capability to add custom compression
filters that address the needs of a specific application.
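
For illustration, a minimal sketch of how an application applies one of the
built-in filters (here the deflate/gzip filter) to a chunked dataset; error
handling is omitted for brevity:

\begin{verbatim}
#include <hdf5.h>

int main(void)
{
    /* Filters in HDF5 operate on chunks, so the dataset must
       use a chunked layout. */
    hsize_t dims[1]  = {1024 * 1024};
    hsize_t chunk[1] = {4096};

    hid_t file  = H5Fcreate("data.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);  /* filters require chunking */
    H5Pset_deflate(dcpl, 6);       /* built-in deflate, level 6 */

    hid_t dset = H5Dcreate2(file, "signal", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
\end{verbatim}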
  
  However, custom filters come with a serious drawback. For an application to
read a file containing data compressed with a custom filter, the source code of
the application must be changed to register the custom filter with the HDF5
library, and the application has to be recompiled. This becomes unacceptable
for applications that are distributed in binary form only, especially
commercial applications such as MATLAB and IDL.
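
A sketch of this statically registered approach, assuming a hypothetical
filter identifier 32000 and a pass-through callback {\tt my\_filter}; a real
filter would put its (de)compression code in the callback:

\begin{verbatim}
#include <hdf5.h>

#define MY_FILTER_ID 32000  /* hypothetical identifier */

/* Filter callback: compresses on write, decompresses on read.
   This sketch just passes the buffer through unchanged. */
static size_t my_filter(unsigned flags, size_t cd_nelmts,
                        const unsigned cd_values[], size_t nbytes,
                        size_t *buf_size, void **buf)
{
    /* ... actual (de)compression code would go here ... */
    return nbytes;
}

static const H5Z_class2_t MY_FILTER_CLASS = {
    H5Z_CLASS_T_VERS,  /* struct version */
    MY_FILTER_ID,      /* filter identifier */
    1, 1,              /* encoder and decoder present */
    "my_filter",       /* name shown in error messages */
    NULL, NULL,        /* can_apply and set_local callbacks */
    my_filter          /* the filter function */
};

int main(void)
{
    /* Every application reading such files must make this call
       before any I/O, and therefore has to be rebuilt against
       the filter source. */
    H5Zregister(&MY_FILTER_CLASS);
    /* ... normal HDF5 I/O follows ... */
    return 0;
}
\end{verbatim}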

As a solution to this problem, The HDF Group (THG) implemented a new external
filter interface for the HDF5 library that allows filters to be added and
removed at runtime without recompiling the application. The filters are
installed as shared modules which the library loads on demand. This new
approach not only allows commercial applications to access data compressed
with a custom algorithm, it also makes life easier for open source developers,
who no longer have to recompile their code whenever a new filter algorithm
becomes available.
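
The plugin side is a small amount of boilerplate around the same filter class:
the shared module exports two lookup functions that the library calls when it
encounters an unknown filter identifier, and the module is found via the
plugin search path (configurable through the {\tt HDF5\_PLUGIN\_PATH}
environment variable). A sketch, again with the hypothetical filter from
above:

\begin{verbatim}
#include <hdf5.h>
#include <H5PLextern.h>  /* dynamic plugin interface */

#define MY_FILTER_ID 32000  /* hypothetical identifier */

static size_t my_filter(unsigned flags, size_t cd_nelmts,
                        const unsigned cd_values[], size_t nbytes,
                        size_t *buf_size, void **buf)
{
    /* ... actual (de)compression code would go here ... */
    return nbytes;
}

static const H5Z_class2_t MY_FILTER_CLASS = {
    H5Z_CLASS_T_VERS, MY_FILTER_ID, 1, 1, "my_filter",
    NULL, NULL, my_filter
};

/* The two symbols the HDF5 library looks up in each shared
   module on its plugin search path. */
H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }

const void *H5PLget_plugin_info(void) { return &MY_FILTER_CLASS; }
\end{verbatim}

Compiled into a shared module (for example with {\tt cc -shared -fPIC}) and
placed in a directory on the plugin search path, the filter becomes available
to every HDF5 application on the system without recompiling any of them.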

One example of where the new external filter interface comes in handy is the
Eiger X-ray detector produced by DECTRIS, which writes LZ4-compressed data.
Another is the two HDF5 Python bindings {\tt h5py} and {\tt PyTables}, which
provide custom compression algorithms.
With the old approach, data written by the Eiger detector or by one of the
two Python bindings using their own compression algorithms would never be
accessible to applications like MATLAB or IDL.
With the new external filter interface, once the appropriate filter modules
are installed, commercial applications can access data compressed with a
custom filter just as if it had been written with one of the built-in filters.
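
From the reading application's point of view nothing filter-specific is
required. A sketch, assuming a file {\tt eiger.h5} with a dataset
{\tt /entry/data} written with a dynamically loadable filter (both names are
hypothetical):

\begin{verbatim}
#include <hdf5.h>

int main(void)
{
    /* No H5Zregister call and no filter-specific code: as long
       as the filter module is found on the plugin search path,
       the library loads it on demand during H5Dread. */
    hid_t file = H5Fopen("eiger.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/entry/data", H5P_DEFAULT);

    static int buffer[4096];  /* dataset size assumed known */
    H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
            H5P_DEFAULT, buffer);

    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
\end{verbatim}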

The proposed working group will 
\begin{itemize}
    \item Establish standards for how filter code should be organized, tested,
        documented and distributed. 
    \item Provide the infrastructure for developing and distributing filter
        code, modules, and documentation.
    \item Provide a set of standard files to be used as benchmarks for every
        filter so that users can verify their own filter module installations
        (a minimal verification check is sketched below).
    \item Be an entry point for users who encounter custom filters in files and
        need the appropriate filter module to access them.
\end{itemize}
The working group will also raise awareness among developers of the requirement
for long-term accessibility of data compressed with custom filters in HDF5.
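
The verification mentioned above could be as simple as checking that the
library can locate the filter before attempting to read the corresponding
benchmark file; a sketch, again with a hypothetical filter identifier:

\begin{verbatim}
#include <hdf5.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical identifier of the filter under test. */
    const H5Z_filter_t filter_id = 32000;

    /* H5Zfilter_avail returns a positive value if the filter is
       available to the library, 0 if it is not, and a negative
       value on error. */
    if (H5Zfilter_avail(filter_id) > 0)
        printf("filter %d available, reading benchmark file...\n",
               (int)filter_id);
    else
        printf("filter %d not found, check HDF5_PLUGIN_PATH\n",
               (int)filter_id);
    return 0;
}
\end{verbatim}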

\section{Value proposition}
The external filter interface implemented in recent releases of HDF5 has greatly
increased the capability to efficiently store data streams with extreme data
rates or data volumes in HDF5 files. However, further development of custom
filters by the open source community, and assurance of long-term accessibility
of data compressed with custom filters, require a technical and organizational
framework. The framework should provide a standard for source code development
and distribution by establishing design and development guidelines and
recommending best software development practices. This working group will
establish the technological basis and guidelines to turn the external filter
interface into a sustainable, well-documented facility. We expect that this
will greatly facilitate the use and deployment of new compression filters for
high-throughput storage of all kinds of binary data in HDF5, to the benefit of
both academic and industrial research and development.

\section{Working plan}

It is currently suggested that the source code for the external filter modules will go into a GitHub repository. The working group will
\begin{enumerate}

    \item Define the standards for the external filter module source code, in particular for
        \begin{itemize}
            \item getting source code into the repository
            \item the source code structure a module has to follow
            \item testing and documentation standards
            \item test files
        \end{itemize}
     \item Determine the target platforms and architectures for which external
        filters should be available.  
    \item Develop a process for registering external filters with
        the repository, including the prerequisites.
    \item Establish mailing lists for users and developers.
    \item Develop a formal process for filter maintenance (for instance, whether
        or not all filters shall be tested with every new release of the HDF5 library).
    \item Set up a process for dealing with abandoned modules in the
        repository. 
\end{enumerate}
The working group will identify other tasks as work progresses.

\section{Adoption plan}

Currently THG runs a website which lists all registered external
filters along with basic information about them. THG will stay in
charge of external filter registration. The working group will collaborate
with THG on making the registration process and the publication of information
``developer friendly''.

A possible process for developing and publishing an external filter could look like this:
\begin{enumerate}
    \item Propose a new filter and obtain a filter identifier from THG to
        start the development.
    \item The developer is granted access to the repository and deposits
        the code there.
    \item Once the filter passes all required tests, the code must be reviewed.
    \item Once the code passes the review process, or after the changes
        requested by the reviewers have been made, the filter can be
        published in the repository.
\end{enumerate}
This is only a draft of a possible publishing process. The details, and the
subtle problems within it, will be resolved as the working group proceeds.

\section{Initial Membership}
It is expected that developers (or organizations) who have registered an external filter
with THG will become members of this working group. THG will also be a member of the group.



\end{document}