1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Revising Numeric Overflows in HDF5</TITLE>
<META http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<META name="author" content="Quincey Koziol">
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
<STYLE TYPE="text/css">
OL.loweralpha { list-style-type: lower-alpha }
OL.upperroman { list-style-type: upper-roman }
</STYLE>
<CENTER><H1>Revising Numeric Overflows in HDF5</H1></CENTER>
<CENTER><H3>Quincey Koziol<br>
koziol@ncsa.uiuc.edu<BR>
April 23, 2004
</H3></CENTER>
<OL class="upperroman">
<LI><H3><U>Document's Audience:</U></H3>
<UL>
<LI>Current H5 library designers and knowledgable external developers.</li>
</UL>
<LI><H3><U>Background Reading:</U></H3>
<DL>
<dt>The HDF5 reference manual sections for
<code>H5Pset_type_conv_cb</code></a> and
<code>H5Pget_type_conv_cb</code></a>:
<dd><a href="../../RM/RM_H5P.html#Property-SetTypeConvCb"><code>H5Pset_type_conv_cb</code></a>
<dd><a href="../../RM/RM_H5P.html#Property-GetTypeConvCb"><code>H5Pget_type_conv_cb</code></a>
<dd><font size="-1">
(<code>H5Tset_overflow</code> and <code>H5Tget_overflow</code>
have been removed from the HDF5 Library.
This note inserted: January 2010.)
</font>
<DT>This section of the HDF5 User's Guide briefly mentions conversion overflows:
<dd><a href="../../UG/11_Datatypes.html#Dtransfer">Datatype Conversion
and Selection</A>
<DT>The netCDF user guide section on type conversion:
<dd><a href="http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html#Type-Conversion">Type Conversion</a>
<DT>The netCDF user guide section on fill values:
<dd><a href="http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c.html#nc_005fset_005ffill">Set Fill Mode for Writes: nc_set_fill</a>
</DL>
<LI><H3><U>Introduction:</U></H3>
<DL>
<DT><STRONG>What is this document about?</STRONG></DT>
<DD>This document describes some limitations of the current HDF5 datatype
conversion overflow behavior, new requirements for the overflow
behavior and describes a solution for covering those requirements.
</DD> <BR>
<DT><STRONG>How does the overflow callback currently operate?</STRONG></DT>
<DD>
<P>The HDF5 library's current behavior when converting a value from one
datatype to another is to issue a call to the "overflow" callback whenever
a source value is outside of the range of values representable by the
destination datatype. If the overflow callback is not set or is set,
but indicates that it hasn't handled the overflow (by returning a
negative value when called), the destination
value is set to either the maximum or minimum destination value (depending if
the source value was beyond the maximum or minimum destination value
respectively). The development branch of the HDF5 library (v1.7.x) extends
this behavior to it's support for int <-> float type conversions.
</P>
<P>
For example, when converting from a signed 16-bit
integer to an unsigned 16-bit integer, negative source values would trigger
a call to the overflow callback, and if it didn't exist or handle the overflow,
the minimum unsigned 16-bit integer (0) would be used as the destination value.
Likewise, when converting from a signed 32-bit integer to a signed 16-bit
integer, source values greater than 32767 or less than -32768 would trigger
a call to the overflow callback and would be set to 32767 or -32768
(respectively) if the overflow callback didn't exist or didn't handle the
exception.
</P>
</DD> <BR>
<DT><STRONG>What is the prototype for the current conversion callback?</STRONG></DT>
<DD>
<P>Here's the prototype for the overflow callback routine:
<pre>herr_t (*H5T_overflow_t)(hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf);</pre>
</P>
</DD> <BR>
</DL>
<LI><H3><U>New Requirements:</U></H3>
<P>There are several new requirements listed here, in order of decreasing
importance:
<UL>
<LI>Support netCDF-3 type conversion behavior. It's not obvious from
the background information about the netCDF-3 above, but netCDF-3
replaces "out of bound" values with the fill value for the dataset,
in addition to issuing an error. We will need to be able to pass
the appropriate fill-value for a datatype to the overflow callback
in order to support this use case.
<LI>Indicate type of "overflow". Now that we are performing
int <-> float type conversions, there are different kinds
of conversion exceptions that are possible. There are "range"
errors (as we've previously had) and now there can be "loss of
precision" errors for int -> float conversions and "truncation"
errors for float -> int conversions. The type of the conversion
exception should be passed to the overflow routine. This is
particularly important for netCDF-3 emulation, which only replaces
the destination value with the fill value for "range" errors, not
for the other types of errors.
<LI>Allow for "user data" to reach the callback. The general rule in
HDF5 callbacks is to allow a "void *" which the application sets
to be passed in to a callback routine. This would allow
applications to pass along any custom information that the callback
would need. The fill value mentioned above can be passed into the
callback through this "void *".
<LI>Make the callback a non-global setting. Currently there is one
global "overflow" callback for the HDF5 library. It should be
possible to set a callback on a per-data-transfer basis.
<LI>Allow for the callback to indicate the conversion should stop.
Currently the conversion of data values continues through the
entire buffer of values to convert. It would be nice if there were
a way for the callback routine to indicate that further values
should not be converted.
</UL>
</P>
<LI><H3><U>Suggested Changes:</U></H3>
<P>To accomodate the requirements above, changing the prototype of the
"overflow" callback is necessary, as well as attaching it to a data transfer
property list and changing the library's behavior for the return values of
the callback. I would also suggest changing the name from an "overflow"
callback to the more generic "type conversion exception callback".
</P>
<P>After accomodating the changes above, the prototype would look like this:
<pre>H5T_conv_ret_t (*H5T_conv_except_func_t)(H5T_conv_except_t except_type,
hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf, void *user_data);
</pre>
</P>
<P>The parameters are as follows:
<UL>
<LI><CODE>exception_type</CODE> - The type of exception (enum described
below).
<LI><CODE>src_id</CODE> - A datatype ID describing the source datatype.
<LI><CODE>dst_id</CODE> - A datatype ID describing the destination datatype.
<LI><CODE>src_buf</CODE> - A pointer to the buffer containing the source value.
<LI><CODE>dst_buf</CODE> - A pointer to the buffer where the destination value should
be placed.
<LI><CODE>user_data</CODE> - A pointer to the "user data" set by the application that
registered the conversion callback.
</UL>
</P>
<P>The <CODE>H5T_conv_except_t</CODE> enumerated type has the following values:
<UL>
<LI><CODE>H5T_CONV_EXCEPT_RANGE_HI</CODE> - The source value is greater
than the range of values that can be encoded in the destination value.
<LI><CODE>H5T_CONV_EXCEPT_RANGE_LOW</CODE> - The source value is less than
the range of values that can be encoded in the destination value.
<LI><CODE>H5T_CONV_EXCEPT_PRECISION</CODE> - The source value will suffer a
loss of precision when it is represented in the destination type.
(This generally occurs when very large integers are converted to
floating-point types)
<LI><CODE>H5T_CONV_EXCEPT_TRUNCATE</CODE> - The source value will suffer a
truncation in value when it is represented in the destination type.
(This generally occurs when floating-point values with fractional
values are converted to integer datatypes)
</UL>
</P>
<P>The return value from conversion callbacks is <CODE>H5T_conv_ret_t</CODE>,
a new enumerated type with the following values:
<UL>
<LI><CODE>H5T_CONV_ABORT</CODE> - Abort type conversion immediately (-1).
<LI><CODE>H5T_CONV_UNHANDLED</CODE> - Conversion callback did not
handled the conversion exception and the HDF5 library should perform
whatever default conversion it normally applies (0).
<LI><CODE>H5T_CONV_HANDLED</CODE> - Conversion callback
handled the conversion exception and the HDF5 library should
<EM>not</EM> perform whatever default conversion it normally applies (1).
</UL>
</P>
<P>Additionally, the <code>H5Tset_overflow</code> and
<code>H5Tget_overflow</code> routines would be retired in favor of
the following routines:
<HR>
<UL> <DL>
<DT><STRONG>Name:</STRONG></DT>
<DD>H5Pset_type_conv_cb
</DD>
<DT><STRONG>Purpose:</STRONG></DT>
<DD>Register a type conversion callback.
</DD>
<DT><STRONG>Signature:</STRONG></DT>
<DD>herr_t H5Pset_type_conv_cb(hid_t <EM>dxpl_id</EM>,
H5T_conv_except_func_t *<EM>op</EM>,
void *<EM>operator_data</EM>)
</DD>
<DT><STRONG>Parameters:</STRONG></DT>
<DD>
<DL>
<DT>hid_t <EM>dxpl_id</EM></DT>
<DD>IN: ID of dataset transfer property list.
</DD>
<DT>H5T_conv_except_func_t *<EM>op</EM></DT>
<DD>IN: Pointer to callback function to call for when a
type conversion exception occurs.
</DD>
<DT>void *<EM>operator_data</EM></DT>
<DD>IN: Pointer that will be passed to callback function when a
type conversion exception occurs.
</DD>
</DL>
</DD>
<DT><STRONG>Return Value:</STRONG></DT>
<DD>Returns non-negative on success, negative on failure.
</DD>
<DT><STRONG>Description:</STRONG></DT>
<DD><P>This function sets the callback function that the HDF5 library
will call when a type conversion exception occurs.
Type conversion exceptions occur under the following
circumstances:
<UL>
<LI>A source value is outside the range of values able to
be represented with the destination value (a range error).
<LI>A source value would lose precision when converted to a
value in the destination type (a loss of precision error).
<LI>A source value would have it's value truncated when
converted to a value in the destination type (a truncation
error).
</UL>
</P>
<P>The conversion exception callback (<CODE>H5T_conv_except_func_t</CODE>)
has the following prototype: (QAK: Fill in description from above
when creating final reference manual page)
</P>
</DD>
</DL> </UL>
<HR>
<UL> <DL>
<DT><STRONG>Name:</STRONG></DT>
<DD>H5Pget_type_conv_cb
</DD>
<DT><STRONG>Purpose:</STRONG></DT>
<DD>Retrieve the type conversion callback.
</DD>
<DT><STRONG>Signature:</STRONG></DT>
<DD>herr_t H5Pget_type_conv_cb(hid_t <EM>dxpl_id</EM>,
H5T_conv_except_func_t **<EM>op</EM>,
void **<EM>operator_data</EM>)
</DD>
<DT><STRONG>Parameters:</STRONG></DT>
<DD>
<DL>
<DT>hid_t <EM>dxpl_id</EM></DT>
<DD>IN: ID of dataset transfer property list.
</DD>
<DT>H5T_conv_except_func_t **<EM>op</EM></DT>
<DD>OUT: Pointer to location to place pointer to type conversion
exception callback function.
</DD>
<DT>void **<EM>operator_data</EM></DT>
<DD>OUT: Pointer to location to place pointer that will be
passed to type conversion exception callback function when a
type conversion exception occurs.
</DD>
</DL>
</DD>
<DT><STRONG>Return Value:</STRONG></DT>
<DD>Returns non-negative on success, negative on failure.
</DD>
<DT><STRONG>Description:</STRONG></DT>
<DD><P>This function retrieves the callback function that the HDF5
library will call when a type conversion exception occurs.
</DD>
</DL> </UL>
<HR>
</P>
<LI><H3><U>Compatibility With Version 1.6:</U></H3>
<p>As we try to switch the API from H5Tset(get)_overflow to H5Pset(get)_type_conv_cb,
we may encounter backward compatibility issue. We have a few options here:
<DL>
<DT><STRONG>Option 1</STRONG></DT>
<DD>Simply discard the old API functions(H5Tset(get)_overflow). This is
an easy solution. Hopefully, there are not many major users having
these functions in their program. Those failing programs have to
migrate to the new design with functions H5Pset(get)_type_conv_cb.
</DD> <BR>
<DT><STRONG>Option 2</STRONG></DT>
<DD>Keep the old API functions for some time during the transition.
User programs do not have to adopt the new functions. In case someone
happens to set both of the old and new callback functions, we give the
new one green light to proceed.
</P>
</DD> <BR>
<DT><STRONG>Option 3</STRONG></DT>
<DD>Keep the names of the old API functions but return error messages
when they are executed. Just remind users there are new API functions
available.
</DD> <BR>
</DL>
This issue is like what we face whenever we try to change API functions. We
should do it according our API change policy if we have one in hand.
<LI><H3><U>Discussion:</U></H3>
<P>With the above changes to the library, the datatype conversion code will be
able to fulfill all the new requirements listed. Supporting the new type
conversion exception callback and detecting the new exception types (loss of
precision and truncation) will require some additional coding, but not a huge
amount. Adding support for detecting the new exception types will make the
int <-> float conversions slightly slower and more complex though.
</P>
<P>If fill value is needed when conversion exception happens, it can be passed
in through the user data(a void pointer) in the callback or can be made a global
variable. To find the fill value is user's responsibility. They can retrieve it
from the dataset transfer property list easily.
</P>
</li>
</ol>
<hr>
Last modified: 20 January 2012.
Links updated and
note added that (<code>H5Tset_overflow</code> and
<code>H5Tget_overflow</code> have been removed from HDF5.
</BODY>
</HTML>
|