Bug Reports & Contributions
===========================
Contributions and bug reports are welcome from anyone! Some of the best
features in h5py, including thread support, dimension scales, and the
scale-offset filter, came from user code contributions.
Since we use GitHub, the workflow will be familiar to many people.
If you have questions about the process or about the details of implementing
your feature, feel free to ask on GitHub itself, or on the h5py section of the
HDF5 forum:
https://forum.hdfgroup.org/c/hdf5/h5py
Posting on this forum requires registering for a free account with the HDF Group.
Anyone can post there. Your first message will be approved by a
moderator, so don't worry if there's a brief delay.
This guide is divided into three sections. The first describes how to file
a bug report.
The second describes the mechanics of
how to submit a contribution to the h5py project; for example, how to
create a pull request, which branch to base your work on, etc.
We assume you are familiar with Git, the version control system used by h5py.
If not, `here's a great place to start <https://git-scm.com/book>`_.
Finally, we describe the various subsystems inside h5py, and give
technical guidance as to how to implement your changes.
How to File a Bug Report
------------------------
Bug reports are always welcome! The issue tracker is at:
https://github.com/h5py/h5py/issues
If you're unsure whether you've found a bug
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Always feel free to ask on the mailing list (h5py at Google Groups).
Discussions there are seen by lots of people and are archived by Google.
Even if the issue you're having turns out not to be a bug in the end, other
people can benefit from a record of the conversation.
By the way, nobody will get mad if you file a bug and it turns out to be
something else. That's just how software development goes.
What to include
~~~~~~~~~~~~~~~
When filing a bug, there are two things you should include. The first is
the output of ``h5py.version.info``::

    >>> import h5py
    >>> print(h5py.version.info)

The second is a detailed explanation of what went wrong. Unless the bug
is really trivial, **include code if you can**, either via GitHub's
inline markup::

    ```
    import h5py
    h5py.explode()  # Destroyed my computer!
    ```

or by uploading a code sample to `GitHub Gist <http://gist.github.com>`_.
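
To collect the version information together with basic platform details, a
small script along these lines may help. It is only a sketch: the helper name
``environment_report`` is hypothetical and not part of h5py, and the import
guard is there so the script still reports something when h5py itself fails
to import.

```python
import platform
import sys


def environment_report():
    """Return a short text block describing the Python and h5py setup."""
    lines = [
        f"Python {sys.version.split()[0]} on {platform.platform()}",
    ]
    try:
        import h5py
    except ImportError as exc:
        # A failed import is itself worth including in the report.
        lines.append(f"h5py could not be imported: {exc}")
    else:
        # h5py.version.info is the summary string requested above.
        lines.append(h5py.version.info)
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The printed text can be pasted directly into the issue, followed by the code
that triggers the problem.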
How to Get Your Code into h5py
------------------------------
This section describes how to contribute changes to the h5py code base.
Before you start, be sure to read the h5py license and contributor
agreement in "license.txt". You can find this in the source distribution,
or view it online at the main h5py repository at GitHub.
The basic workflow is to clone h5py with git, make your changes in a topic
branch, and then create a pull request at GitHub asking to merge the changes
into the main h5py project.
Here are some tips for getting your pull requests accepted:
1. Let people know you're working on something. This could mean posting a
comment in an open issue, or sending an email to the mailing list. There's
nothing wrong with just opening a pull request, but it might save you time
if you ask for advice first.
2. Keep your changes focused. If you're fixing multiple issues, file multiple
pull requests. Try to keep the amount of reformatting clutter small so
the maintainers can easily see what you've changed in a diff.
3. Unit tests are mandatory for new features. This doesn't mean hundreds
(or even dozens) of tests! Just enough to make sure the feature works as
advertised. The maintainers will let you know if more are needed.
.. _git_checkout:
Clone the h5py repository
~~~~~~~~~~~~~~~~~~~~~~~~~
The best way to do this is by signing in to GitHub and forking the
h5py project. You'll end up with a new repository under your
account; for example, if your username is ``yourname``, the repository
would be at http://github.com/yourname/h5py.
Then, clone your new copy of h5py to your local machine::
$ git clone http://github.com/yourname/h5py
Create a topic branch for your feature
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Check out a new branch for the bugfix or feature you're writing::
$ git checkout -b newfeature master
The exact name of the branch can be anything you want. For bug fixes, one
approach is to put the issue number in the branch name.
We develop all changes against the *master* branch.
If we're making a bugfix release, a bot will backport merged pull requests.
Implement the feature!
~~~~~~~~~~~~~~~~~~~~~~
You can implement the feature as a number of small changes, or as one big
commit; there's no project policy. Double-check to make sure you've
included all your files; run ``git status`` and check the output.
.. _contrib-run-tests:
Run the tests
~~~~~~~~~~~~~
The easiest way to run the tests is with
`tox <https://tox.readthedocs.io/en/latest/>`_::

    pip install tox         # Get tox
    tox -e py312-test-deps  # Run tests in one environment
    tox                     # Run tests in all possible environments
    tox -a                  # List defined environments

Write a release note
~~~~~~~~~~~~~~~~~~~~
Changes which could affect people building and using h5py after the next release
should have a news entry. You don't need to do this if your changes don't affect
usage, e.g. adding tests or correcting comments.
In the ``news/`` folder, make a copy of ``TEMPLATE.rst`` named after your branch.
Edit the new file, adding a sentence or two about what you've added or fixed.
Commit this to git too.
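
The copy step can also be scripted. The following is a minimal sketch, assuming
it is run against a checkout that contains ``news/TEMPLATE.rst``; the function
name ``create_news_entry`` and the rule for handling slashes in branch names
are assumptions made for illustration, not project tooling.

```python
import shutil
from pathlib import Path


def create_news_entry(repo_root, branch_name):
    """Copy news/TEMPLATE.rst to news/<branch_name>.rst and return the new path."""
    news_dir = Path(repo_root) / "news"
    template = news_dir / "TEMPLATE.rst"
    # Assumption: slashes in a branch name are replaced so the result is a
    # single file name rather than a nested path.
    target = news_dir / (branch_name.replace("/", "-") + ".rst")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(template, target)
    return target
```

After editing the new file, remember to ``git add`` it before committing.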
News entries are merged into the :doc:`what's new documents <whatsnew/index>`
for each release. They should allow someone to quickly understand what a new
feature is, or whether a bug they care about has been fixed. E.g.::

    Bug fixes
    ---------

    * Fix reading data for region references pointing to an empty selection.

The *Building h5py* section is for changes which affect how people build h5py
from source. It's not about how we make prebuilt wheels; changes to that which
make a visible difference can go in *New features* or *Bug fixes*.
Push your changes back and open a pull request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Push your topic branch back up to your GitHub clone::
$ git push origin newfeature
Then, `create a pull request <https://help.github.com/articles/creating-a-pull-request>`_ based on your topic branch.
Work with the maintainers
~~~~~~~~~~~~~~~~~~~~~~~~~
Your pull request might be accepted right away. More commonly, the maintainers
will post comments asking you to fix minor things, like add a few tests, clean
up the style to be PEP-8 compliant, etc.
The pull request page also shows the results of building and testing the
modified code on Travis CI, AppVeyor, and Azure Pipelines.
Check back after about 30 minutes to see if the build succeeded,
and if not, try to modify your changes to make it work.
When making changes after creating your pull request, just add commits to
your topic branch and push them to your GitHub repository. Don't try to
rebase or open a new pull request! We don't mind having a few extra
commits in the history, and it's helpful to keep all the history together
in one place.
How to Modify h5py
------------------
This section is a little more involved, and provides tips on how to modify
h5py. The h5py package is built in layers. Starting from the bottom, they
are:
1. The HDF5 C API (provided by libhdf5)
2. Auto-generated Cython wrappers for the C API (``api_gen.py``)
3. Low-level interface, written in Cython, using the wrappers from (2)
4. High-level interface, written in Python, with things like ``h5py.File``.
5. Unit test code
Rather than talk about the layers in an abstract way, the parts below are
guides to adding specific functionality to various parts of h5py.
Most sections span at least two or three of these layers.
Adding a function from the HDF5 C API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is one of the most common contributed changes. The example below shows
how one would add the function ``H5Dget_storage_size``,
which determines the space on disk used by an HDF5 dataset. This function
is already partially wrapped in h5py, so you can see how it works.
It's recommended that
you follow along, if not by actually adding the feature then by at least
opening the various files as we work through the example.
First, get ahold of
the function signature; the easiest place for this is at the `online
HDF5 Reference Manual <https://support.hdfgroup.org/documentation/hdf5/latest/_r_m.html>`_.
Then, add the function's C signature to the file ``api_functions.txt``::
hsize_t H5Dget_storage_size(hid_t dset_id)
This particular signature uses types (``hsize_t``, ``hid_t``) which are already
defined elsewhere. But if
the function you're adding needs a struct or enum definition, you can
add it using Cython code to the file ``api_types_hdf5.pxd``.
The next step is to add a Cython function or method which calls the function
you added. The h5py modules follow the naming convention
of the C API; functions starting with ``H5D`` are wrapped in ``h5d.pyx``.
Opening ``h5d.pyx``, we notice that since this function takes a dataset
identifier as the first argument, it belongs as a method on the DatasetID
object. We write a wrapper method::

    def get_storage_size(self):
        """ () => LONG storage_size

            Determine the amount of file space required for a dataset.  Note
            this only counts the space which has actually been allocated; it
            may even be zero.
        """
        return H5Dget_storage_size(self.id)

The first line of the docstring gives the method signature.
This is necessary because Cython will use a "generic" signature like
``method(*args, **kwds)`` when the file is compiled. The h5py documentation
system will extract the first line and use it as the signature.
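
To make the convention concrete, here is a rough illustration of pulling the
first docstring line out of a function. It is not the code used by the h5py
documentation build; ``signature_from_docstring`` is a hypothetical helper,
and the stand-in function below only mimics the wrapper's docstring.

```python
def signature_from_docstring(func):
    """Return the first non-empty docstring line, or None if there is none."""
    doc = func.__doc__
    if not doc:
        return None
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return None


def get_storage_size(self):
    """ () => LONG storage_size

    Determine the amount of file space required for a dataset.
    """


# Prints the signature line: () => LONG storage_size
print(signature_from_docstring(get_storage_size))
```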
Next, we decide whether we want to add access to this function to the
high-level interface. That means users of the top-level ``h5py.Dataset``
object will be able to see how much space on disk their files use. The
high-level interface is implemented in the subpackage ``h5py._hl``, and
the Dataset object is in module ``dataset.py``. Opening it up, we add
a property on the ``Dataset`` object::

    @property
    def storagesize(self):
        """ Size (in bytes) of this dataset on disk. """
        return self.id.get_storage_size()

You'll see that the low-level ``DatasetID`` object is available on the
high-level ``Dataset`` object as ``obj.id``. This is true of all the
high-level objects, like ``File`` and ``Group`` as well.
Finally (and don't skip this step), we write **unit tests** for this feature.
Since the feature is ultimately exposed at the high-level interface, it's OK
to write tests for the ``Dataset.storagesize`` property only. Unit tests for
the high-level interface are located in the "tests" subfolder, right near
``dataset.py``.
It looks like the right file is ``test_dataset.py``. Unit tests are
implemented as methods on custom ``unittest.TestCase`` subclasses;
each new feature should be tested by its own new class. In the
``test_dataset`` module, we see there's already a subclass called
``BaseDataset``, which implements some simple set-up and cleanup methods and
provides a ``h5py.File`` object as ``obj.f``. We'll base our test class on
that::

    class TestStorageSize(BaseDataset):

        """
            Feature: Dataset.storagesize indicates how much space is used.
        """

        def test_empty(self):
            """ Empty datasets take no space on disk """
            dset = self.f.create_dataset("x", (100, 100))
            self.assertEqual(dset.storagesize, 0)

        def test_data(self):
            """ Storage size is correct for non-empty datasets """
            dset = self.f.create_dataset("x", (100,), dtype='uint8')
            dset[...] = 42
            self.assertEqual(dset.storagesize, 100)

This set of tests would be adequate to get a pull request approved. We don't
test every combination under the sun (different ranks, datasets with more
than 2**32 elements, datasets with the string "kumquat" in the name...), but
the basic, commonly encountered set of conditions.
To build and test our changes, we have to do a few things. First of all,
run the file ``api_gen.py`` to re-generate the Cython wrappers from
``api_functions.txt``::
$ python api_gen.py
Then build the project, which recompiles ``h5d.pyx``::
$ python setup.py build
Finally, run the test suite, which includes the two methods we just wrote::
$ python setup.py test
If the tests pass, the feature is ready for a pull request.
Adding a function only available in certain versions of HDF5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At the moment, h5py must be compatible with HDF5 back to version 1.10.4.
But it's possible to conditionally
include functions which only appear in newer versions of HDF5. It's also
possible to mark functions which require Parallel HDF5. For example, the
function ``H5Fset_mpi_atomicity`` was introduced in HDF5 1.8.9 and requires
Parallel HDF5. Specifiers before the signature in ``api_functions.txt``
communicate this::
MPI 1.8.9 herr_t H5Fset_mpi_atomicity(hid_t file_id, hbool_t flag)
You can specify either, both or none of "MPI" or a version number in "X.Y.Z"
format.
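
The layout of such a line can be illustrated with a small parser. This is a
sketch only: it is not how ``api_gen.py`` reads the file, it handles just the
optional "MPI" and "X.Y.Z" specifiers described here, and the names
``parse_api_line`` and ``_LINE`` are hypothetical.

```python
import re

# Assumed layout: [MPI] [X.Y.Z] <return type> <name>(<arguments>)
_LINE = re.compile(
    r"^\s*(?P<mpi>MPI\s+)?"
    r"(?P<version>\d+\.\d+\.\d+\s+)?"
    r"(?P<rtype>\w[\w\s\*]*?)\s*"
    r"(?P<name>H5\w+)\s*"
    r"\((?P<args>.*)\)\s*$"
)


def parse_api_line(line):
    """Split one signature line into its specifiers and C signature parts."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised signature line: {line!r}")
    version = match.group("version")
    return {
        "mpi": match.group("mpi") is not None,
        "min_version": (
            tuple(int(part) for part in version.strip().split("."))
            if version else None
        ),
        "return_type": match.group("rtype").strip(),
        "name": match.group("name"),
        "arguments": [
            arg.strip() for arg in match.group("args").split(",") if arg.strip()
        ],
    }


print(parse_api_line(
    "MPI 1.8.9 herr_t H5Fset_mpi_atomicity(hid_t file_id, hbool_t flag)"
))
```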
In the Cython code, these show up as ``tempita`` template comments.
So the low-level implementation (as a method on
``h5py.h5f.FileID``) looks like this::

    ### {{if MPI and HDF5_VERSION >= (1, 8, 9)}}
    def set_mpi_atomicity(self, bint atomicity):
        """ (BOOL atomicity)

        For MPI-IO driver, set to atomic (True), which guarantees sequential
        I/O semantics, or non-atomic (False), which improves performance.

        Default is False.

        Feature requires: 1.8.9 and Parallel HDF5
        """
        H5Fset_mpi_atomicity(self.id, <hbool_t>atomicity)
    ### {{endif}}

High-level code can check the version of the HDF5 library, or check to see if
the method is present on ``FileID`` objects.
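
Both checks can be combined in a small guard. The sketch below uses stand-in
classes so that it runs without h5py; ``feature_available`` and the two
``FileID...`` classes are hypothetical names used only for illustration, and
either check on its own may be enough in practice.

```python
def feature_available(low_level_obj, method_name, hdf5_version, minimum_version):
    """True if HDF5 is new enough and the wrapper method was compiled in."""
    new_enough = tuple(hdf5_version) >= tuple(minimum_version)
    has_method = callable(getattr(low_level_obj, method_name, None))
    return new_enough and has_method


# Stand-ins for low-level objects (hypothetical, for illustration only).
class FileIDWithMPI:
    def set_mpi_atomicity(self, atomicity):
        pass


class FileIDWithoutMPI:
    pass


print(feature_available(FileIDWithMPI(), "set_mpi_atomicity", (1, 14, 2), (1, 8, 9)))
print(feature_available(FileIDWithoutMPI(), "set_mpi_atomicity", (1, 14, 2), (1, 8, 9)))
```

With h5py installed, the real arguments would be ``f.id`` for an open
``h5py.File`` object ``f`` and ``h5py.version.hdf5_version_tuple``.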
Testing MPI-only features/code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Typically, to run code under MPI, ``mpirun`` must be used to start the MPI
processes. Similarly, tests using MPI features (such as collective IO) must
also be run under ``mpirun``. h5py uses pytest markers (specifically
``pytest.mark.mpi`` and other markers from
`pytest-mpi <https://pytest-mpi.readthedocs.io>`_) to specify which tests
require usage of ``mpirun``, and will handle skipping the tests as needed. A
simple example of how to do this is::

    @pytest.mark.mpi
    def test_mpi_feature():
        import mpi4py
        # test the MPI feature

To run these tests, you'll need to:
1. Have ``tox`` installed (e.g. via ``pip install tox``)
2. Have HDF5 built with MPI as per :ref:`build_mpi`
Then running::
$ CC='mpicc' HDF5_MPI=ON tox -e py312-test-deps-mpi
should run the tests. You may need to pass ``HDF5_DIR`` depending on the
location of the HDF5 with MPI support. You can choose which Python version to
build against by changing ``py312`` (e.g. ``py311`` runs Python 3.11; this is a tox
feature), and test with the minimum version requirements by using ``mindeps``
rather than ``deps``.
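
The way these environment names are put together can be sketched as follows.
The pattern ``py<version>-test-<deps|mindeps>[-mpi]`` is inferred from the
names used in this guide, and ``tox_env_name`` is a hypothetical helper; not
every combination is necessarily defined, so check the output of ``tox -a``.

```python
def tox_env_name(python_version, minimum_deps=False, mpi=False):
    """Build a name such as 'py312-test-deps-mpi' from its parts."""
    major, minor = python_version.split(".")[:2]
    parts = [f"py{major}{minor}", "test", "mindeps" if minimum_deps else "deps"]
    if mpi:
        parts.append("mpi")
    return "-".join(parts)


print(tox_env_name("3.12", mpi=True))           # py312-test-deps-mpi
print(tox_env_name("3.12", minimum_deps=True))  # py312-test-mindeps
```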
If you get an error similar to::

    There are not enough slots available in the system to satisfy the 4 slots
    that were requested by the application:
      python

    Either request fewer slots for your application, or make more slots available
    for use.

then you need to reduce the number of MPI processes you are asking MPI to use.
If you have already reduced the number of processes requested (or are running
the default number, which is 2), you will need to consult the documentation for
your MPI implementation on handling this error. On OpenMPI (the default MPI
implementation on many systems), running::
$ export OMPI_MCA_rmaps_base_oversubscribe=1
will instruct OpenMPI to allow more MPI processes than available cores on your
system.
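
When the tests are launched from a script rather than an interactive shell,
the same variables can be assembled in Python. This is a minimal sketch: the
helper name ``mpi_test_environment`` is hypothetical, and the variable names
are the ones used in the commands above.

```python
import os


def mpi_test_environment(base_env=None, oversubscribe=False):
    """Return a copy of the environment with the MPI build variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = "mpicc"
    env["HDF5_MPI"] = "ON"
    if oversubscribe:
        # This variable is specific to OpenMPI.
        env["OMPI_MCA_rmaps_base_oversubscribe"] = "1"
    return env


# Possible use, assuming tox is installed and on PATH:
#   import subprocess
#   subprocess.run(["tox", "-e", "py312-test-deps-mpi"],
#                  env=mpi_test_environment(oversubscribe=True), check=True)
```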
If you need to pass additional environment variables to your MPI implementation,
add these variables to the ``passenv`` setting in the ``tox.ini``, and send us a PR
with that change noting the MPI implementation.