<html>
<title>
Checksumming and PGP signing HDF5 releases
</title>
<!--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright by The HDF Group. *
* Copyright by the Board of Trustees of the University of Illinois. *
* All rights reserved. *
* *
* This file is part of HDF5. The full HDF5 copyright notice, including *
* terms governing use, modification, and redistribution, is contained in *
* the files COPYING and Copyright.html. COPYING can be found at the root *
* of the source code distribution tree; Copyright.html can be found at the *
* root level of an installed copy of the electronic HDF5 document set and *
* is linked from the top-level documents page. It can also be found at *
* http://hdfgroup.org/HDF5/doc/Copyright.html. If you do not have *
* access to either file, you may request a copy from help@hdfgroup.org. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
-->
<body>
<h3>Checksumming and PGP signing HDF5 releases</h3>
<p>James Laird - August 22, 2005</p>
<br>
<p><b>User Request</b> (bug #387)<br><br>
I was hoping I could get the release process modified
to include a md5file checksum and a PGP signature
of the release.</p>
<p>This would go a long way to ensure that the distribution
that is downloaded is actually the software that was
released by the NCSA.</p>
<p>For example:</p>
<p>    hdf5.1.6.4.tar.gz - compressed tarfile<br>
    hdf5.1.6.4.tar.gz.md5 - md5 checksum of the compressed tarfile<br>
    hdf5.1.6.4.tar.gz.asc - PGP signature of the compressed tarfile</p>
<p>This is a common model for ensuring the integrity of open source
distributions that are downloaded from multiple mirrors.</p>
<p>Here is a link to the Apache web site that is a good reference:<br>
<a href="http://httpd.apache.org/dev/verification.html">http://httpd.apache.org/dev/verification.html</a></p>
<p>Any chance that I can persuade the developers to add this feature
to the current release process?</p>
<p>Regards,<br>
    Pete Rothermel</p><br>
<p><b>What is an MD5 checksum?</b><br>
Explanation from the <a href="http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html">MD5 Homepage (unofficial)</a>:</p>
<p>MD5 was developed by
Professor Ronald L. Rivest of MIT. What it does, to quote the executive
summary of
RFC 1321, is:</p>
<blockquote>
[The MD5 algorithm] takes as input a message of arbitrary length and
produces
as output a 128-bit "fingerprint" or "message digest" of the input.
It is conjectured that it is computationally infeasible to produce
two messages having the same message digest, or to produce any
message having a given prespecified target message digest. The MD5
algorithm is intended for digital signature applications, where a
large file must be "compressed" in a secure manner before being
encrypted with a private (secret) key under a public-key cryptosystem
such as RSA.
</blockquote>
<p>In essence, MD5 is a way to verify data integrity, and is much more reliable
than checksum and many other commonly used methods.</p>
<br>
<p>The "fingerprint" that MD5 produces is (usually) unique to a given message.
To use this in HDF5, we would run MD5 on the release tarball and post the
resulting file (hdf5.tar.md5) on our website along with the tarball itself.
If an attacker attempted to replace the real hdf5.tar archive with a fake,
the fake archive would have a different MD5 checksum and this could be detected
by the user. A user could also download the hdf5 tarball from a mirror
FTP site, then compare that tarball's MD5 checksum against the correct value
on the main HDF5 website.</p>
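<p>The comparison a user would make can be sketched in a few lines of Python.
This is only an illustration, assuming the published checksum is available as a
hexadecimal string; the function names <code>md5_hex</code> and
<code>matches_published</code> are hypothetical and are not part of any HDF5
tool.</p>

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the MD5 digest of the file at `path` as 32 hex characters."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large tarball need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """True if the file's MD5 equals the checksum copied from the website."""
    # Ignore surrounding whitespace and letter case in the pasted value.
    return md5_hex(path) == published_hex.strip().lower()
```

<p>A user who downloaded the tarball from a mirror would call
<code>matches_published("hdf5.tar", ...)</code> with the value read from the
main website; a result of <code>False</code> means the archive is not the one
that was released.</p>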
<p>MD5 is vulnerable to an attack in which a false checksum is provided for
the false archive. There must be a secure channel to transmit the correct
checksum. However, this checksum is very small (50 bytes), so it is easier to
transmit than the full hdf5 tarball. It might be wise to post the MD5
checksum on the HDF5 website in cleartext as well as adding it to the FTP
site to make it more difficult for an attacker to replace it with a false
checksum.<br>
MD5 is known to be vulnerable to certain
collision attacks,
but these would not affect the HDF5 release process; given an existing
MD5 checksum, it is still extremely difficult to create another file with
the same checksum. </p>
<p><b>What is a PGP signature?</b><br>
For an in-depth discussion of PGP key signing, see the comp.security.pgp FAQ.</p>
<p>Briefly, a PGP signature allows a user to vouch for a message's
trustworthiness. The signature uses a hash function to ensure that the
message has not changed since it was signed (the same principle as MD5).
Furthermore, the signature confirms that the message has come from the user
who signed it.</p>
<p>PGP signatures rely on asymmetric keys. A signature is created with a
private key, while the public key is widely published. Only the
user with the private key could have created the signature and encoded the
result of the hash function, but anyone can decode the signature to check its
validity and that the message is unchanged. This would prevent an attacker
from substituting a fake hdf5.tar archive, since the fake archive would have
a different hash value--it would be clear that the signature corresponded to
a different file. In addition, the attacker could not create a fake
signature to accompany the fake archive (as they could with MD5), since they
do not have HDF5's private key. The signature would ensure that no one could
masquerade as the HDF Group.</p>
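<p>The hash-then-sign principle described above can be illustrated with a toy
example. This is a sketch only: it uses textbook RSA with a deliberately tiny
key built from two known Mersenne primes and no padding, which is not how gpg
works internally and is not secure. All names (<code>sign</code>,
<code>verify</code>, the key constants) are assumptions made for the
illustration.</p>

```python
import hashlib

# Toy key pair; far too small for real use.
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus, wider than a 128-bit MD5 digest
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8 or later)


def digest_as_int(data):
    """Hash the message with MD5 and read the 128-bit digest as an integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def sign(data):
    """Encode the hash with the private exponent; only the key holder can do this."""
    return pow(digest_as_int(data), D, N)


def verify(data, signature):
    """Decode the signature with the public key and compare it to a fresh hash."""
    return pow(signature, E, N) == digest_as_int(data)


signature = sign(b"contents of the real hdf5.tar")
genuine_ok = verify(b"contents of the real hdf5.tar", signature)
fake_ok = verify(b"contents of a fake hdf5.tar", signature)
```

<p>Here <code>genuine_ok</code> is true and <code>fake_ok</code> is false:
anyone holding the public values <code>N</code> and <code>E</code> can check
the signature, but producing a signature that verifies for the fake archive
would require the private exponent <code>D</code>.</p>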
<p>PGP's main drawback is that a signature can only be trusted as far as it
is known to belong to its owner. If an attacker creates an entirely new
key and claims it belongs to HDF, users have no way of knowing which is the
correct key and which is the fake. The solution to this problem is to
be connected to the "web of trust," which involves having a key signed by
others who can vouch for the authenticity of the key. If HDF5 doesn't
expend some effort to enter this web of trust (and doesn't have any members
with well-known public keys), users may not be able to tell which
copy of HDF5 is genuine, but it will be easy to detect that an attack is
taking place, since there will be two different keys in existence!</p>
<p><b>Short-term and long-term costs</b><br>
First, the MD5 checksum and PGP signature would take a small amount of work
to create. Doing this by hand is easy--it took me about ten minutes to
create a PGP key, and the commands to create the checksum and signature are
one line each. If we wanted to create both a checksum and a signature for
every platform-specific tarball, these two lines would need to be run for
each tarball. Both of the tools needed (pgp and md5sum) are already installed
on heping.<br></p>
<p>The creation of an MD5 checksum can be automated to take place when the
bin/reconfigure script is run. PGP signing should <b>not</b> be automated,
any more than a pen-and-paper signature should be automated. It might be
easiest to have snapshots involve only an MD5 checksum, while full releases
use a PGP signature.</p>
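<p>As a rough sketch of what the automated step might look like, the following
Python function writes a checksum file next to every tarball in a release
directory. The function name, the directory layout, and the
<code>*.tar*</code> pattern are assumptions for illustration; the actual
bin/reconfigure script is not shown here and may be organized differently.</p>

```python
import hashlib
from pathlib import Path


def write_md5_files(release_dir, pattern="*.tar*"):
    """Write NAME.md5 next to every tarball in release_dir; return the paths written."""
    written = []
    for tarball in sorted(Path(release_dir).glob(pattern)):
        if tarball.suffix in (".md5", ".asc"):
            continue  # skip checksum and signature files left by an earlier run
        # Reads the whole file at once; acceptable for a sketch.
        digest = hashlib.md5(tarball.read_bytes()).hexdigest()
        md5_path = tarball.with_name(tarball.name + ".md5")
        # Same line layout that md5sum prints: digest, two spaces, file name.
        md5_path.write_text(f"{digest}  {tarball.name}\n")
        written.append(md5_path)
    return written
```

<p>Signing is deliberately left out of this sketch, in keeping with the
recommendation that the PGP signature be made by hand.</p>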
<p>The resulting MD5 checksum and PGP signature files would have to be
distributed along with the main HDF5 tarball. This would probably involve
a small alteration to the website and would use a trivial amount of space.</p>
<p>In the long term, HDF5's PGP key would need to be maintained. It would
be most appropriate to use a single person's key (rather than a "group" key),
so this person would need to examine each release that was signed. Even if
we don't worry about the "web of trust" in the short term, users will probably
request that we have the key signed by other users eventually, so we may
need to have our keyholder attend "key-signing parties" or attend conferences
with his or her key and photo ID. We will not be forced to expend any more
effort on this than we want to, of course, and a key that has not been
signed by other users will still be more useful than no key at all.<br>
There should be few other long-term costs beyond the effort required to
create and keep track of the MD5 and PGP files. We will need to ensure that
we have working versions of md5sum and gpg (both of which are distributed
under the GNU General Public License) and keep track of our PGP keyring.</p>
<p>To verify MD5 checksums or PGP signatures, users would need the appropriate
utilities. md5sum and gpg are free for Linux, and utilities exist for Windows.
Both MD5 and PGP would create separate files containing the checksum or
signature, so users who did not want to, or could not, deal with these files
could safely ignore them and simply untar the archive as usual.</p>
<br>
<p><b>To create a signature file:</b></p>
<pre>
Create a key:
gpg --gen-key (follow the instructions)
Create a separate sig file:
gpg -b --armor hdf5.tar
Which can then be verified against the tarball with:
gpg --verify hdf5.tar.asc hdf5.tar
</pre>
<p><b>To create an MD5 checksum:</b></p>
<pre>
Find the checksum:
md5sum hdf5.tar > hdf5.tar.md5
Which can then be verified:
md5sum --check hdf5.tar.md5
Alternately or in addition, we can just post the output
from md5sum hdf5.tar on our website, and let users read it
and compare it themselves.
</pre>
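<p>For users without md5sum, the check can be approximated in Python. The
sketch below assumes the checksum file uses the usual md5sum line layout
(digest, whitespace, file name) and, as a simplifying choice, looks for each
listed file in the same directory as the checksum file. The function name
<code>check_md5_file</code> is hypothetical.</p>

```python
import hashlib
from pathlib import Path


def check_md5_file(md5_path):
    """Re-hash each file listed in an md5sum-style file; return {name: bool}."""
    md5_path = Path(md5_path)
    results = {}
    for line in md5_path.read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = md5_path.parent / name
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        results[name] = (actual == expected.lower())
    return results
```

<p>Calling <code>check_md5_file("hdf5.tar.md5")</code> returns a dictionary
mapping each listed file name to whether its checksum matched.</p>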
<hr>
<address>
Last modified: 19 September 2005
</address>
</body>
</html>