1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
|
<html>
<body>
<h1> libminuid design </h1>
<p>
libminuid generates <b>non-standard</b> UUIDs (GUIDs). This document explains
why it goes for a non-standard model and explains some of the design decisions.
<h2> Prior art <h2>
<p>
Standard <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">
UUIDs</a> are widespread. There are accessible implementations. However
there are a few drawbacks of them:
<ul>
<li> complexity: there are different versions; rfc4122 even describes a
version field within the UUID.
<li> privacy concerns (e.g. saving MAC addresses with some versions)
<li> some parts of the design does not really guarantee unique bits; for example
if multiple chroots or VMs ran on a server sharing the same physical
network interface, they could all have the same MAC address.
<li> portability issues: acquiring the MAC address or high precision time
stamp is typically easy on systems where acquiring well seeded
random numbers is also easy (e.g. modern Linux or BSD)
</ul>
<h2> Design decisions </h2>
<h3> Session prefix: random salt </h3>
<p>
By using a random salt as the first 14 bytes of the UUID, each UUID has 112 bits
that should make it unique across different threads/processes. This is somewhat
lower, but comparable to the standard UUID version 4's 122 bits.
<p>
Using random salt evades the main issues with standard UUIDs:
<ul>
<li> no MAC address, time stamps or other privacy-sensitive data leaks
<li> less portability issues: the only one thing the implementation is
required to do and is not ANSI-C89 standard is getting 112 bits of
random number from some source that preferably got hardware random
elements
<li> running in parallel on the same multi-processor hardware does not
increase the chance of generating the same salt
</ul>
<p>
As a fallback mechanism, on systems not offering /dev/urandom or /dev/random,
the host application may add to the salt by using whatever unique identifiers
or random sources available.
<h3> Suffix: 32 bit unsigned counter </h3>
<p>
For the purpose of having unique identifiers, a long-enough, properly obtained
random number should be enough. However generating such random number may be
expensive and in some situations a lot of UUIDs need to be generated at once.
<p>
To solve this, libminuid's UUID consists of the long random number session
prefix, generated only once per session and a plain integer counter suffix
that is simply incremented on each new UUID generation. In the unlikely
situation the 32 bit counter overflows, the random prefix should be altered
or a new random prefix should be generated.
<p>
Typically one process would set up one session, generating one random number.
If the application deals with multiple independent contexts, it's also reasonable
to create one session per context.
<p>
Multi-threaded applications should either initialize one session per thread
(that generate UUIDs) or should implement a thin wrapper around libminuid
API with locking.
<h3> Privacy implications </h3>
<p>
Because of the session seed and counter, looking at UUIDs there are still
three things that can be figured:
<ul>
<li> if two UUIDs have been generated within the same session
<li> if so, which one got generated earlier
<li> and at most how many other UUIDs have been generated in between
</ul>
<p>
Application designers should keep these in mind and where relevant, evade
these by using multiple sessions.
<h3> String format: base64 </h3>
<p>
Standard UUID uses hexadecimal printout and is written in text form as
32 characters + 4 hyphens, which is total 36 characters (for 128 bits
of binary UUID).
<p>
Libminuid uses base64 with no separators between prefix and suffix. It takes
24 characters in plain text to encode the 144 bits long binary UUID.
<p>
Thus libminuid produces more compact text output with two drawbacks:
<ul>
<li> it's harder to read out loud (because of no split-up and case sensitive encoding)
<li> it uses more ASCII characters, which may cause problems when encapsulating in a file formats or network protocol
</ul>
<p>
Widely used file formats and protocols typically can transmit base64 encoded
content or arbitrary text strings so the second drawback shouldn't cause
problems in practice.
|