Wordlists
=========

The passphrases generated by `diceware` naturally depend on the set of
words used, i.e. the wordlist.

`diceware` comes with several wordlists out of the box, which should be a
good choice for typical private use.

.. warning::
         We do *not* use the `diceware standard wordlist`_,
         but the `long EFF wordlist`_ (see below), because it is more secure
         and more comfortable to use.

Currently (v1.0) we provide the following lists:

- `ca` (8192/2^13 words)

  A list of Catalan words. Compiled by `@jawlenskys`_ from the Debian dict file
  for Catalan and a selection of the most frequently used words in the Catalan
  Wikipedia. This list provides the `prefix property`_.

- `de` (7776/6^5 words)

  A list of German words, suitable for use with dice. Generated with
  `diceware-list` based on wordlists from `Institut für Deutsche Sprache`_,
  Mannheim and filtering blacklists. This list provides the `prefix property`_.

- `de_8k` (8192/2^13 words)

  A longer list of German words, suitable for use with machines, nerds, and
  other binary-geared entities. Generated with `diceware-list` based on
  wordlists from `Institut für Deutsche Sprache`_, Mannheim and filtering
  blacklists. This list provides the `prefix property`_.

- `en_eff` (7776/6^5 words, default)

  This is the `long EFF wordlist`_ as published by the `Electronic Frontier
  Foundation`_ in mid-2016 and used by default. They put real `scientific
  effort`_ into the creation of this list, which might considerably ease the
  use of passphrases generated with it. Its use is definitely recommended when
  you roll real dice (or use other six-sided randomness sources)!

  It was the first list in `diceware` that provided the
  `prefix property`_. That means it contains no word which is a prefix
  of another word. Lists without this property might provide slightly
  less entropy, because, when words are joined without delimiters,
  different word sequences can produce the same passphrase.

- `en_securedrop` (8192/2^13 words)

  A hand-crafted wordlist contributed by `@Heartsucker`_. It contains
  8192 English words and phrases. This list is based on the `diceware
  standard wordlist`_ and extended to offer more memorable words. Please see
  https://github.com/heartsucker/diceware for details. The name
  `en_securedrop` refers to the `securedrop`_ project.

- `en_adjectives` (1296/6^4 words)

  A list of English adjectives. This list is relatively short and should be
  used together with other lists (for instance the `en_nouns` list) to
  provide a sufficient security level. The list comes from the
  `NaturalLanguagePasswords`_ project. It contains many short terms (good
  for comfort, bad for security) and does *not* provide the `prefix property`_.

- `en_nouns` (7776/6^5 words)

  A list of English nouns. Can be used together with other lists, for
  instance the `en_adjectives` list, to form natural language phrases. The list
  comes from the `NaturalLanguagePasswords`_ project. It contains many
  short terms (good for comfort, bad for security) and does *not* provide the
  `prefix property`_.

- `es` (8192/2^13 words)

  A list of Spanish words, carefully crafted by `@jawlenskys`_ from the Debian
  dict file for Spanish and a selection of the most used Spanish words from the
  `Corpus de Referencia del Español Actual (CREA)`_. This list provides the
  `prefix property`_.

- `fr` (7776/6^5 words)

  A list of French words, compiled by @Tango for Tails OS and Tor. Handcrafted
  to avoid offensive and rare words. This list provides the `prefix property`_.

- `it` (8192/2^13 words)

  A list of Italian words, compiled by `@jawlenskys`_ from the Debian dict file
  for Italian and an `Italian frequency list
  <https://en.wiktionary.org/wiki/User:Matthias_Buchmeier#Italian_frequency_list>`_
  generated from TV and movie subtitles. This list provides the `prefix
  property`_.

- `pt-br` (7776/6^5 words)

  A list of Brazilian Portuguese words, carefully crafted by `@drebs`_. This
  list contains no overshort words. It also provides the `prefix property`_.
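
What the `prefix property`_ means can be shown with a small check: a list
has it if no entry is a prefix of another entry. The following Python sketch
is for illustration only; ``has_prefix_property`` is a hypothetical helper and
not part of `diceware`.

.. code-block:: python

    def has_prefix_property(words):
        """Return True if no word in `words` is a prefix of another one.

        Duplicates are ignored. After sorting, a word that is a prefix of
        some other word is directly followed by a word starting with it,
        so comparing neighbours is enough.
        """
        ordered = sorted(set(words))
        for shorter, longer in zip(ordered, ordered[1:]):
            if longer.startswith(shorter):
                return False
        return True


    # "sun" is a prefix of "sunset": without delimiters, "sunset" could
    # stem from the single word "sunset" or from "sun" + "set".
    print(has_prefix_property(["sun", "set", "sunset"]))       # False
    print(has_prefix_property(["apple", "banana", "cherry"]))  # True

Sorting first keeps the check at O(n log n) instead of comparing every pair
of words.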


You can pick wordlists to use with the ``-w`` or ``--wordlist`` option. Lists
with 7776 words are made for six-sided dice (7776 = 6^5) while lists with 8192
(2^13) words are made for machines and 2-sided coins.
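
The list sizes match these randomness sources exactly: five rolls of a
six-sided die have 6^5 = 7776 equally likely outcomes, and thirteen coin flips
have 2^13 = 8192. The Python sketch below shows one way a sequence of rolls
could be turned into a word index, by reading the rolls as digits of a base-6
(or base-2) number. ``rolls_to_index`` is a hypothetical helper; `diceware`
does not necessarily pick words this way.

.. code-block:: python

    def rolls_to_index(rolls, sides=6):
        """Turn a sequence of rolls (each 1..sides) into a 0-based index.

        The rolls are read as digits of a base-`sides` number, so five
        six-sided rolls cover the indices 0 .. 6**5 - 1 = 0 .. 7775.
        """
        index = 0
        for roll in rolls:
            if not 1 <= roll <= sides:
                raise ValueError(f"roll out of range: {roll}")
            index = index * sides + (roll - 1)
        return index


    print(rolls_to_index([1, 1, 1, 1, 1]))    # 0, the first word
    print(rolls_to_index([6, 6, 6, 6, 6]))    # 7775, last word of a 7776 list
    # Coins as two-sided dice (heads = 1, tails = 2 is an arbitrary choice):
    print(rolls_to_index([2] * 13, sides=2))  # 8191, last word of an 8192 list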

You can also select several wordlists at once. In that case each "word" of the
generated passphrase consists of one word from each of the lists in the order
given.

Example::

   $ diceware -w en_adjectives en_nouns -n 2 -d '-'
   lax-toast-strong-reason

We get two "words" (`lax-toast` and `strong-reason`) each consisting of a
leading adjective and a trailing noun.
If you'd prefer the Yoda style, you could change that order::

   $ diceware -w en_nouns en_adjectives -n 2 -d '-'
   grains-honest-oxidant-happy

Each such term (like `oxidant-happy`) provides an entropy of about 23 bits.
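
The figure of about 23 bits follows from the list sizes given above: every
term picks one word from `en_nouns` (7776 words) and one from `en_adjectives`
(1296 words), so there are 7776 * 1296 possible terms and the entropies (the
base-2 logarithms of the list sizes) add up. A small Python check, assuming
every word is picked uniformly and independently:

.. code-block:: python

    import math


    def entropy_bits(*list_sizes):
        """Entropy in bits of one term built from one word per list.

        Assumes each word is chosen uniformly and independently.
        """
        return sum(math.log2(size) for size in list_sizes)


    term_bits = entropy_bits(7776, 1296)  # en_nouns + en_adjectives
    print(f"{term_bits:.1f} bits per term")      # 23.3 bits per term
    print(f"{2 * term_bits:.1f} bits for -n 2")  # 46.5 bits for -n 2

For comparison, a single word from a 7776-word list contributes about 12.9
bits and one from an 8192-word list exactly 13 bits.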


Retired Wordlists
-----------------

Some wordlists have been removed from `diceware` because they contained
offensive language and words that users might be uncomfortable with.

- `en` (8192 words, removed in v0.10)

  The so-called `8k wordlist`_ from Mr. Reinhold as published on
  http://diceware.com/. It was something like the canonical wordlist for use
  with binary-geared entities like computers or nerds.

- `en_orig` (7776 words, removed in v0.10)

  This is the `diceware standard wordlist`_ as provided by
  Mr. Reinhold. Something like the canonical list in former times.
  There are now considerable alternatives.

None of these lists provides the `prefix property`_. They also contain
overshort terms, i.e. words so short that they can lead to passphrases that
are easier to break by trying all character combinations than by trying all
combinations of words from the wordlist.
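
A rough way to check for this effect, sketched in Python under simplifying
assumptions: an attacker who knows that the passphrase consists of lowercase
letters only (26 symbols) has to try about 26^L strings for a passphrase of L
characters, while word-by-word guessing needs N^n tries for n words from a list
of N words. If the first number is smaller, the words were too short. The
alphabet size and the function names are assumptions for illustration, not a
criterion taken from `diceware`.

.. code-block:: python

    import math


    def char_search_bits(passphrase, alphabet_size=26):
        """Bits of work to guess `passphrase` character by character."""
        return len(passphrase) * math.log2(alphabet_size)


    def word_search_bits(num_words, list_size):
        """Bits of work to guess `num_words` words from a list of `list_size`."""
        return num_words * math.log2(list_size)


    def is_overshort(words, list_size, alphabet_size=26):
        """True if brute-forcing characters beats brute-forcing words.

        Assumes the words are joined without delimiters and consist of
        lowercase letters only.
        """
        passphrase = "".join(words)
        return (char_search_bits(passphrase, alphabet_size)
                < word_search_bits(len(words), list_size))


    print(is_overshort(["a", "by", "it"], 7776))              # True
    print(is_overshort(["apple", "banana", "cherry"], 7776))  # False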


Using Custom Wordlists
----------------------

You can use any wordlist you like. Simply give the filename and it
will be used::

  $ diceware mywordlist.txt
  HiHelloHelloHiHiHi

You can even pipe in dynamically generated wordlists, using a dash ``-`` as
the filename, for instance::

  $ mywordgenerator.sh | diceware -
  HiHiHelloHiHiHello

Of course, you then have to give the filename of your wordlist with each
call to `diceware`.

If you want to store a wordlist persistently instead, you can do so too.

The built-in wordlists we offer for use with `diceware` are all stored in a
single directory. The exact location is output by ``--show-wordlist-dirs`` as
the first entry::

  $ diceware --show-wordlist-dirs
  /path/to/some/directory
  /path/to/other/directory
  ...

All the other directories listed by this command are searched for wordlist
files as well (if they exist).

You can put your own wordlists into one of these folders (here:
``/path/to/some/directory`` or ``/path/to/other/directory``) and name the file
something like ``wordlist_MY_SPECIAL_NAME.txt``. Afterwards you can pick
your wordlist by running::

  $ diceware -w MY_SPECIAL_NAME

`diceware` will then use your file to create a
passphrase. Please note that `diceware` only accepts files that are
named like::

  wordlist_NAME.txt

or::

  wordlist_OTHER_NAME.asc

I.e. we expect ``wordlist_`` at the beginning and a filename
extension like ``.txt`` at the end. Furthermore, names must not contain
special characters: we accept regular letters, numbers, dashes, and
underscores only. Files that do not follow this naming convention
are ignored.
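
The naming rule can be approximated with a regular expression. The Python
sketch below is derived from the description above; the pattern (in
particular, restricting names and extensions to ASCII characters) is an
assumption and not necessarily the one `diceware` uses, and ``wordlist_name``
is a hypothetical helper.

.. code-block:: python

    import re

    # Assumed pattern, derived from the rules above: "wordlist_", then a name
    # made of ASCII letters, digits, dashes and underscores, then an extension.
    WORDLIST_FILENAME = re.compile(r"wordlist_([A-Za-z0-9_-]+)\.[A-Za-z0-9]+")


    def wordlist_name(filename):
        """Return the name to pass to -w, or None if the file would be ignored."""
        match = WORDLIST_FILENAME.fullmatch(filename)
        return match.group(1) if match else None


    print(wordlist_name("wordlist_MY_SPECIAL_NAME.txt"))  # MY_SPECIAL_NAME
    print(wordlist_name("wordlist_OTHER_NAME.asc"))       # OTHER_NAME
    print(wordlist_name("wordlist_bad name.txt"))         # None
    print(wordlist_name("mywordlist.txt"))                # None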

A list of all available wordlist names can be retrieved with ``--help``. See
the ``--wordlist`` explanation.


Where Wordlists are Looked Up
-----------------------------

Starting with version 1.0, wordlists can be stored in several directories. We
look for wordlists in certain directories only. The list of these directories
depends partly on environment variables. It can be shown with::

    $ diceware --show-wordlist-dirs
    /some/installdir/diceware/wordlists
    /home/user/.local/share/diceware
    /usr/local/share/diceware
    /usr/share/diceware

and may be different on your machine. Wordlist directories are searched in the
order listed by ``--show-wordlist-dirs``. Wordlists in earlier directories
override same-named ones in later directories. So, with the order given above,
a wordlist named ``wordlist_foo.txt`` in ``/some/installdir/diceware/wordlists``
will have precedence over a same-named wordlist file located in
``/usr/share/diceware``.

The ``wordlists/`` directory of the Python package itself is always the first
we look into.

Afterwards we look up ``${XDG_DATA_HOME}/diceware/`` or, if this environment
variable is not set or empty, ``${HOME}/.local/share/diceware``.

At the end we look into each of the directories listed in the
colon-separated list in ``${XDG_DATA_DIRS}``, appended by ``/diceware``. So, if
``${XDG_DATA_DIRS}`` is set to ``/foo:/bar:/etc/foo``, we will look into
``/foo/diceware``, ``/bar/diceware`` and ``/etc/foo/diceware`` (in that order)
for wordlists.

In case the environment variable ``${XDG_DATA_DIRS}`` is not set or empty, we
look into ``/usr/local/share/diceware`` and ``/usr/share/diceware`` instead.

In all cases we stop searching the wordlist directories as soon as the first
match for a given wordlist name is found.

All these rules try to follow the `XDG Base Directory Specification`_.
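
The lookup rules of this section can be summarised in a short Python sketch.
It is reconstructed from the description above, not taken from `diceware`'s
source; ``wordlist_dirs`` and ``find_wordlist`` are hypothetical names, and the
example output assumes a POSIX system.

.. code-block:: python

    import os


    def wordlist_dirs(package_dir, environ=None):
        """Return the wordlist directories in lookup order.

        `package_dir` stands in for the wordlists/ directory of the
        installed Python package.
        """
        env = os.environ if environ is None else environ
        # 1. The package's own wordlists/ directory always comes first.
        dirs = [package_dir]
        # 2. ${XDG_DATA_HOME}/diceware, or ${HOME}/.local/share/diceware if
        #    XDG_DATA_HOME is unset or empty (HOME is assumed to be set).
        data_home = env.get("XDG_DATA_HOME") or os.path.join(
            env.get("HOME", ""), ".local", "share")
        dirs.append(os.path.join(data_home, "diceware"))
        # 3. Every entry of ${XDG_DATA_DIRS} plus /diceware, in order, with a
        #    fallback if the variable is unset or empty. Empty entries are
        #    skipped (an assumption; the text above does not cover them).
        data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
        dirs.extend(os.path.join(d, "diceware") for d in data_dirs.split(":") if d)
        return dirs


    def find_wordlist(filename, dirs):
        """Return the path of the first existing `filename` in `dirs`, or None."""
        for directory in dirs:
            path = os.path.join(directory, filename)
            if os.path.isfile(path):
                return path  # first match wins, later directories are skipped
        return None


    env = {"HOME": "/home/user", "XDG_DATA_DIRS": "/foo:/bar:/etc/foo"}
    for directory in wordlist_dirs("/some/installdir/diceware/wordlists", env):
        print(directory)
    # /some/installdir/diceware/wordlists
    # /home/user/.local/share/diceware
    # /foo/diceware
    # /bar/diceware
    # /etc/foo/diceware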


Plain Wordlists
---------------

Out of the box, `diceware` supports plain wordlists, PGP-signed
wordlists, and numbered wordlists. Plain wordlists look like this::

  termone
  termtwo
  anotherterm

Each line in such a file is considered a word of the wordlist. Empty
lines are ignored.

Whitespace inside a line is allowed; leading and trailing whitespace is
stripped off.
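
A minimal Python sketch of reading a plain wordlist according to these two
rules; ``read_plain_wordlist`` is a hypothetical name, not part of `diceware`'s
API.

.. code-block:: python

    def read_plain_wordlist(lines):
        """Yield the entries of a plain wordlist.

        Leading and trailing whitespace is stripped, whitespace inside a
        line is kept, and lines that are empty afterwards are skipped.
        """
        for line in lines:
            word = line.strip()
            if word:
                yield word


    sample = ["termone\n", "  termtwo \n", "\n", "another term\n"]
    print(list(read_plain_wordlist(sample)))
    # ['termone', 'termtwo', 'another term']

A file opened with ``open(path, encoding="utf-8")`` can be passed directly,
since iterating over it yields its lines; the UTF-8 encoding is an assumption.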


Numbered Wordlists
------------------

Numbered wordlists start each line with a number that denotes a
sequence of dice rolls, like so::

  11111    aterm
  11112    anotherterm
  ...

`diceware` detects such lines and in this case extracts ``aterm`` and
``anotherterm`` as wordlist entries.

Apart from simple digits written next to each other, `diceware` also
accepts numbers separated by dashes like this::

  1-1-1-1-1   aterm
  1-1-1-1-2   anotherterm

which is handy when working with wordlists for dice with more than 9
sides.
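
Detecting numbered lines can be sketched with a regular expression that
accepts digits written together or separated by dashes, followed by
whitespace. The Python pattern below is written from the examples in this
section and may differ from the one `diceware` actually uses; ``extract_term``
is a hypothetical helper.

.. code-block:: python

    import re

    # Assumed pattern: digits written together ("11111") or separated by
    # dashes ("1-1-1-1-1", "10-12-3"), then whitespace, then the term.
    NUMBERED_LINE = re.compile(r"[0-9]+(?:-[0-9]+)*\s+(\S.*)")


    def extract_term(line):
        """Return the term of a (possibly numbered) wordlist line."""
        line = line.strip()
        match = NUMBERED_LINE.fullmatch(line)
        # Caveat: a plain entry such as "2 fast" would be misread as numbered.
        return match.group(1) if match else line


    for line in ["11111    aterm", "1-1-1-1-2   anotherterm", "10-12-3 foo", "plainterm"]:
        print(extract_term(line))
    # aterm
    # anotherterm
    # foo
    # plainterm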


PGP-signed Wordlists
--------------------

PGP-signed wordlists are wordlists (ordinary or numbered ones) that
have been cryptographically signed with PGP or GPG. They look like
this::

  -----BEGIN PGP SIGNED MESSAGE-----
  Hash: SHA512

  foo
  bar
  baz

  -----BEGIN PGP SIGNATURE-----
  Version: GnuPG v1

  iJwEAQEKAAYFAlW00GEACgkQ+5ktCoLaPzSutwP8DVgdjBFqRXNKaZlvd8pR+P3k
  8xx5XLC0OFwZQFx4Ls8xl3+/xfvCNxCGSZjD6BGPzNZCK7bmQQYWcrsoEyX5jAC3
  dXjAPj0nct/PkJQlrUjUI2qrO0dFfU7sRj0Gn9TOlQQkKoQVwy7pY/6HaScGNepL
  J8BNUPYdOWeVgxY1jSY=
  =WXfu
  -----END PGP SIGNATURE-----

and are normally stored with the ``.asc`` filename extension. Signed
wordlists can be verified to detect changes, although this is not
automatically done by `diceware`.

.. warning:: Diceware does *not* automatically verify PGP-signed
             files.
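
To use a signed file as a wordlist, the signed text has to be separated from
the PGP framing. The Python sketch below does this following the cleartext
signature framework of the OpenPGP standard (RFC 4880, section 7); it is an
illustration, not `diceware`'s own code, and ``unwrap_clearsigned`` is a
hypothetical name. Like `diceware`, it does *not* verify the signature; a tool
such as ``gpg --verify`` can be used for that.

.. code-block:: python

    def unwrap_clearsigned(lines):
        """Return the signed text lines of a PGP clear-signed message.

        The signature is NOT verified. Input that does not start with the
        clear-signed header is returned unchanged (minus line endings).
        """
        lines = [line.rstrip("\r\n") for line in lines]
        if not lines or lines[0] != "-----BEGIN PGP SIGNED MESSAGE-----":
            return lines
        body = []
        in_headers = True
        for line in lines[1:]:
            if in_headers:
                # Armor headers like "Hash: SHA512" end at the first empty line.
                if line == "":
                    in_headers = False
                continue
            if line == "-----BEGIN PGP SIGNATURE-----":
                break
            # Undo dash-escaping: a signed line "-foo" is stored as "- -foo".
            if line.startswith("- "):
                line = line[2:]
            body.append(line)
        return body


    sample = [
        "-----BEGIN PGP SIGNED MESSAGE-----",
        "Hash: SHA512",
        "",
        "foo",
        "bar",
        "baz",
        "",
        "-----BEGIN PGP SIGNATURE-----",
        "Version: GnuPG v1",
    ]
    print(unwrap_clearsigned(sample))  # ['foo', 'bar', 'baz', '']

The trailing empty string is harmless, since empty lines are ignored when the
wordlist itself is read.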

.. _`8k wordlist`: http://world.std.com/~reinhold/diceware8k.txt
.. _`Corpus de Referencia del Español Actual (CREA)`: https://corpus.rae.es/lfrecuencias.html
.. _`diceware standard wordlist`: http://world.std.com/~reinhold/diceware.wordlist.asc
.. _`@drebs`: https://github.com/drebs
.. _`Electronic Frontier Foundation`: https://eff.org/
.. _`@Heartsucker`: https://github.com/heartsucker/
.. _`Institut für Deutsche Sprache`: https://www.ids-mannheim.de/derewo
.. _`@jawlenskys`: https://github.com/jawlenskys
.. _`long EFF wordlist`: https://www.eff.org/files/2016/07/18/eff_large_wordlist.txt
.. _`NaturalLanguagePasswords`: https://github.com/NaturalLanguagePasswords
.. _`prefix property`: https://en.wikipedia.org/wiki/Prefix_code
.. _`scientific effort`: https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases
.. _`securedrop`: https://github.com/freedomofpress/securedrop
.. _`XDG Base Directory Specification`: https://specifications.freedesktop.org/basedir-spec/latest/