File: ALGORITHM

package info (click to toggle)
an 1.2-7
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, forky, sid, trixie
  • size: 188 kB
  • sloc: ansic: 624; makefile: 51; sh: 3
file content (30 lines) | stat: -rw-r--r-- 1,392 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
This is a completely from-scratch rewrite of the "an" anagram
generator.

It is a clean room reimplementation inspired by the algorithm by
Richard Jones and Julian Assange.  It fully supports accented words
(to the extent that libicu allows).

Algorithm:

* Given a phrase, convert it to a standard form (fold case and remove
  any accents and marks), generate the set of letters that are unique
  to that phrase (an alphabet), and generate a bit pattern describing
  the letter frequencies.

* Read the dictionary into a word list, converting the words into the
  standard form, and discarding any words which are longer than the
  phrase or don't contain letters which can be made from the
  phrase. Produce letter frequency bitpatterns for each word.

* Maintain a current bit pattern, which starts as the phrase's bit pattern.

* Step through the word list, finding words that have a bit pattern
  which is a subset of the current bit pattern. For each word that
  matches, push the current word onto a stack, then recurse into the
  "step through the word list" starting at the current we are looking
  at, and with a new current bit pattern that has the letter
  frequencies of the current word subtracted. If we've been able to
  exactly match the phrase length, print out the word stack. Pop the
  current word off the stack, and step on to the next word until we
  run out of words.