“A nearly impenetrable thicket of geekitude…”


I learned the difference between haphazard and random a long time ago, on a university statistics course. Since then, I’ve been wary of inventing passwords by just “thinking random” or using an obfuscation algorithm on something memorable (“replace Es by 3s, replace Ls by 7s”, or whatever). The concern is that there is really no way to know how much entropy there is in such a token (in the information theoretic sense), and it is probably less than you might think. People tend to guess high when asked how much entropy there is in something; most are surprised to hear that English text is down around one bit per letter, depending on the context.

If you know how much information entropy there is in your password, you have a good idea of how much work it would take for an attacker to guess your password by brute force: N bits of entropy means they have to try 2^N possibilities. One way to do this that I’ve used for several years is to take a fixed amount of real randomness and express it in hexadecimal. For example, I might say this to get a password with 32 bits (4 bytes) of entropy:

$ dd if=/dev/random bs=1 count=4 | od -t x1
0000000    14  37  a8  37

A password like 1437a837 is probably at the edge of memorability for most people, but I know that it has 32 bits worth of strength to it. So, what is one to do if there is a need for a stronger password, say one containing 64 bits of entropy? Certainly d4850aca371ce23c isn’t the answer for most of us.

When I was faced with a need to generate a higher entropy — but memorable — password recently, I remembered a technique used by some of the one-time password systems and described in RFC 2289. This uses a dictionary of 2048 (2^11) short English words to represent fragments of a 64-bit random number; six such words suffice to represent the whole 64-bit string with two bits left over for a checksum. In this scheme, our unmemorable d4850aca371ce23c becomes:


I couldn’t find any code that allowed me to go from the hexadecimal representation of a random bit string to something based on RFC 2289, so I wrote one myself. You can download SixWord.java if you’d like to see what I ended up with or need something like this yourself.

The code is dominated by an array holding the RFC 2289 dictionary of 2048 short words, and another array holding the 27 test vectors given in the RFC. When run, the program runs the test vectors then prompts for a hex string. You can use spaces in the input if you’re pasting something you got out of od, for example. The result should be a six word phrase you might have a chance of remembering. But if you put 64 bits worth of randomness in, you know that phrase will still have the same strength as a password as the hex gibberish did.