Java

Ant fixcrlf and UTF-8 on Windows

I've been working on a large XML processing system in which a sequence of steps implemented in Java and other technologies are orchestrated using Apache Ant. It has to run on Mac OS, Linux and Windows. It has been pretty stable for some time, but I recently set up a new Windows system and started seeing errors like this:

Exception in thread "main" org.xml.sax.SAXParseException:
    Invalid byte 3 of 3-byte UTF-8 sequence.

RUSE MET LORD CURT REEL ION

I learned the difference between haphazard and random a long time ago, on a university statistics course. Since then, I've been wary of inventing passwords by just "thinking random" or using an obfuscation algorithm on something memorable ("replace Es by 3s, replace Ls by 7s", or whatever). The concern is that there is really no way to know how much entropy there is in such a token (in the information theoretic sense), and it is probably less than you might think. People tend to guess high when asked how much entropy there is in something; most are surprised to hear that English text is down around one bit per letter, depending on the context.

If you know how much information entropy there is in your password, you have a good idea of how much work it would take for an attacker to guess your password by brute force: N bits of entropy means they have to try 2^N possibilities. One way to do this that I've used for several years is to take a fixed amount of real randomness and express it in hexadecimal. For example, I might say this to get a password with 32 bits (4 bytes) of entropy:

$ dd if=/dev/random bs=1 count=4 | od -t x1
… 0000000 14 37 a8 37

A password like 1437a837 is probably at the edge of memorability for most people, but I know that it has 32 bits worth of strength to it. So, what is one to do if there is a need for a stronger password, say one containing 64 bits of entropy? Certainly d4850aca371ce23c isn't the answer for most of us.

When I was faced with a need to generate a higher entropy — but memorable — password recently, I remembered a technique used by some of the one-time password systems and described in RFC 2289. This uses a dictionary of 2048 (2^11) short English words to represent fragments of a 64-bit random number; six such words suffice to represent the whole 64-bit string with two bits left over for a checksum. In this scheme, our unmemorable d4850aca371ce23c becomes:

RUSE MET LORD CURT REEL ION

I couldn't find any code that allowed me to go from the hexadecimal representation of a random bit string to something based on RFC 2289, so I wrote one myself. You can download SixWord.java if you'd like to see what I ended up with or need something like this yourself.

The code is dominated by an array holding the RFC 2289 dictionary of 2048 short words, and another array holding the 27 test vectors given in the RFC. When run, the program runs the test vectors then prompts for a hex string. You can use spaces in the input if you're pasting something you got out of od, for example. The result should be a six word phrase you might have a chance of remembering. But if you put 64 bits worth of randomness in, you know that phrase will still have the same strength as a password as the hex gibberish did.

Programming Jakarta Struts, by Chuck Cavaness

A review of Programming Jakarta Struts, by Chuck Cavaness (O'Reilly), with some side comments on Struts itself.

Summary: covers the ground, but it's heavy going to begin with.

Tags:

Subscribe to RSS - Java