Ant fixcrlf and UTF-8 on Windows

I've been working on a large XML processing system in which a sequence of steps, implemented in Java and other technologies, is orchestrated using Apache Ant. It has to run on Mac OS, Linux and Windows. It has been pretty stable for some time, but I recently set up a new Windows system and started seeing errors like this:

Exception in thread "main" org.xml.sax.SAXParseException:
    Invalid byte 3 of 3-byte UTF-8 sequence.


There are only a couple of weeks left until Google Reader shuts down. Like many other people (the "loyal but declining" following the product had certainly numbered in the millions) I've been looking at alternatives for a while now. I've finally settled on feedly.


OmniFocus 1.0

After a long public beta program, OmniFocus, OmniGroup's "professional-grade personal task management" application for the Mac, has finally reached its 1.0 milestone. If you're already both a Mac cultist and a Getting Things Done convert, you probably already know this because you're one of the 13,590 people who pre-ordered it.

GTD and OmniFocus won't magically rescue you from being disorganised (they certainly haven't entirely done that for me) but I've found that some of the GTD principles that OmniFocus allows you to implement really do lead to some level of stress reduction:

  • Get everything that's on your mind out of your head and into a trusted system.

  • Plan in terms of small, concrete, actionable steps.

  • Concentrate on the next available action for your current context.

You probably can't plan multi-person mega-projects this way, but that's not what this product is for. If you're trying to hold together a lot of smaller projects, it can be pretty much ideal. There's a 14-day trial available.


Thunderbird 2

The big green one was always my favourite, you never knew what was in the pod.

In this case, though, I come to sing the praises not of a hypersonic airborne truck but of the second major version of the Thunderbird e-mail client. For me, it has two new must-have features that fit in really well with the way I work:

  • You can have replies filed in the same folder as the original message.

  • There's a folder view that just shows folders with new messages.

I have a lot of message filters active that file incoming mail into per-topic folders. Knowing which I need to look at, and not having to re-file outgoing replies, will save me a lot of time.

The one "gotcha" so far has been that some of the key bindings have been changed, at least on the Mac. In particular Shift-Pretzel-M no longer creates a new message, but instead moves the currently selected message to the last folder you moved something to… hilarity ensues. After some initial cursing along the lines of "where on earth did that message disappear to", obviously.


Fusion Beta

I've been getting more and more dependent on virtual machine technology over the last couple of years. Although I use Microsoft's Virtual PC for Mac and the Parallels Desktop product from time to time, most of my virtual machines live on VMware Server under Linux or VMware Workstation running on my remaining Windows 2000 desktop machine. Today's announcement of a public beta for VMware's Fusion product for Intel-based Macs therefore came as a pleasant festive surprise. Of course, I downloaded it right away.

My first impression is that it seems to work just fine. The current beta version runs with a lot of debug code active, and the first thing you see is a warning that you're not going to get a lot of performance out of it. The Beta EULA prevents me from commenting further on that front, and in any case both VMware Server and Parallels Desktop were less than stellar in their beta phases so it isn't significant information.

Problems? So far, really very few. I brought up a Fedora Core 6 client from scratch in about half an hour, and downloaded an OpenFiler virtual appliance and had it up and running in seconds. For some reason, I couldn't connect to the OpenFiler appliance's administrative interface from the host machine, but it appeared to be working fine from another machine. [Update: known issue in this build, see comments below.]

The current beta is missing any kind of snapshot facility, which is a pity as that's one of the things that marks out VMware Workstation as such an excellent development tool. The other thing it's missing is some of the GUI for doing things like adding more virtual hard disks to a machine. However, I found that if I stored the virtual machine on a removable FAT32 drive, I could swap it over to the Windows machine running Workstation, make changes there then swap it back to the Mac again. Neat!

Summary: very good for an initial public beta. If the final functionality comes up to Workstation's level, particularly in the area of snapshots, I'm a customer.



I learned the difference between haphazard and random a long time ago, on a university statistics course. Since then, I've been wary of inventing passwords by just "thinking random" or using an obfuscation algorithm on something memorable ("replace Es by 3s, replace Ls by 7s", or whatever). The concern is that there is really no way to know how much entropy there is in such a token (in the information theoretic sense), and it is probably less than you might think. People tend to guess high when asked how much entropy there is in something; most are surprised to hear that English text is down around one bit per letter, depending on the context.

If you know how much information entropy there is in your password, you have a good idea of how much work it would take for an attacker to guess your password by brute force: N bits of entropy means they have to try up to 2^N possibilities. One way to get a password of known strength, which I've used for several years, is to take a fixed amount of real randomness and express it in hexadecimal. For example, I might say this to get a password with 32 bits (4 bytes) of entropy:

$ dd if=/dev/random bs=1 count=4 | od -t x1
… 0000000 14 37 a8 37

A password like 1437a837 is probably at the edge of memorability for most people, but I know that it has 32 bits worth of strength to it. So, what is one to do if there is a need for a stronger password, say one containing 64 bits of entropy? Certainly d4850aca371ce23c isn't the answer for most of us.
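
The same idea can be sketched in Python. This is only an illustration, not the command shown above: the function names random_hex_password and brute_force_guesses are made up for the example, and it assumes the standard library's secrets module is an acceptable stand-in for reading /dev/random directly.

```python
import secrets


def random_hex_password(entropy_bits: int) -> str:
    """Return entropy_bits of OS randomness written as lowercase hex."""
    if entropy_bits <= 0 or entropy_bits % 8 != 0:
        raise ValueError("entropy_bits must be a positive multiple of 8")
    # token_hex(n) draws n random bytes and returns 2*n hex digits.
    return secrets.token_hex(entropy_bits // 8)


def brute_force_guesses(entropy_bits: int) -> int:
    """Upper bound on the guesses needed to exhaust the password space."""
    return 2 ** entropy_bits


if __name__ == "__main__":
    for bits in (32, 64):
        print(bits, "bits:", random_hex_password(bits),
              "- up to", brute_force_guesses(bits), "guesses")
```

Each hex digit carries four bits, so a 32-bit password is eight characters long and a 64-bit one is sixteen.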

When I was faced with a need to generate a higher entropy — but memorable — password recently, I remembered a technique used by some of the one-time password systems and described in RFC 2289. This uses a dictionary of 2048 (2^11) short English words to represent fragments of a 64-bit random number; six such words suffice to represent the whole 64-bit string with two bits left over for a checksum. In this scheme, our unmemorable d4850aca371ce23c becomes:


I couldn't find any code that allowed me to go from the hexadecimal representation of a random bit string to something based on RFC 2289, so I wrote some myself. You can download it if you'd like to see what I ended up with or need something like this yourself.

The code is dominated by an array holding the RFC 2289 dictionary of 2048 short words, and another array holding the 27 test vectors given in the RFC. When run, the program runs the test vectors then prompts for a hex string. You can use spaces in the input if you're pasting something you got out of od, for example. The result should be a six word phrase you might have a chance of remembering. But if you put 64 bits worth of randomness in, you know that phrase will still have the same strength as a password as the hex gibberish did.
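
The bit layout described above can be sketched in Python. This is an illustration under stated assumptions, not the downloadable program: the function name six_word_indices is hypothetical, the hex string is assumed to be read most significant byte first, and the checksum follows RFC 2289's description (sum the 32 two-bit pairs of the key and keep the low two bits). The 2048-word dictionary is left out, so the sketch stops at the six 11-bit indices rather than producing words.

```python
def six_word_indices(hex_key: str) -> list[int]:
    """Split a 64-bit hex string into six 11-bit dictionary indices."""
    # Allow spaces in the input, e.g. when pasting output from od.
    digits = "".join(hex_key.split())
    if len(digits) != 16:
        raise ValueError("expected exactly 16 hex digits (64 bits)")
    key = int(digits, 16)

    # Checksum: add up the 32 two-bit pairs of the key, keep the low two bits.
    checksum = sum((key >> shift) & 0b11 for shift in range(0, 64, 2)) & 0b11

    # 64 key bits followed by 2 checksum bits gives 66 bits = 6 x 11.
    padded = (key << 2) | checksum

    # Peel off 11 bits at a time, most significant group first.
    return [(padded >> shift) & 0x7FF for shift in range(55, -1, -11)]


if __name__ == "__main__":
    print(six_word_indices("d485 0aca 371c e23c"))
```

Looking each index up in the RFC's word list would then give the six-word phrase.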

Hiring the Top 1%

Hiring good people is hard. If you advertise, you face wading through dozens or hundreds of CVs trying to figure out who the best people are. If you filter CVs, then filter again through interviews, you're probably inclined to think that you're being terribly selective, and that your final hires are among an elite. People often say "we hire the top 1%".

There are several fallacies there: most obviously, it is hard to pick the people to interview on the basis of a CV, and interviewing people is a pretty hit-and-miss affair. More crucially, and less well recognised, you only get to pick from the people who apply, not from the whole population. Joel Spolsky's recent article does a great job of explaining this in a really clear way.

One of Joel's conclusions, which matches at least one case in my own experience, is that working with summer students (US: interns) is a good idea. This is not for the usually stated reasons (they work for peanuts! they know all the latest research! they are really gullible about working conditions!) but simply because if someone turns out to be really good, a summer work placement might be almost the last chance anyone has to hire them before they fall off the hiring radar for good.


The Daily WTF

Subtitled Curious Perversions In Information Technology, The Daily WTF is a collection of found software artifacts that will make most experienced software people look once, do a double-take, then yell "WTF?". Hence the name.

I'm not sure whether to file this one under "Humour" or "Really, really scary."

[Thanks to Rod]


A friend just sent me a link to RFC 1925, The Twelve Networking Truths. Although it is one of the humorous April 1st RFCs (in this case by Ross Callon of the Internet Order of Old Farts), it is also full of genuine truths.

Although I don't follow the really popular bloggers to any great extent, I had come across Mark Pilgrim because of his columns, and I knew he had a blog that was fairly popular. What I didn't know until a couple of days ago is that Dive Into Mark contains a number of things that are funny, but also full of genuine truths.

If you're drinking coffee just now, I suggest you put it down somewhere safe. Then read Mark's essay on why specs matter, which he starts off with the statement that most developers are morons, and the rest are assholes. True, true. If you've had a really bad day in the standards mines, find relief for your grief in the short but pointed Unicode Normalization Form C. Funny; but also so, so true.

The Javafication of PHP

I do a fair amount of programming in PHP, but I've never been an uncritical fan of the language. My initial impression of it was that PHP must be the secret love child of Kernighan & Ritchie era C and Perl 4, combining as it does a pre-C++ model of object oriented programming with dynamic typing, a general attitude of "if you write it, I'll find a way to make it mean something" and a library that only the kindest could regard as other than rambling and incoherent.


