“A nearly impenetrable thicket of geekitude…”

Blog Comment Spam

Posted on October 3, 2003 at 16:01

New medium, same old sleaze it seems. Today, someone wishing to advertise Those Blue Pills placed a comment on each of the fifteen posts I’d made here so far. Just to make sure the message got through, some posts got up to three copies of the advertisement.

This was annoying, but I should have been expecting it.

With 50% of all e-mail being spam, and with Windows users who lack a firewall seeing pop-up messenger spam without even opening their e-mail client or browser, it makes sense to approach any new system by first thinking about how it could be abused. Here, outsiders can add content to the site in the form of comments, so of course that channel will be used to put up content I don’t want.

The software running this blog is the very fine Movable Type. At present, the only tool it provides for this kind of thing is the ability to ban the IP address of anyone you don’t want comments from. So that was the first thing I did, rapidly followed by the manual and tedious deletion of the messages concerned.
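An IP ban like this amounts to a check at submission time, plus a clean-up pass over whatever already got through. The sketch below is a hypothetical in-memory model of that idea (the `Comment` and `CommentStore` names are invented for illustration and are not Movable Type’s internals). It rejects comments from banned addresses and purges those already posted from a newly banned one, which is the tedious part I ended up doing by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Comment:
    post_id: int
    ip: str
    body: str


@dataclass
class CommentStore:
    """Hypothetical comment store with a simple IP ban list."""
    banned_ips: set[str] = field(default_factory=set)
    comments: list[Comment] = field(default_factory=list)

    def submit(self, comment: Comment) -> bool:
        """Store the comment unless its source address is banned."""
        if comment.ip in self.banned_ips:
            return False
        self.comments.append(comment)
        return True

    def ban_and_purge(self, ip: str) -> int:
        """Ban an address and delete every comment already posted from it.

        Returns the number of comments removed.
        """
        self.banned_ips.add(ip)
        before = len(self.comments)
        self.comments = [c for c in self.comments if c.ip != ip]
        return before - len(self.comments)
```

Pairing the ban with the purge in one call means the clean-up never has to be done comment by comment. Note that a ban only helps until the spammer moves to another machine.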

I then traced the IP address and found, to my surprise, that it belongs to a real machine (rather than a dial-up account) owned by a real company that doesn’t seem to be obviously evil. The presumption, then, is that their machine has been compromised; I’ll contact them and see how they respond.

The site’s access logs show that the robot in question picks up the RSS feed plus the main index and monthly archive indices. It then fetches each page before submitting its comment. This means the robot is bright enough that renaming the comment submission script probably wouldn’t help, although that has been suggested elsewhere as a partial solution (see tip 1). I may still do it if I get hit again and it looks like other robots are stupid enough to be fooled by this, even though I don’t believe that security by obscurity is the right approach in the long term.
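The fetch-then-post pattern described above can be pulled out of an access log mechanically. This is a minimal sketch, assuming Apache’s “combined” log format and assuming the comment script lives at `/cgi-bin/mt-comments.cgi` (adjust `COMMENT_SCRIPT` for your own installation). For every address that POSTed to the comment script, it lists the pages that address fetched before its first comment.

```python
import re
from collections import defaultdict

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

COMMENT_SCRIPT = "/cgi-bin/mt-comments.cgi"  # assumed path; not taken from the logs


def requests_by_ip(lines):
    """Group (method, path) pairs by client address, in log order."""
    seqs = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            seqs[m["ip"]].append((m["method"], m["path"]))
    return seqs


def crawl_before_comment(lines, script=COMMENT_SCRIPT):
    """For each address that POSTed a comment, list the paths it GET'd first."""
    result = {}
    for ip, reqs in requests_by_ip(lines).items():
        post_positions = [
            i for i, (method, path) in enumerate(reqs)
            if method == "POST" and path.startswith(script)
        ]
        if post_positions:
            first_post = post_positions[0]
            result[ip] = [path for method, path in reqs[:first_post] if method == "GET"]
    return result
```

Calling `crawl_before_comment(open("access_log"))` would show whether a commenter read the feed and the page first, as this robot did, or went straight to the script. It relies on the log being in chronological order, which access logs normally are.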

For now, apart from blocking the particular IP address used in this attack, the only action I plan to take is to make it easier to delete posts, either as suggested by Yoz Graham (see tip 6) or as suggested by Jakob Skjerning. I second Yoz’s opinion that something like this deserves to be part of Movable Type.

In the longer term, this isn’t going to fix the problem by itself. For a blog like mine that isn’t expecting many comments, newsgroup-style moderation or some kind of CAPTCHA would be ideal, although the latter has accessibility implications that I don’t like so much.
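Newsgroup-style moderation boils down to a holding queue: every new comment starts out pending, and only the ones the owner approves are ever published. The sketch below is a hypothetical illustration of that flow (the `ModerationQueue` class and its method names are invented here, not part of Movable Type).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModerationQueue:
    """Hold new comments until the blog owner approves them."""
    pending: dict[int, tuple[str, str]] = field(default_factory=dict)
    published: list[tuple[str, str]] = field(default_factory=list)
    _next_id: int = 0

    def submit(self, author: str, body: str) -> int:
        """Queue a comment and return its id; it is not visible yet."""
        cid = self._next_id
        self._next_id += 1
        self.pending[cid] = (author, body)
        return cid

    def approve(self, cid: int) -> None:
        """Move a pending comment onto the public page."""
        self.published.append(self.pending.pop(cid))

    def reject(self, cid: int) -> None:
        """Discard a pending comment without it ever being published."""
        self.pending.pop(cid, None)
```

On a low-traffic blog the cost of this design is only a short delay before legitimate comments appear, while spam never reaches the page at all. It does not stop the robot from submitting, though, so rejecting a large batch still has to be easy.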

To close, one more observation from the logs: although the attacking machine made multiple requests to the site, the HTTP user-agent it reported was different for each page-read/comment-post pair. This is presumably an attempt by the author of the robot software to avoid being filtered out on the basis of the user-agent string. How very… enterprising of them.
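Ironically, rotating the user-agent is itself a signal that can be checked for. This is a minimal sketch, again assuming Apache’s “combined” log format (user-agent as the last quoted field); the threshold of three distinct agents per address is an arbitrary illustrative value. It lists addresses that presented several different user-agent strings.

```python
import re
from collections import defaultdict

# Assumes "combined" format: client address first, user-agent as the last quoted field.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')


def agents_per_ip(lines):
    """Collect the set of distinct user-agent strings seen from each address."""
    agents = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            agents[m["ip"]].add(m["agent"])
    return agents


def rotating_agents(lines, threshold=3):
    """Addresses that presented at least `threshold` distinct user-agents."""
    return sorted(
        ip for ip, seen in agents_per_ip(lines).items() if len(seen) >= threshold
    )
```

Treat the output as a hint rather than proof: a proxy or NAT gateway can legitimately send traffic from many different browsers through one address.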

Re CAPTCHA: the technique doesn’t have to rely on vision. You could offer an alternative link that plays the keyword as a sound file after an equivalent audio distortion has been applied. We’ve been looking at that over here in the US as an A.D.A. issue.

— Graham Toal on December 23, 2003