Over the past few months the amount of comment SPAM I have been receiving on this website has been going through the roof. So far, and I do mean so far, they have not found the other blogs that I run.
99.99% of the SPAM is from a select bunch of websites pushing either medical supplies or poker: Texas Hold’em, or whatever it calls itself.
There is no point in complaining to the hosting companies of those sites because they just ignore you. You might notice that there are no SPAM comments displayed on my site. That is because I use an MT plugin called SpamLookup. I recommend that you install this as the very first thing you do after installing MT. If you get a pre-installed version of MT, then make sure that whoever maintains your installation installs SpamLookup or something as good as it. It will save you a lot of headaches, bandwidth and database space later on when the ‘dark side’ find you, and they will eventually.
Despite the fact that no SPAM actually makes it on to my web pages, up until 3 days ago I was getting hit from various IP addresses and I was finding 10,000 log entries a day. This all eats bandwidth and fills up my database with the messages. It’s not that major, but it still gets right up my nose. And those 10,000 hits are despite me having a rather long .htaccess file. I last updated my .htaccess a week or so ago when I basically blocked the complete IP address space of Comcast users. This means that anyone on Comcast cannot access my webserver. I also added them to my email blacklist. For some reason Comcast are very deaf when it comes to listening to complaints about abuse on their network. They also end up with a lot of ‘open relays’ sitting on their network, which lets ordinary email SPAM through. It’s a bit drastic I know, but hey, I don’t know many people who use Comcast so it is no skin off my nose. It makes life a lot quieter for me.
Well anyway, after I added Comcast to my .htaccess, I realised that it was taking too much time to update this file and that it kept growing in size. One of the problems with using .htaccess rather than the global site config files is that the .htaccess file is parsed on each web page request. This puts extra load on the webserver itself and also slows down web page access. It’s not a problem for me yet, but if I kept adding entries to the file, it soon would be. I needed another way to keep them from hitting my site.
When looking at my site access logs, you can see that by far the greatest number of entry page hits was for /mt/mt-comments.cgi
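For anyone who wants to run the same check on their own logs, here is a minimal sketch in Python. It assumes the access log is in the usual Apache common or combined format, and it simply counts every request per path rather than true entry pages, which is a simplification. The sample lines and the function name `count_paths` are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request in an Apache common/combined log line,
# e.g. "POST /mt/mt-comments.cgi HTTP/1.0"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')


def count_paths(lines):
    """Return a Counter of requested paths, with query strings stripped."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts


# Made-up sample lines (documentation IP addresses), not real log data.
sample = [
    '192.0.2.10 - - [01/Jan/2005:10:00:00 +0000] "POST /mt/mt-comments.cgi HTTP/1.0" 200 512',
    '192.0.2.11 - - [01/Jan/2005:10:00:05 +0000] "GET /mt/mt-comments.cgi?entry_id=12 HTTP/1.0" 200 734',
    '192.0.2.12 - - [01/Jan/2005:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 2048',
]

top = count_paths(sample).most_common(1)
print(top)  # [('/mt/mt-comments.cgi', 2)]
```

On a real log you would pass the open file object instead of `sample`, since the function only needs something that yields lines.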
That is to say, they are hitting that page first and not getting to it from a link on another page. This could be because the default install of MT uses mt-comments.cgi as the name of the script. MT does provide you with a method of changing the name of the comment script file in case you have to run it with a .pl rather than a .cgi extension.
Out of interest I changed the name of the file to something less obvious and changed the file name in the configuration file to match.
Amount of comment SPAM over the last 3 days? None.
Number of 404 errors in my site access logs? Thousands and thousands.
So it appears that they rarely, if ever, scan your web pages to see what the link to your comment script file actually is.
So here I would be so bold as to suggest that the second thing you do after installing MT on your site is to rename the file and change the setting in the mt.cfg configuration file.
To take this one step further, I can see a day when they make the process more intelligent at their end and actually scan your blog to find out what link the script file lives at. It would be no more difficult to implement than a similar script to harvest email addresses from your site. So one of the first things we may have to consider in the future is a method of obfuscating the name and link of the comment script file.
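To show just how little effort that scanning would take, here is a rough sketch using nothing but Python's standard `html.parser` module. It assumes the comment form is an ordinary HTML form whose `action` attribute points at the script. The page snippet and the file name `xq7-talk.cgi` are invented for the example.

```python
from html.parser import HTMLParser


class FormActionFinder(HTMLParser):
    """Collects the action attribute of every <form> tag it sees."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def find_form_actions(html):
    finder = FormActionFinder()
    finder.feed(html)
    return finder.actions


# Invented page fragment with a renamed comment script.
page = (
    '<html><body><form method="post" action="/mt/xq7-talk.cgi">'
    '<textarea name="text"></textarea></form></body></html>'
)

print(find_form_actions(page))  # ['/mt/xq7-talk.cgi']
```

A couple of dozen lines is all it takes, which is why a rename on its own is unlikely to hold forever.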
Perhaps one step further would be to dynamically change the name of the script file on a regular basis. Basically some process would:
i. Copy the script file.
ii. Rename it to some obscure file name.
iii. Make sure it has the correct privileges.
iv. Update the mt.cfg file to point to the new file.
v. Delete the old file.
vi. Perform a complete rebuild of the blog entries.
The last step is quite important since the script filename is itself an MT tag and is used in most template files to provide a link to the comment page (assuming you have it enabled). If you didn’t change the links in your previous posts, you would end up with a lot of broken links in your archives and older postings. By doing a complete rebuild of your site, all the links would be updated with the new filename from the mt.cfg file.
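As a rough idea of what such a process might look like, here is a minimal sketch of steps i to vi in Python. It rests on several assumptions: that the script name is set in mt.cfg on an uncommented line of the form `CommentScript <filename>`, that the script sits in the same directory as mt.cfg, and that 0755 is the right permission for a CGI script on your host. The function name and the `rebuild` hook are hypothetical. How you actually trigger a full rebuild depends on your MT installation, so that part is left as a callable you supply. The demo at the bottom runs against a throwaway directory, not a real install.

```python
import os
import re
import secrets
import shutil
import tempfile


def rotate_comment_script(mt_dir, cfg_name="mt.cfg",
                          directive="CommentScript", rebuild=None):
    """Copy the comment script to a random name, update the config,
    remove the old script and optionally trigger a rebuild."""
    cfg_path = os.path.join(mt_dir, cfg_name)
    with open(cfg_path, encoding="utf-8") as handle:
        cfg_text = handle.read()

    # Find an uncommented "<directive> <filename>" line (assumed format).
    pattern = re.compile(
        r"^[ \t]*" + re.escape(directive) + r"[ \t]+(\S+)[ \t]*$",
        re.MULTILINE,
    )
    match = pattern.search(cfg_text)
    if match is None:
        raise ValueError(f"No {directive} line found in {cfg_path}")

    old_name = match.group(1)
    extension = os.path.splitext(old_name)[1]
    new_name = "c" + secrets.token_hex(8) + extension
    old_path = os.path.join(mt_dir, old_name)
    new_path = os.path.join(mt_dir, new_name)

    shutil.copy2(old_path, new_path)   # steps i and ii: copy under new name
    os.chmod(new_path, 0o755)          # step iii: assumed CGI permissions

    # Step iv: swap only the filename inside the config text.
    new_cfg = cfg_text[:match.start(1)] + new_name + cfg_text[match.end(1):]
    with open(cfg_path, "w", encoding="utf-8") as handle:
        handle.write(new_cfg)

    os.remove(old_path)                # step v: delete the old script
    if rebuild is not None:
        rebuild()                      # step vi: site-specific rebuild hook
    return new_name


# Demo in a temporary directory with stand-in files.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "mt.cfg"), "w", encoding="utf-8") as f:
        f.write("# stand-in config\nCommentScript mt-comments.cgi\n")
    with open(os.path.join(demo_dir, "mt-comments.cgi"), "w", encoding="utf-8") as f:
        f.write("#!/usr/bin/perl\n")

    rebuilt = []
    new_name = rotate_comment_script(demo_dir, rebuild=lambda: rebuilt.append(True))
    remaining = sorted(os.listdir(demo_dir))
    with open(os.path.join(demo_dir, "mt.cfg"), encoding="utf-8") as f:
        demo_cfg = f.read()

print(new_name, remaining)
```

Note that between deleting the old file and finishing the rebuild, existing pages still link to the old name, so anyone commenting in that window would get a 404.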
Once they start scanning your site for the filename, there will still be a few of them that get through, but it won’t be anything near the volume you would normally get. It also means that it will slow them down as they have to use more resources to try and discover the name and link of the comment file.
For the moment, I am going to leave the filename as it is and not implement the dynamic changes. I’ll see how long it takes them to discover the new link and then manually change it once again to see how long that one lasts. If the gaps get too short then I might think about writing a tool to do the above.
It would be able to run manually and, if your web host supports it, as an automatic process at user-defined intervals.