Would pinging Google Sitemap get our blog indexed faster?

Now that there is a way for a blogspot blog to add a Google sitemap, and
also a way to ping Google sitemap, does this mean that if a new blog
has not been indexed yet, then by adding a Google sitemap to the blog and
then pinging Google sitemap, one can get one’s blog indexed faster?

This is part of my response to an email on the Yahoo Blogger support group asking the above question. There have been a few questions on the subject in recent weeks, so I thought I would post it here.

The answer, or in this case answers, is yes and no and, now that I think about it, maybe…
If you host your blog on their servers (blogspot), it will take 3-7 days for them to find it.
If you host your blog on your own server, it will take around 5-7 days for them to find it (using Blogger, that is; if you use, say, Movable Type, it will take them forever to find it unless you have inbound links or tell them about it).
If you add a Google sitemap and tell them about it, it will still take 3-7 days for them to get round to indexing it.

Also remember that just because you’ve told Google about your site within, say, hours, it doesn’t mean they will do anything with the data for weeks. It all comes down to the quality of the site, and that covers the content itself as well as the site construction (coding, colour schemes etc). Just telling Google about a site is a fraction of the story.

But, here is the word of warning I gave out before…

Yes, Google have provided a way for you to authenticate your site by the use of a META tag…
In order to prove that the sitemap you have submitted to Google belongs to you, they used to get you to upload a ‘verification file’ to your root directory.
Google would then know that you had access to that folder and the chances were that the site was yours.
Because Blogspot users can’t do this (you host on blogspot, not on your own web server), they provided a method for users to insert a META tag in their template to provide the authentication.

But being on blogspot you cannot generate a SITEMAP.XML (or SITEMAP.XML.GZ – the compressed version). 
Usually, this would be a file that lists ALL of the pages on your site. So in the case of one of my personal blogs, a list of over 2500 individual pages.
There is nothing to stop you generating this SITEMAP file no matter who you are hosted with, but if you are on blogspot you cannot upload it to your site’s root folder (because you don’t have FTP access to that folder).
You cannot place it on a remote machine and then point Google Sitemaps to it because it has to be in the root folder of your website.
But don’t despair… You have two options, one of which is a tad sneaky, but still allowed.
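
For anyone who has never seen one, the SITEMAP file described above is easy enough to generate yourself. Here is a minimal Python sketch, not anything supplied by Google or Blogger. The function names and the example.blogspot.com URLs are made up for illustration, and the namespace is the sitemaps.org 0.9 one, which I am assuming here rather than whatever schema Google asks for at the time you read this.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; check what Google currently expects.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return the text of a SITEMAP.XML listing every URL in page_urls."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(url, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def compress_sitemap(sitemap_text):
    """Return the bytes of the compressed SITEMAP.XML.GZ version."""
    return gzip.compress(sitemap_text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical page list; a real blog would list every post and archive page.
    pages = [
        "http://example.blogspot.com/",
        "http://example.blogspot.com/2005/12/some-post.html",
    ]
    print(build_sitemap(pages))
```

Generating the file is the easy part. Where you are allowed to put it is the real problem on blogspot.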


How to Remove Word Verification From Blogger Blogs

A couple of questions have been posted to the Yahoo forum recently about the Word Verification feature on their blogs. Basically, there are a couple of processes that can cause your blog to get flagged. One is when visitors tick the flag at the top of the blog; another is the Blogger ‘robot’ scanning your site and determining that your content is Spam. (The ‘robot’ is not the most intelligent thing in the world.)

Anyway, the Blogger help page here describes the method required to get Word Verification turned off.

To avoid further inconveniences when publishing, click the “?” (question mark) icon next to the word verification on your posting form:

Upgrading Movable Type from 3.15 to 3.20

Update: 05/12/2005

I should have posted an update here rather than in the comments (but I wanted to test the comments out again anyway). The update process was fairly simple. I had made some changes to the code for 3.15 and I forgot about those when I upgraded to 3.20. Apart from that, it was as simple as transferring all of the new files to my webserver, going to the main page and running the upgrade tool. No interaction from me required at all (once I remembered what changes I had made).

Some great work by the MT crew to produce such an easy update.

I’m finally getting round to upgrading my MT installation, which I am intending to start in about 5 minutes.

Point being, if you see this post in a few days, you know I’ve screwed up and I’ve not fixed it yet.

Now the instructions say it is easy… We’ll see.

CNN Report: FBI agents bust ‘Botmaster’ : One down, 3,450,342 to go

LOS ANGELES, California (Reuters) — A 20-year-old man accused of using thousands of hijacked computers, or “bot nets,” to damage systems and send massive amounts of spam across the Internet was arrested on Thursday in what authorities called the first such prosecution of its kind

Read Original Article Here

One down, 3,450,342 to go. (Judging by the number of attacks being made on my network each hour).
But this is a start I suppose, let us hope that there are many many more.

Mind you, in saying that, the amount of SPAM hitting my addresses these past few weeks has dropped to less than 10 a week!! It has not been this quiet since I first went online. I haven’t changed anything or added any new protection. All I did was create a honeypot@thenameofmydomain.com for each domain I run and enable a spam trap on it that forwards to SPAMCOP.

And after more than 8 new variants of the Bagle virus landed in my mail this week, it appears to have gone quiet again the last few days.
So don’t forget to update your virus definitions several times a day, and if you can configure it to happen automatically during the day, so much the better.

Most User Browsers and OS’es hitting this domain.

With the recent (MS05-038) and (MS05-039) problems from Microsoft, I decided to have a look at my web logs for the yaps4u.net domain to see what sort of users were hitting my site.

In relation to the (MS05-039) problem, I wanted to see how many Windows 2000 users there were out there.

If you click on the extended entry below you will see the stats from my server logs, taken over a 24-hour period that equates to about 6000 hits.

As you will see, 13.98% of hits come from Windows 2000, which is quite a sizeable chunk of internet users.

It appears that 60% of users are of the Windows flavour.

I suspect the low number of Firefox users is probably down to a few page rendering errors when viewing my site with that browser. I will sort it out one day when I find the part number for ordering some more roundtuits.

Over 30% of IE Browsers are pre-IE6 and closer examination of the logs shows that not all of them are patched or up to date, which is very worrying in this day and age.
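
If you want to pull the same sort of figures out of your own logs, something along these lines would do it. This is only a rough Python sketch and not the tool I used for the stats. It assumes the log is in the common ‘combined’ format with the user agent as the last quoted field, and it relies on the usual convention that "Windows NT 5.0" in a user agent means Windows 2000 and "Windows NT 5.1" means Windows XP. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, where the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def classify_os(user_agent):
    """Very crude OS guess based on well-known user agent tokens."""
    if "Windows NT 5.0" in user_agent:
        return "Windows 2000"
    if "Windows NT 5.1" in user_agent:
        return "Windows XP"
    if "Windows" in user_agent:
        return "Windows (other)"
    if "Mac" in user_agent:
        return "Mac"
    if "Linux" in user_agent:
        return "Linux"
    return "Other"


def os_share(log_lines):
    """Return {os_name: percentage of hits}, rounded to 2 decimal places."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if match:
            counts[classify_os(match.group(1))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * hits / total, 2) for name, hits in counts.items()}
```

Feed it the lines of an access log (for example `os_share(open("access.log"))`, where access.log is whatever your host calls the file) and it hands back the percentage of hits per operating system.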

I won’t go into the Anti-Microsoft thing, mainly because I am pro-Microsoft. In fact I am pro-anything, I just refuse to jump on the bandwagon and attack Microsoft at any opportunity. They happen to produce the majority of the tools that I use to do my work, and they perform quite well on the whole, so they can’t be getting it that wrong. And they always give the appearance of being concerned with customers, so I excuse them any transgressions they make over time. Not to say that wouldn’t change if they ever forgot about customers for want of profit, but I can’t see that happening.

As Firefox became popular enough to draw hacking resources away from IE, the problems appeared with that browser too. Ok, there won’t be the hacker who wants to create an exploit just because it is Microsoft, but there will be the commercial hackers whose aim is to gain financial rewards from their hacking exploits (no pun intended), rather than the discrediting of a major organisation.

In fact recent studies have found that there has been a large increase in what has been called commercial hacking, moving away from the trend of specifically targeting home users. The exploits will still ‘use’ the home user as a platform for launching these attacks, as home users supply the majority of unsecured machines with which to do so.

Now they are more likely to use home machines to attack or gain entry to commercial networks, rather than retrieve an individual’s personal data.

Needless to say, no matter whose product has been identified as having a potential or real vulnerability, average Joe must be provided with the education to keep their machines up to date with upgrades/patches and the latest security, or these users will go on providing the methods for the hackers to work their nasties.

Education of the public is a must, so rather than directing our angst at one company or group, we should start focussing on bringing Joe Public up to speed.


Blogger for Word

Yeah I know, this is my MT (Movable Type) blog, so why am I extolling the virtues of a new plug-in for Blogger?
I have always been a fan of Blogger and the team (now part of the Google group), because I believe that Blogger is one of the best free blogging tools out there.
Granted it may not have all the features of some other blogging tools, but there isn’t much you can’t do with it using some jiggery pokery.

Anyway, to make things easier they have launched a plug-in for MS Word to allow you to make a new post to, or update a post on, any of your blogs.
I don’t really use Blogger much, other than to host my humour and jobsite blogs. Even then, the humour blog is the only one I update much lately. I was thinking of moving them both over to my MT server, but this tool is kind of neat.

One of my bugbears with blogger was the response time of the web server when posting, or the sometimes unreliable posting mechanism that often caused me to lose long posts. Towards the end I used to type out my blog posts in Word and save them locally rather than take a chance of losing them. Now I can do that again, but can post it directly to my blog with a click of a single button.

http://buzz.blogger.com/bloggerforword.html

With Blogger for Word, publishing a Word document to your blog is just as seamless as saving it to your computer, and it’s easy to get started; all you need to do is download and install the Blogger for Word add-in, and three buttons appear in your Word toolbar:

  • Publish creates and publishes a new post from the text in your document.
  • Open Post enables you to edit your last 15 Blogger posts in Word.
  • Save as Draft enables you to keep a post unpublished; it will appear in your Blogger account, but not publicly on your blog.

System Requirements

The Blogger for Word add-in requires Microsoft Windows 2000 or higher and Microsoft Word 2000 or higher. You can use your existing Blogger username and password; if you need a Blogger account, sign up now for a free account and blog.

Movable Type Comment/Trackback SPAM

Over the past few months the amount of comment SPAM I have been receiving for this website has been going through the roof. So far, and I say so far, they have not found the other blogs that I run.

99.99% of the SPAM is from a select bunch of websites pushing either medical supplies or poker. Texas hold ’em or whatever it calls itself.

There is no point in complaining to the hosting company of those sites because they just ignore you. You might notice that there are no SPAM comments displayed on my site; that is because I use an MT plugin called SpamLookup. I recommend that you install this as the very first thing you do after installing MT. If you get a pre-installed version of MT, then make sure that whoever maintains your installation installs SpamLookup or something as good as it. It will save you a lot of headache, bandwidth and/or database space later on when the ‘dark side’ find you, and they will eventually.

Despite the fact that no SPAM actually makes it on to my web pages, up until 3 days ago I was getting hit by various IP addresses and I was finding 10,000 log entries a day. This all eats bandwidth and fills up my database with the messages. It’s not that major, but it still gets right up my nose. And those 10,000 hits are despite me having a rather long .htaccess file. I last updated my .htaccess a week or so ago when I basically blocked the complete IP address space of Comcast users. This means that anyone on Comcast cannot access my webserver. I also added them to my email blacklist. For some reason Comcast are very deaf when it comes to listening to complaints about abuse on their network. They also end up with a lot of ‘open relays’ sitting on their network, which allows the normal email SPAM to come through. It’s a bit drastic I know, but hey, I don’t know many people who use Comcast so it is no skin off my nose. It makes life a lot quieter for me.

Well anyway, after I added Comcast to my .htaccess, I realised that it was taking too much time to update this file and that it was also increasing in size. One of the problems with using .htaccess rather than the global site config files is that the .htaccess file is parsed on each web page request. This puts extra load on the webserver itself and also slows down webpage access. It’s not a problem for me yet, but if I kept adding entries to the file, it soon would be. I needed another way to keep them from hitting my site.

When looking at my site access logs, you can see that by far the greatest number of entry page hits was for /mt/mt-comments.cgi

That is to say, they are hitting that page first and not getting to it from a link on another page. This could be because the default install of MT uses mt-comments.cgi as its default name. It does provide you with a method of changing the name of the comment script file in case you have to run it with a .pl rather than a .cgi extension.
Out of interest I changed the name of the file to something less obvious and changed the file name in the configuration file to match.

Amount of comment SPAM over the last 3 days? None.
Amount of 404 errors in my site access logs… 1000’s and 1000’s.
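
If you want to check the same thing on your own site, a few lines of Python will count how the old script name is being answered. Treat this as a rough sketch only. It assumes the usual Apache common/combined log layout (request in quotes, followed by the status code), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, e.g.
#   1.2.3.4 - - [date] "POST /mt/mt-comments.cgi HTTP/1.0" 404 208 ...
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def status_counts(log_lines, script_path):
    """Count HTTP status codes returned for requests to script_path."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group(1).split("?")[0] == script_path:
            counts[match.group(2)] += 1
    return counts
```

Calling `status_counts(open("access.log"), "/mt/mt-comments.cgi")` after a rename should show the 404 count climbing while the 200 count stays flat, which is exactly what I am seeing.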

So it appears that they don’t scan your webpages too many times to see what the link actually is for your comment script file.

So here I would be as bold as to suggest that the second thing you do after installing MT on your site is to rename the file and change the setting in the mt.cfg configuration file.

To take this one step further, I can see a day when they make the process more intelligent at their end and actually scan your blog to find out what link the script file exists at. It would be no more difficult to implement than a similar script to harvest email addresses from your site. So one of the first things we may have to consider in the future is a method of obfuscating the name and link of the comment script file.

Perhaps one step further would be to dynamically change the name of the script file on a regular basis. Basically some process would:

i. Copy the script file.
ii. Rename it to some obscure file name.
iii. Make sure it has the correct privileges.
iv. Update the mt.cfg file to point to the new file.
v. Delete the old file.
vi. Perform a complete rebuild of the blog entries.

The last step is quite important since the script filename is itself an MT tag and is used in most template files to provide a link to the comment page (assuming you have it enabled). If you didn’t change the links in your previous posts, you would end up with a lot of broken links in your archives and older postings. By doing a complete rebuild of your site, all the links would be updated with the new filename from the mt.cfg file.
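
To give an idea of what such a process might look like, here is a rough Python sketch of steps i to v. It is not a finished tool and I have not run it against a live blog. The function name is made up, and I am assuming the setting in mt.cfg is a line of the form "CommentScript filename", so check your own config file before trusting it. Step vi, the rebuild, still has to be done from within MT afterwards.

```python
import os
import re
import secrets
import shutil

# Assumption: mt.cfg holds the script name on a line like "CommentScript mt-comments.cgi".
DIRECTIVE_RE = re.compile(r"(?m)^[ \t]*#?[ \t]*CommentScript\b.*$")


def rotate_comment_script(mt_dir, old_name, cfg_name="mt.cfg"):
    """Swap the comment script for a randomly named copy; return the new name."""
    new_name = "c" + secrets.token_hex(8) + ".cgi"
    old_path = os.path.join(mt_dir, old_name)
    new_path = os.path.join(mt_dir, new_name)

    shutil.copy2(old_path, new_path)   # steps i and ii: copy under an obscure name
    os.chmod(new_path, 0o755)          # step iii: keep the script executable

    # Step iv: point the config at the new file, adding the line if it is missing.
    cfg_path = os.path.join(mt_dir, cfg_name)
    with open(cfg_path, encoding="utf-8") as handle:
        cfg = handle.read()
    directive = "CommentScript " + new_name
    if DIRECTIVE_RE.search(cfg):
        cfg = DIRECTIVE_RE.sub(directive, cfg, count=1)
    else:
        cfg = cfg.rstrip("\n") + "\n" + directive + "\n"
    with open(cfg_path, "w", encoding="utf-8") as handle:
        handle.write(cfg)

    os.remove(old_path)                # step v: delete the old file
    return new_name                    # step vi (rebuild) is left to MT itself
```

Copying before deleting means there is never a moment where no comment script exists, although comment links on already published pages stay broken until the rebuild has finished.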

Once they start scanning your site for the filename, there will still be a few of them that get through, but it won’t be anything near the volume you would normally get. It also means that it will slow them down as they have to use more resources to try and discover the name and link of the comment file.

For the moment, I am going to leave the filename as it is and not implement the dynamic changes. I’ll see how long it takes them to discover the new link and then manually change it once again to see how long that one lasts. If the gaps get too short then I might think about generating a tool to do the above.
It will have the ability to be run manually and, if your website supports it, as an automatic process run at user-defined intervals.

18 years old today.

I took the 3 kids up to my parents’ place in Kent today for one of my nieces’ birthdays.

It is hard to believe that today she was 18 years old. I remember the day she was born like it was yesterday, and believe me it feels like it was just yesterday. I had joined the Royal Navy one day before my 20th birthday in March the same year she was born.

I had been drafted to HMS Collingwood to complete my technical training and received a phone call to let me know my sister had just given birth to a baby girl (there was no such thing as text messages back then).

I remember going out to wet the baby’s head, which back in those days probably ended up drowning it. I can even remember the hangover the next day…

She’ll be out tonight legally drinking in the pubs and clubs. Just as well we don’t live in the US anymore because she still wouldn’t be able to drink legally over there for another 3 years.

My mum and dad laid on a great BBQ and we all stuffed ourselves with far too much food. There were 7 kids running round the back yard and luckily the weather improved and the day ended in bright sunshine with the temp feeling like it was in the high 20’s.

After driving back home tonight, I felt absolutely cream crackered, proving to me that I don’t just feel like I am getting old, I am.

In another 7 years, it will be the turn of my eldest boy to have his 18th… Jeesh, now that is a scary thought..

It was great to see Vikki today and wish her the best on her 18th. I know my dad will probably read this sometime tomorrow and he’ll show her next time she pops round. He’ll be laughing when he reads this next bit and she’ll be mortified (I wonder if she did get up to do the karaoke tonight?). When I’ve found my cell phone I’ll upload a few pics of her and the cake 😉 That will have her reaching for the phone!! You wait until I get the pics from her night out 🙂

Movable Type – SPAM

Checking the other day to see whether I had any comments to approve, I found I had over 200 comments awaiting my approval.
“My God.. who have I offended now”, I wondered.
As it turned out, a new Movable Type spammer had found my site. Most often, comment and trackback SPAM is caught by a number of rules that I have in place.
I use an MT plugin called SpamLookup and it works very well. None of the above comments made it on to my blog as it automatically moderated all of them.

SpamLookup has a number of methods by which it blocks SPAM. You can have it automatically delete the comment or moderate it. I prefer to moderate in case some valid comments get deleted. I could set it to auto-delete, seeing as you have to enter some very SPAM-like comments to get labelled as comment/trackback SPAM, but I have never trusted automatic rules 100%.

Another method I use is to have a look at my server logs to determine who is viewing what and where from.
You’d never believe me if I said the bulk of comment/trackback SPAM comes from China (get away!!!), Mexico, Thailand, Korea and the odd batch from the USA.
You can tell the USA ones are the occasional amateur who thinks they are about to make a quick buck, because they are not too clever about hiding who they are.

Every now and then, I pull out those who have attempted to SPAM my blog from the server logs and I add them to my .htaccess file.
Even though the spammers are blocked from posting SPAM to my blog, they are still calling the scripts that post comments, which in turn calls the plugin that verifies the comment, which in turn calls an external site (in some cases) to verify the source of the SPAM.
All of this activity uses up bandwidth but more importantly it uses up processing time on my webserver. This has the effect of slowing down the whole system.
In the case of this blog and this webserver, it is a shared server. This means that I am not the only person who has an account on the specific computer hosting the webserver. Most often, around 20 accounts probably exist on a shared server.
So by placing them into my .htaccess file I am preventing them from even reaching my MT scripts and thus reducing the load on my server.

Here are the contents of my .htaccess file:

order allow,deny
deny from 64.27.27.203
deny from 216.129.107.21
deny from 216.40.249.17
deny from 82.103.65.225
deny from 63.208.158.252
deny from 63.208.158.253
deny from 63.208.158.254
deny from 148.244.150.52
deny from 207.248.240.118
deny from 148.244.150.58
deny from 148.244.150.57
deny from 207.248.240.119
deny from 64.4.195.62
deny from 64.27.27.150
deny from 219.150.118.16
deny from 216.195.51.193
deny from 216.195.51.17
deny from 80.77.84.252
deny from 216.32.80.98
deny from 207.248.240.119
deny from 193.190.128.253
deny from 202.28.204.123
allow from all

All of the above addresses are either known Comment/Trackback spammers, or ones that I have picked up from my server logs.
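
Picking the offenders out of the server logs by hand gets tedious, so here is a rough Python sketch of how the job could be automated. It is not what I actually use. The function name, the threshold of 5 hits and the log layout (IP address first, request in quotes, as in the usual Apache formats) are all assumptions for the sake of the example.

```python
import re
from collections import Counter

# Assumption: each log line starts with the client IP and contains the quoted request.
POST_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}) .*?"POST (\S+)')


def deny_lines(log_lines, script_path, min_hits=5, already_blocked=()):
    """Return 'deny from' lines for IPs that POSTed to script_path at least min_hits times."""
    hits = Counter()
    for line in log_lines:
        match = POST_RE.match(line)
        if match and match.group(2).split("?")[0] == script_path:
            hits[match.group(1)] += 1
    blocked = set(already_blocked)
    # Skip addresses already in .htaccess so the file does not collect duplicates.
    new_ips = sorted(ip for ip, count in hits.items()
                     if count >= min_hits and ip not in blocked)
    return ["deny from " + ip for ip in new_ips]
```

The lines it returns can be pasted into the .htaccess file above the final "allow from all". I would still look over the list before adding anything, for the same reason I moderate comments rather than auto-delete them.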