Thursday, April 27, 2006

Not in Google

This blog isn't in Google right now. It's fairly common these days for blogs to scratch the surface and then drop out of the index. I see people on SEW's forums asking about it all the time (thankfully, SEW isn't yet the same wash of recycled questions that is killing Webmasterworld). We did pick up a PageRank of 5/10, though. I'd expect that to drop as far as 2/10 before pages from the archives start to be included.

As it happens, Google has finally expanded the Sitemaps program into a webmaster console, which we all saw coming. They blog about it here.

It's a handy chance to use this blog (which is only in the sitemap XML via the standard atom.xml feed) to see what a 'not indexed' site looks like. I do have some stats for the blog, as there has been some token inclusion in the past. The sitemap interface doesn't seem to want to recognise that fact, though, and just encourages me to wait for indexing.

In many ways this blog looks as if it has been banned. It was in the index. Now it's not. You can search for it and come up with no results.

I saw Danny Sullivan name this as the test to see if a site has been banned. He's wrong on that one.

The sitemap interface is a better guide. This is the message I would get in the summary section of my interface report if this blog had been banned.

Google's own full screenshot of the same message is here.

Monday, April 24, 2006

CAPTCHA versus Cash

I really dislike CAPTCHAs. They don't work. They simply annoy human users, stop some colour-blind users from posting at all, and just encourage robots to try and retry (again and again) until their comment spam gets through.

The funny thing is that Ian McAllister blogged about the very same annoyance recently.

The Haloscan comment and trackback forms I have on this blog are, at least, behind a layer of JavaScript. JavaScript takes more effort for spambots to read than a CAPTCHA takes to defeat. Of course, this blog is one of the millions of unread pages out there, so it isn't a target for comment spam, and I haven't had to test how Haloscan copes with lots of real comments.

Never in the history of Search has any engine been so up front as to actually prove the impact of a link or attribute. This seems like a hard circle to break.

LiveJournal (which is now back to using ads) may have a simple solution on the community front. You can lock your blog down to comments from friends only. People you've approved don't need to deal with CAPTCHAs, and spiders simply cannot post. LiveJournal also "nofollows" the entire "Friends" page of every blog and community there.

The problem with the friends/community approach is how you get into the loop in the first place. If you cannot comment on a blog, then you can't announce yourself to the blogger as a reader. If you can't do that, then the blogger can't add you as a friend.

What would really turn comment spam off is if 'rel="nofollow"' were seen to work. Right now there's no incentive for spam bots to detect and avoid these blogs. It's much easier just to flood many blogs at once.

My radical solution is to charge people a deposit whenever they comment on your blog. This only becomes practical once micro-payments are possible (we're waiting on PayPal, or on Google's payment system in Google Base, to give us that). Each blog comment leaves a deposit of 1p (or thereabouts), which the blogger refunds when the comment is approved. Alternatively, the deposit auto-refunds a week later unless the blogger overrides that. Real users are never exposed to any real financial risk. Spam bots leaving tens of thousands of comments certainly are. The drawbacks? This system faces the same challenge PayPal does: you may need a credit card to get into the loop in the first place.
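The deposit idea is really a tiny escrow protocol, so here's a minimal Python sketch of the bookkeeping. It assumes a hypothetical CommentDepositLedger class, amounts held as whole pence and a seven-day auto-refund window. It doesn't talk to PayPal or any real payment API; the actual charge and refund are just recorded in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # deposit held, awaiting the blogger
    REFUNDED = "refunded"    # approved, or auto-refunded after the window
    FORFEITED = "forfeited"  # rejected as spam; the deposit is kept


@dataclass
class Deposit:
    commenter: str
    amount_pence: int
    posted_at: datetime
    status: Status = Status.PENDING
    held: bool = False  # blogger override: suppress the auto-refund


class CommentDepositLedger:
    """Hypothetical ledger for a 1p-per-comment deposit (no real payment API)."""

    def __init__(self, deposit_pence: int = 1,
                 refund_after: timedelta = timedelta(days=7)):
        self.deposit_pence = deposit_pence
        self.refund_after = refund_after
        self.deposits: dict[str, Deposit] = {}
        self.refunded: dict[str, int] = {}  # commenter -> pence returned so far

    def post_comment(self, comment_id: str, commenter: str, now: datetime) -> None:
        # In a real system this is where the micro-payment would be taken.
        self.deposits[comment_id] = Deposit(commenter, self.deposit_pence, now)

    def _refund(self, d: Deposit) -> None:
        d.status = Status.REFUNDED
        self.refunded[d.commenter] = self.refunded.get(d.commenter, 0) + d.amount_pence

    def approve(self, comment_id: str) -> None:
        d = self.deposits[comment_id]
        if d.status is Status.PENDING:
            self._refund(d)

    def reject(self, comment_id: str) -> None:
        d = self.deposits[comment_id]
        if d.status is Status.PENDING:
            d.status = Status.FORFEITED

    def hold(self, comment_id: str) -> None:
        # Stops the automatic refund so the blogger can decide later.
        self.deposits[comment_id].held = True

    def run_auto_refunds(self, now: datetime) -> int:
        """Refund pending, un-held deposits older than the window; return the count."""
        count = 0
        for d in self.deposits.values():
            if (d.status is Status.PENDING and not d.held
                    and now - d.posted_at >= self.refund_after):
                self._refund(d)
                count += 1
        return count
```

Under this sketch a spam bot whose ten thousand comments all get rejected forfeits 10,000p (£100), while a reader whose comments are approved, or simply left alone for a week, gets every penny back.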