Tuesday, May 23, 2006

Huzzah! Huzzah! No ODP! No ODP!

Fantastic news from Microsoft - they've introduced a brand spanking new meta tag. If you want to stop their search engine using the ODP's summary of a client's site then roll in:

<meta name="robots" content="noodp">
<meta name="msnbot" content="noodp">
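If you look after a lot of client sites you may want to check which pages already carry the tag. Here's a minimal sketch in Python, using only the standard library's `html.parser`. The class name `RobotsMetaParser` and the function `has_noodp` are my own inventions for illustration, not anything Microsoft supplies, and it assumes directives in the `content` attribute are comma separated.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from robots-style meta tags (hypothetical helper)."""

    def __init__(self, names=("robots", "msnbot")):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name not in self.names:
            return
        # Assumption: directives are comma separated, e.g. "noodp, noindex".
        tokens = {t.strip().lower() for t in attr.get("content", "").split(",")}
        tokens.discard("")
        self.directives.setdefault(name, set()).update(tokens)


def has_noodp(html, names=("robots", "msnbot")):
    """Return {meta name: True/False} for whether 'noodp' is declared."""
    parser = RobotsMetaParser(names)
    parser.feed(html)
    return {name: "noodp" in parser.directives.get(name, set()) for name in names}


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<meta name="robots" content="noodp">'
        '<meta name="msnbot" content="noodp">'
        '</head><body></body></html>'
    )
    print(has_noodp(page))
```

It only tells you whether the tag is on the page; whether a given engine honours it is a separate question.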

This week I've already seen Google describe the corporate website for the owners of the world's only seven-star hotel as a type of ice cream - thanks to the ODP. I've seen banks with incorrect names - thanks to the ODP. I've seen entire charities described as a cartoon - thanks to the ODP.

Hopefully Google and Yahoo will follow suit. Is Google typically quick to adopt a good idea? Sometimes. But look at RSS - took them a while there. Will Google want to support an MSN-introduced initiative? Hmm. We'll see.

Tuesday, May 09, 2006

Google Patents, Google Caches and Tree Obscured Woods

Sometimes you can't see the wood for the trees. I was in the middle of a presentation where I stress the need for a page to stay on topic and build up a good history in Google when I was reminded of a recent Matt Cutts post on Google's proxy unit.

If Blogger's crap image upload works and re-sizes this, the following image is a picture from what's widely known as Google's temporal patent application.

Notice how the History Unit can sometimes sit in between the crawler (document locator) and the web (corpus) and how it sits in between the web front and the index.

Compare that then to Matt's own diagram of the cache.

Here the Cache has almost the very same position.

As a matter of fact it would make a lot of sense to combine the role of the Google Cache and the role of the Google History unit. You would keep several caches of the evolution of the web. The advantage of this approach is that you're keeping a copy of how the web page actually was and, as you improve your algorithm, you'll be able to review how the page would have scored. In other words, you can apply the new algorithm to the old site.

The alternative is to use the History Unit as the place to store the algorithm/indexer's interpretation of the web page, i.e. record which keywords it does well on and its structural elements. The advantage here is that you'll use a lot less space.

Google could even do both as I suspect space is not a concern (despite a recent Register diatribe which mulls otherwise).
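To make the trade-off concrete, here's a toy sketch of the two storage approaches in Python. Everything in it is hypothetical: the class names, the `keyword_score` function and the sample text are mine, and none of it is taken from the patent application or from anything Google has published. It only shows why full snapshots can be re-scored with a new algorithm later while compact summaries cannot.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    date: str  # when the page was fetched
    text: str  # the full page as it actually was


@dataclass
class Summary:
    date: str
    keyword_counts: dict  # the indexer's interpretation only


def keyword_score(text, keyword):
    """Toy scoring algorithm: share of words that match the keyword."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0


class SnapshotHistory:
    """Option one: keep full copies, so any future algorithm can be replayed."""

    def __init__(self):
        self.snapshots = []

    def record(self, date, text):
        self.snapshots.append(Snapshot(date, text))

    def rescore(self, scorer, keyword):
        # Apply a (possibly brand new) algorithm to every old copy.
        return [(s.date, scorer(s.text, keyword)) for s in self.snapshots]


class SummaryHistory:
    """Option two: keep only derived data; far smaller, but fixed at index time."""

    def __init__(self, tracked_keywords):
        self.tracked = [k.lower() for k in tracked_keywords]
        self.summaries = []

    def record(self, date, text):
        words = text.lower().split()
        counts = {k: words.count(k) for k in self.tracked}
        self.summaries.append(Summary(date, counts))


if __name__ == "__main__":
    full = SnapshotHistory()
    compact = SummaryHistory(["hotel"])
    for date, text in [("2006-04-01", "hotel hotel luxury"),
                       ("2006-05-01", "ice cream hotel")]:
        full.record(date, text)
        compact.record(date, text)
    print(full.rescore(keyword_score, "hotel"))
    print([s.keyword_counts for s in compact.summaries])
```

With the summaries, a keyword you didn't think to track at the time is gone for good; with the snapshots, you just run the new scorer over the old copies.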

Tuesday, May 02, 2006

A9 The Dark Horse

I've always said that A9 are the dark horse in the search engine race. At the time, I would point out how they're powered by Google but, by gosh, you can see how well they data mine. Amazon know exactly what sort of book you like. Imagine if Amazon made the step from being Powered By Google to being Powered By Amazon. Imagine if Amazon put their marketing machine behind A9.

In many ways Amazon have put their marketing machine behind A9. There are discounts for users who search with it - a token search at that, just one and you could get Pi Shared (pi/2) discount from Amazon.com (no joy if you're in the UK or elsewhere; Amazon really needs to get joined up). The Amazon site does promote A9 quite heavily.

We just have not seen A9 take off in the way that it might. I have heard some people ask, "Why use A9, why not just use Google?". People say this about a lot of search engines but with A9 the point was that it was powered by Google; the extras that Amazon had added to A9 were not enough to give it a real Unique Selling Point.

The change to Windows Live is both unexpected and expected. It's one of those changes that I didn't predict because I simply didn't think about it. A professional SEO does not think about A9 for very many minutes in the day. The change makes sense though. The move to Windows Live gives Amazon a couple of angles:

  • More people will use A9 than will use Windows Live's raw search.
  • Microsoft would have paid a hefty sum.
  • A strong hand to negotiate with Google when it comes time to negotiate. They've proven they'll change search providers if they want.
  • You might even wonder if Amazon, world class data miners, are now in a position to compare and contrast Google semi-raw data with Windows Live semi-raw data.