Friday, June 05, 2009

Bing has /semhtml/ pages, and they're being indexed by Google

I was very lucky to get a quick personal tour of Bing from Dr Barney Pell - the very smart guy who founded Powerset. Powerset is the search technology company Microsoft bought to help 'verb up' its new engine (which became Bing).

During the tour we wandered into the reference section of Bing - the enhanced Wikipedia area. I gasped. I had already spotted /semhtml/ in the URLs of these scraped-content pages. "You can't call them that!" I said. "Search engine marketing HTML pages?"

Barney's response was quick, "SEMantic HTML." So these /semhtml/ pages aren't supposed to be search landing pages.

First off, here's how to find them. Search Bing for someone famous - let's go with Alexandre Dumas for now.


Click on the "enhanced view" link. It appears in place of "cached page" when the result has the Wikipedia 'enhancement'.



If you compare this page with the original Wikipedia article, you'll see that they match.

As of today, Microsoft (or the Bing team) hasn't done anything to keep search engines out of the /semhtml/ section of the site. There's no noindex meta tag in the HTML, and Bing.com's robots.txt doesn't block the folder.
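For anyone who wants to repeat this kind of check, here is a minimal sketch in Python using only the standard library. It asks two questions a crawler would: does robots.txt allow fetching the URL, and does the page carry a robots noindex meta tag? The robots.txt text and the /semhtml/ URL below are hypothetical stand-ins, not Bing's actual files, and the helper names (`NoindexFinder`, `crawler_may_index`) are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it blocks /search but says nothing about /semhtml/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


class NoindexFinder(HTMLParser):
    """Sets self.noindex if a <meta name="robots"> (or "googlebot") tag says noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values can be None for valueless attributes, so default to "".
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def crawler_may_index(robots_txt, url, html, agent="Googlebot"):
    """Hypothetical helper: True if robots.txt allows the URL and the HTML has no noindex."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        return False
    finder = NoindexFinder()
    finder.feed(html)
    return not finder.noindex


if __name__ == "__main__":
    page = "<html><head><title>Alexandre Dumas</title></head></html>"
    print(crawler_may_index(ROBOTS_TXT, "http://www.bing.com/semhtml/example", page))
```

With a robots.txt that doesn't mention the folder and a page without a noindex tag, the check returns True, which is the situation described above. Real crawlers apply more rules than this (X-Robots-Tag headers, canonical tags), so treat it as a rough approximation.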

In fact, it's easy to show that Google has already begun indexing these pages.

I'm happy to believe that these aren't "search engine marketing HTML" pages but "SEMantic HTML" pages instead.

However, Bing is clearly trying to get some of these sexy pages into search. They've got a sitemap XML file for them!
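Why does a sitemap matter? A sitemap file is an explicit list of URLs a site wants search engines to crawl, so its existence is a strong hint of intent. The sketch below pulls the `<loc>` entries out of a sitemap using Python's standard library and the sitemaps.org namespace. The sample sitemap and its example.com URLs are invented for illustration, not copied from Bing's file, and the function only handles a plain `<urlset>`, not a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample sitemap; real files can list up to 50,000 URLs each.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/semhtml/page-one</loc></url>
  <url><loc>http://www.example.com/semhtml/page-two</loc></url>
</urlset>
"""


def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

Every URL such a file lists is one the site owner is actively inviting crawlers to visit, which is why a sitemap for /semhtml/ pages reads as more than an oversight.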



Google has made it clear that it doesn't want search results in its search results. We'll just have to see what happens next.
