Wednesday, August 22, 2007

200 billion pages

I'm just reading through a patent application on how a search engine could build a primary and secondary index based on phrases rather than keywords.

In the introduction the author (who we can safely place at Mountain View) estimates that there are currently 200 billion pages on the Internet. She says the best search engines index only between 6 and 8 billion of them.

What a difference, huh? Statistically, your home page and this blog post each have between a 3% and 4% chance of being indexed by a search engine.
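For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The function name `index_coverage` and the constants are mine, not from the patent application, and reading the ratio as a "chance" assumes every page is equally likely to be indexed, which is a simplification.

```python
def index_coverage(indexed_pages: int, total_pages: int) -> float:
    """Return the fraction of all pages that a search engine has indexed."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return indexed_pages / total_pages


# Figures quoted in the post: 200 billion pages in total,
# of which the best engines index between 6 and 8 billion.
TOTAL_PAGES = 200_000_000_000
INDEXED_LOW = 6_000_000_000
INDEXED_HIGH = 8_000_000_000

low = index_coverage(INDEXED_LOW, TOTAL_PAGES)
high = index_coverage(INDEXED_HIGH, TOTAL_PAGES)

# Assumes uniform odds for every page, which real crawlers do not provide.
print(f"{low:.0%} to {high:.0%}")  # prints "3% to 4%"
```

In practice the odds are nowhere near uniform, since crawlers favor pages that are well linked, so treat the figure as a rough average rather than a prediction for any one page.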

1 comment:

Anonymous said...

Shame that 120 billion of them are splogs, templates for every restaurant or widget in the US, several million domain-squatting URLs hosting ads disguised as content, infinitely recursive Google bait, and so on.

I think the search engines are onto something. If you were exposed to all 200 billion pages your mind would implode from the pointless noise.