Wednesday, October 07, 2009

What is a headless browser?

A headless browser is a web browser without a graphical user interface. In other words, it is a piece of software that accesses web pages but doesn’t show them to any human being. Instead, headless browsers are used to provide the content of web pages to other programs.

For example, a computer program can use a headless browser to access a web page and determine how wide that page (or any element on it) would appear by default to a user, what colour the text in any element would be, which font family is used, or even what the x/y coordinates of an object are.

This information is often used to test web pages en masse for quality control, or to extract data from them.
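As a rough illustration of a program consuming a web page with no GUI involved, here is a minimal Python sketch that pulls the link targets out of a page. To be clear, this is not a headless browser: it uses the standard library's HTML parser as a much simpler stand-in, so it does not run JavaScript or compute layout, widths, colours or coordinates. The names LinkCollector, extract_links and SAMPLE_PAGE are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return the link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page content; a real program would fetch this over HTTP.
SAMPLE_PAGE = """<html><body>
<a href="/about">About</a>
<a name="top">No href here</a>
<a href="http://example.com/">Example</a>
</body></html>"""

print(extract_links(SAMPLE_PAGE))  # ['/about', 'http://example.com/']
```

A real headless browser would be needed as soon as the links are generated by JavaScript, because a plain parser only sees the markup as it was delivered.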

The headless browser is significant because it understands web pages like a browser would – with the caveat that browsers all (annoyingly) behave slightly differently. Headless browsers, for example, should be able to parse and run JavaScript. They can click on links and even cope with downloads.

In October 2009 Google suggested that headless browsers could be used to help its search engine cope with AJAX web sites. In Google’s scenario, it would be the responsibility of the web site owner or administrator to set up the headless browser on the web server (rather than on a client machine like a PC or Mac), use it to programmatically access the AJAX web site on the server, and provide the resulting rendering to search engines when they request it.

In essence, Google is suggesting that rather than leaving its search engine to parse JavaScript, that translation should be done by the webmaster and their headless browser. Google’s proposal is a set of URL conventions that control when the search engine requests the headless browser’s output and which URL is shown to human users. The incentive in the proposal is that web site owners can check what Google’s spiders are actually seeing.
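The URL side of this can be sketched in a few lines of Python. This assumes the convention as I understand it from Google's proposal: the AJAX state lives after a "#!" in the URL shown to humans, and the crawler asks the server for the same state through an "_escaped_fragment_" query parameter. The function names and the example.com URLs are made up for illustration, and the escaping here is a simplification rather than the exact rules of the proposal.

```python
from urllib.parse import quote, unquote

HASHBANG = "#!"
PARAM = "_escaped_fragment_"


def to_crawler_url(pretty_url):
    """Map a '#!' URL (shown to humans) to the URL a crawler would request."""
    base, sep, state = pretty_url.partition(HASHBANG)
    if not sep:
        return pretty_url  # no AJAX state in the URL, nothing to translate
    joiner = "&" if "?" in base else "?"
    # Escape characters such as '&' and '#' so the state survives as one value
    return base + joiner + PARAM + "=" + quote(state, safe="=/")


def to_pretty_url(crawler_url):
    """Reverse the mapping: recover the '#!' URL to show to human users."""
    for marker in ("?" + PARAM + "=", "&" + PARAM + "="):
        base, sep, escaped = crawler_url.partition(marker)
        if sep:
            return base + HASHBANG + unquote(escaped)
    return crawler_url  # not a crawler-style URL


print(to_crawler_url("http://example.com/app#!page=2"))
# http://example.com/app?_escaped_fragment_=page=2
```

On the server, a request carrying the crawler-style URL would be the cue to load the matching "#!" page in the headless browser and return the rendered result.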

This is a classic example of software providing data to another piece of software without a GUI being necessary.
