Google Sitemaps to Expose Web Dark Matter, Transforming the Face of Search Forever

Google has today rolled out a beta test of Google Sitemaps, a facility for webmasters to submit lists of pages which they would like Google to crawl for inclusion in its index. This new feature promises to shed light on the ‘dark matter’ of the web (the billions of pages of content previously inaccessible to search engines) and in so doing to change the face of search forever.

Today, Google launches Sitemaps, a tool which webmasters can use to submit lists of site URLs which they would like to have crawled by Google’s bots. (See the About documentation and the FAQ for the program.)

For years, pundits have speculated about how and when the web’s ‘dark matter’ would eventually be exposed to the search engines. As it stands right now, billions of pages of content remain inaccessible to search engine bots because of the way those pages are stored and accessed — often in large databases which require query strings to be specified through HTML-based forms. Since bots do not fill in forms and press buttons on web pages, these billions of pages of content have remained invisible to bots and thus inaccessible to users via search engines. Other content has remained invisible to search engine bots as a result of poor design, or as a result of reliance on Flash-based navigation systems.

All that is about to change.

With Google Sitemaps, a webmaster can simply provide Google with a list of URLs of pages they would like to have crawled. A free open-source tool is even available to help generate a specially formatted XML site map which includes not only URLs, but also ‘hints’ such as a page’s last modification date or an estimate of how frequently it is updated.
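
To give a rough idea of what such a site map might look like, here is a minimal sketch in Python that builds one from a list of pages. It is not Google's own generator tool. The element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`) and the namespace come from the sitemaps.org 0.9 schema, which is an assumption here and may differ from the format used in the beta. The `build_sitemap` function and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, not necessarily the one used by the beta.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a site map XML string for a list of page dictionaries.

    Each page needs a "loc" (the URL) and may carry the optional hints
    "lastmod" (last modification date) and "changefreq" (update frequency).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Hints are optional, so only write the ones that were supplied.
        for hint in ("lastmod", "changefreq"):
            if hint in page:
                ET.SubElement(url, hint).text = page[hint]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages, including a database-driven URL with a query string
    # of the kind a link-following bot would be unlikely to discover.
    example_pages = [
        {"loc": "http://www.example.com/"},
        {
            "loc": "http://www.example.com/catalog?item=12&view=full",
            "lastmod": "2005-06-03",
            "changefreq": "weekly",
        },
    ]
    print(build_sitemap(example_pages))
```

The second entry is the interesting one: a page that normally sits behind a search form can be listed directly, query string and all, so a bot never has to fill in the form to find it.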

With this seemingly simple move, Google has shifted the burden of locating the web’s hidden information away from bots that crawl pages looking for links and onto the content providers themselves. There can be little doubt that content providers will rise to the challenge, bringing to light billions of pages of previously inaccessible information.

This article was last updated on Friday, 3rd June 2005 at 4:26 pm and is filed in the Search Engine Marketing section.
