Today, Google launches Sitemaps, a tool that webmasters can use to submit lists of site URLs they would like Google’s bots to crawl. (See the About documentation and the FAQ for the program.)
For years, pundits have speculated about how and when the web’s ‘dark matter’ would eventually be exposed to the search engines. As it stands right now, billions of pages of content remain inaccessible to search engine bots because of the way those pages are stored and accessed — often in large databases which require query strings to be specified through HTML-based forms. Since bots do not fill in forms and press buttons on web pages, these billions of pages of content have remained invisible to bots and thus inaccessible to users via search engines. Other content has remained invisible to search engine bots as a result of poor design, or as a result of reliance on Flash-based navigation systems.
All that is about to change.
With Google Sitemaps, a webmaster can simply provide Google with a list of URLs of pages they would like to have crawled. A free, open-source tool is even available to help generate a specially formatted XML site map that includes not only URLs but also ‘hints’ such as a page’s last modification date or an estimate of how often it is updated.
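To make the idea of a URL list with ‘hints’ concrete, here is a rough sketch of what generating such a file might look like. It is not Google’s generator tool. The element names and the namespace are assumptions taken from the published sitemaps.org protocol, which may differ from the schema in use at launch, and the function name `build_sitemap` and the example.com URLs are purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace from the published sitemaps.org protocol (0.9);
# the schema accepted at launch may use a different namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap XML string (hypothetical helper, for illustration).

    pages: iterable of dicts with a required 'loc' key and optional
    'lastmod' (date string) and 'changefreq' (e.g. 'daily') hints.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Hints are optional; emit only the ones that were supplied.
        for hint in ("lastmod", "changefreq"):
            if page.get(hint):
                ET.SubElement(url, hint).text = page[hint]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages, including one reachable only via a query string.
    print(build_sitemap([
        {"loc": "http://example.com/catalog?item=42&view=full",
         "lastmod": "2005-06-02", "changefreq": "daily"},
        {"loc": "http://example.com/about"},
    ]))
```

Note that the first example URL carries a query string, exactly the kind of database-backed page a link-following bot would never reach on its own. ElementTree escapes the `&` in it automatically when serializing.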
With this seemingly simple move, Google has shifted the burden of locating the web’s hidden information from bots, which crawl pages looking for links, to the content providers themselves. There can be little doubt that content providers will rise to the challenge, bringing to light billions of pages of previously inaccessible information.
