When WEB Design RI performs an SEO web site check, the most common omission, especially on a site that was designed several years ago, is the sitemap.xml file (sitemaps in this format were first used by Google in 2005). The sitemap specifications are detailed at http://www.sitemaps.org
If you want your web site to be indexed by the major search engines, and you would like the search engines to be notified of changes to your site, then you should include a sitemap XML file in your web site. These files permit the search engines to crawl web sites in an intelligent fashion.
Sitemaps are what is known as a URL inclusion protocol: we notify the search engines of all of the URLs that we would like to be included when they crawl our sites. This complements the robots.txt file, which is a URL exclusion protocol. See my robots.txt blog entry.
Sitemaps are extremely useful on web sites that are not crawler friendly, such as those that rely on rich Flash or AJAX content.
Sitemaps are used by the largest search engines: Google, Yahoo, Bing (formerly MSN Search) and Ask.
The process is relatively simple:
1) Create an XML sitemap file. http://www.xml-sitemaps.com/ is a free sitemap generator tool, and there are many free products like it.
2) Save your sitemap.xml file to your web site root directory.
3) Upload your sitemap.xml file to your web server.
4) This step is important and is often ignored: when you make changes to your website, especially the addition of pages, create an updated sitemap.xml file and upload the new file to your web server. The next time your site is crawled, the spiders will look for the new pages.
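As a rough illustration of step 1, the sketch below builds a minimal sitemap with Python's standard library rather than an online generator. The page list and the `build_sitemap` helper are hypothetical names introduced for this example; only the `urlset`, `url` and `loc` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in urls:
        url_el = ET.SubElement(urlset, "url")
        # <loc> is the only mandatory child of <url>.
        ET.SubElement(url_el, "loc").text = page
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page list; replace with the real pages of your site.
pages = [
    "http://www.example.com/",
    "http://www.example.com/contact.htm",
]
sitemap_xml = build_sitemap(pages)
```

Writing `sitemap_xml` to a file named sitemap.xml in the site root, and re-running the script whenever pages are added, would cover steps 2 through 4.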
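Sitemaps do have limits: a single sitemap file may list no more than 50,000 URLs and may not exceed a fixed file size (10 megabytes when this was written; the current sitemaps.org protocol allows 50 megabytes uncompressed).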
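A site with more pages than one file allows does not have to shrink. The sitemaps.org protocol also defines a sitemap index file that points to several sitemap files. The sketch below shows one possible way to split a URL list into chunks of 50,000 and build such an index; the page URLs, the sitemap file names and the helper names are all made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML pointing at the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site with 120,000 pages.
urls = [f"http://www.example.com/page{n}.htm" for n in range(120_000)]
chunks = chunk_urls(urls)

# One sitemap file per chunk; the file names are assumptions.
sitemap_files = [
    f"http://www.example.com/sitemap{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_sitemap_index(sitemap_files)
```

Each chunk would then be written out as its own sitemap file, and the index file is what gets submitted to the search engines.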
Once we have created the all-important sitemap.xml file, we must inform the search engine crawlers of its location, and there are several methods of doing this:
1) send an HTTP request
2) upload the sitemap.xml file directly to the search engines
3) add the location of the sitemap in your robots.txt file
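For the HTTP request method, sitemaps.org describes a ping URL of the form `<searchengine_URL>/ping?sitemap=<sitemap_url>`, with the sitemap URL percent-encoded. The sketch below only builds that URL and does not send anything. The search engine host is a placeholder and `build_ping_url` is a hypothetical helper. Some search engines no longer accept such pings, so check each engine's current documentation before relying on this.

```python
from urllib.parse import quote


def build_ping_url(search_engine_url, sitemap_url):
    """Build a sitemaps.org-style ping URL for the given sitemap."""
    # Everything after "/ping?sitemap=" must be URL-encoded.
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_url.rstrip('/')}/ping?sitemap={encoded}"


# Placeholder search engine host; not a real ping endpoint.
ping_url = build_ping_url(
    "http://searchengine.example.com",
    "http://webdesign-ri.com/sitemap.xml",
)
print(ping_url)
```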
In this blog entry I will only address the third method. While I do upload my sitemap.xml files to the major search engines, I also add the location of the sitemap file to the robots.txt file, because every crawler that reads the robots.txt file will be directed to my sitemaps, not just the search engines where I manually uploaded the sitemap. The following is the actual robots.txt file from http://webdesign-ri.com:
User-agent: *
Sitemap: http://webdesign-ri.com/sitemap.xml
Sitemap: http://webdesign-ri.com/sitemap.htm
Disallow: /cgi-bin/
Disallow: /case_studies/
In the second line I have provided the path to the sitemap.xml file, and in the third line I have added a path to the sitemap.htm file.
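To see how a crawler might read this file, here is a small sketch using Python's standard `urllib.robotparser` module (its `site_maps()` method needs Python 3.8 or later). It parses the robots.txt text from a string instead of fetching it. The crawler name `ExampleBot` and the two page paths being tested are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://webdesign-ri.com/sitemap.xml
Sitemap: http://webdesign-ri.com/sitemap.htm
Disallow: /cgi-bin/
Disallow: /case_studies/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Inclusion side: the sitemap locations advertised to every crawler.
sitemaps = parser.site_maps()

# Exclusion side: hypothetical URLs checked against the Disallow rules.
cgi_allowed = parser.can_fetch("ExampleBot", "http://webdesign-ri.com/cgi-bin/form.pl")
home_allowed = parser.can_fetch("ExampleBot", "http://webdesign-ri.com/index.htm")

print(sitemaps)      # both Sitemap entries, in file order
print(cgi_allowed)   # False: /cgi-bin/ is disallowed
print(home_allowed)  # True: no rule blocks this path
```

The Sitemap lines are independent of the User-agent line, so the parser reports them regardless of where they appear in the file.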