When it comes to optimizing websites for search engines, sitemap submission plays a crucial role, since sitemaps help search engines discover all the pages on a website and download them quickly when they change. This October, Google published guidelines and best practices for using sitemaps properly to improve your website's presence in search. Sitemaps can be XML sitemaps or RSS/Atom feeds, and the search giant recommends using both XML sitemaps and RSS/Atom feeds for optimal crawling.
According to the official blog post, XML sitemaps are typically large and are downloaded less frequently than RSS/Atom feeds, which are small and contain only the most recent updates to your site. If you use both formats, the XML sitemap gives Google information about all the pages on your site, while the RSS/Atom feed tells it about your latest updates. This way, the search engine can keep your content fresher in its index.
The blog post also highlights the two most important pieces of information a sitemap gives the search engine:
- URL – Include only URLs that Googlebot can fetch in XML sitemaps and RSS/Atom feeds. Do not include URLs disallowed by robots.txt (Googlebot cannot fetch them) or URLs of pages that don't exist. Include only canonical URLs: listing URLs of duplicate pages increases the load on your server without improving indexing.
- Last Modification Time – Specify a last modification time for each URL in an XML sitemap or RSS/Atom feed; it should be the last time the page's content changed meaningfully. If you want a change to a page to be visible in search results, the time of that change should be the last modification time. Each format uses a different tag: XML sitemaps use `<lastmod>`, RSS uses `<pubDate>` and Atom uses `<updated>`. Make sure the last modification time is set and updated correctly: use the correct format (W3C Datetime for XML sitemaps, RFC 3339 for Atom and RFC 822 for RSS), and update it only when the content has changed meaningfully. Do not set it to the current time whenever the sitemap or feed is served.
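The two rules above can be sketched in Python. This is a minimal illustration, not Google's tooling: it assumes you already have your robots.txt text and a hypothetical mapping from duplicate URLs to their canonical versions, and it shows how one meaningful-change time would be written for the `<lastmod>`, `<updated>` and `<pubDate>` tags using only the standard library. The function names are made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.robotparser import RobotFileParser


def fetchable_urls(urls, robots_txt, user_agent="Googlebot"):
    """Keep only the URLs that the given robots.txt lets user_agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network call
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def canonical_only(urls, canonical_of):
    """Map each URL to its canonical form and drop the resulting duplicates.

    canonical_of is a hypothetical {duplicate_url: canonical_url} mapping that a
    site would derive from its own rel="canonical" data.
    """
    return list(dict.fromkeys(canonical_of.get(u, u) for u in urls))


def lastmod_strings(changed_at):
    """Format one meaningful-change time for each format's last-modified tag."""
    utc = changed_at.astimezone(timezone.utc)
    return {
        "lastmod": utc.isoformat(timespec="seconds"),  # XML sitemap, W3C Datetime
        "updated": utc.isoformat(timespec="seconds"),  # Atom, RFC 3339
        "pubDate": format_datetime(utc),               # RSS, RFC 822-style date
    }


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    urls = [
        "https://example.com/blog/post",
        "https://example.com/blog/post?utm_source=feed",
        "https://example.com/private/draft",
    ]
    dupes = {"https://example.com/blog/post?utm_source=feed": "https://example.com/blog/post"}
    print(canonical_only(fetchable_urls(urls, robots), dupes))
    print(lastmod_strings(datetime(2014, 10, 20, 9, 30, tzinfo=timezone.utc)))
```

The formatting helper takes the time of the last meaningful change as input instead of calling `datetime.now()`, which is exactly the mistake the guidelines warn against.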
Best Practices for XML Sitemaps
- For a single XML sitemap, update it at least once a day (if your site changes regularly) and ping Google after each update, that is, submit the sitemap to Google with an HTTP request.
- For a set of XML sitemaps, maximize the number of URLs in each XML sitemap, up to the limit of 50,000 URLs or 10MB uncompressed, whichever is reached first. Ping Google for each XML sitemap whenever it is updated (or once for the sitemap index, if you use one). Avoid putting only a handful of URLs in each XML sitemap file, since that makes it difficult for the search engine to download all the sitemaps within a reasonable time.
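Packing URLs into as few sitemap files as the limits allow can be sketched as follows. The helper names are hypothetical, the limits are parameters set to the 50,000-URL and 10MB figures above (10MB read here as 10,485,760 bytes), and each entry's `lastmod` string is assumed to be already formatted as W3C Datetime.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10MB uncompressed, i.e. 10,485,760 bytes

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"


def url_entry(loc, lastmod):
    """One <url> element; loc is XML-escaped, lastmod is assumed pre-formatted."""
    return f"  <url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>\n"


def split_sitemaps(entries, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Fill each sitemap as full as the URL-count and byte limits allow."""
    base = len((HEADER + FOOTER).encode("utf-8"))
    sitemaps, current, size = [], [], base
    for loc, lastmod in entries:
        line = url_entry(loc, lastmod)
        n = len(line.encode("utf-8"))
        # Start a new file only when adding this URL would break a limit.
        if current and (len(current) == max_urls or size + n > max_bytes):
            sitemaps.append(HEADER + "".join(current) + FOOTER)
            current, size = [], base
        current.append(line)
        size += n
    if current:
        sitemaps.append(HEADER + "".join(current) + FOOTER)
    return sitemaps


def sitemap_index(sitemap_urls):
    """A sitemap index listing the public URL of every sitemap file."""
    body = "".join(f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + body + "</sitemapindex>\n")
```

Each returned string would be saved as its own file, and `sitemap_index` would list their URLs so that only the index needs to be submitted when any of them changes.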
Best Practices for RSS/Atom Feeds
- Whenever a new page is added or an existing page is changed meaningfully, add its URL and modification time to the feed.
- The RSS/Atom feed should contain all updates made since at least the last time Google downloaded it, so that the search engine doesn't miss any of them. It is advisable to use PubSubHubbub to accomplish this: it propagates the content of your feed to all interested parties (RSS readers, search engines, etc.) in the fastest and most efficient way possible.
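A feed that covers every update since Google's last download could look like the RSS 2.0 sketch below. The change log of `(url, title, changed_at)` tuples and the `since` cutoff are assumptions: a site would have to record meaningful changes itself and choose a cutoff no later than the last time Googlebot fetched the feed. The function name is hypothetical, and publishing to a PubSubHubbub hub is not shown.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def rss_feed(title, site_url, changes, since):
    """Build an RSS 2.0 feed listing every meaningful change made after `since`.

    changes: (url, page_title, changed_at) tuples from a hypothetical change log,
    with timezone-aware datetimes. Newest changes are listed first.
    """
    recent = sorted((c for c in changes if c[2] > since), key=lambda c: c[2], reverse=True)
    items = "".join(
        "    <item>"
        f"<title>{escape(page_title)}</title>"
        f"<link>{escape(url)}</link>"
        f"<pubDate>{format_datetime(changed_at.astimezone(timezone.utc))}</pubDate>"
        "</item>\n"
        for url, page_title, changed_at in recent
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n  <channel>\n'
        f"    <title>{escape(title)}</title>\n"
        f"    <link>{escape(site_url)}</link>\n"
        f"    <description>Recent updates to {escape(title)}</description>\n"
        f"{items}"
        "  </channel>\n</rss>\n"
    )


if __name__ == "__main__":
    log = [("https://example.com/b", "B", datetime(2014, 10, 20, 9, 30, tzinfo=timezone.utc))]
    print(rss_feed("Example", "https://example.com/", log,
                   since=datetime(2014, 10, 10, tzinfo=timezone.utc)))
```

Keeping the cutoff conservative (earlier rather than later) only makes the feed slightly longer, whereas a cutoff that is too late would drop updates Google has not yet seen.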
If online marketers follow these recommendations alongside their other optimization efforts, Google's results will better reflect their latest website content, which can help them attract more visitors and improve traffic and sales.