Posts Tagged ‘sitemaps’

Best practices for XML sitemaps & RSS/Atom feeds

Webmaster level: intermediate-advanced

Submitting sitemaps can be an important part of optimizing websites. Sitemaps enable search engines to discover all pages on a site and to download them quickly when they change. This blog post explains which fields in sitemaps are important, when to use XML sitemaps and RSS/Atom feeds, and how to optimize them for Google.

Sitemaps and feeds

Sitemaps can be in XML sitemap, RSS, or Atom formats. The important difference between these formats is that XML sitemaps describe the whole set of URLs within a site, while RSS/Atom feeds describe recent changes. This has important implications:

  • XML sitemaps are usually large; RSS/Atom feeds are small, containing only the most recent updates to your site.
  • XML sitemaps are downloaded less frequently than RSS/Atom feeds.

For optimal crawling, we recommend using both XML sitemaps and RSS/Atom feeds. XML sitemaps will give Google information about all of the pages on your site. RSS/Atom feeds will provide all updates on your site, helping Google to keep your content fresher in its index. Note that submitting sitemaps or feeds does not guarantee the indexing of those URLs.

Example of an XML sitemap:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
   <loc>http://example.com/mypage</loc>
   <lastmod>2011-06-27T19:34:00+01:00</lastmod>
   <!-- optional additional tags -->
 </url>
 <url>
   ...
 </url>
</urlset>

Example of an RSS feed:

<?xml version="1.0" encoding="utf-8"?>
<rss>
 <channel>
   <!-- other tags -->
   <item>
     <!-- other tags -->
     <link>http://example.com/mypage</link>
     <pubDate>Mon, 27 Jun 2011 19:34:00 +0100</pubDate>
   </item>
   <item>
     ...
   </item>
 </channel>
</rss>

Example of an Atom feed:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 <!-- other tags -->
 <entry>
   <link href="http://example.com/mypage" />
   <updated>2011-06-27T19:34:00+01:00</updated>
   <!-- other tags -->
 </entry>
 <entry>
   ...
 </entry>
</feed>

“other tags” refer to both optional and required tags by their respective standards. We recommend that you specify the required tags for Atom/RSS as they will help you to appear on other properties that might use these feeds, in addition to Google Search.

Best practices

Important fields

XML sitemaps and RSS/Atom feeds, in their core, are lists of URLs with metadata attached to them. The two most important pieces of information for Google are the URL itself and its last modification time:

URLs

URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:

  • Only include URLs that can be fetched by Googlebot. A common mistake is including URLs disallowed by robots.txt — which cannot be fetched by Googlebot, or including URLs of pages that don’t exist.
  • Only include canonical URLs. A common mistake is to include URLs of duplicate pages. This increases the load on your server without improving indexing.
Last modification time

Specify a last modification time for each URL in an XML sitemap and RSS/Atom feed. The last modification time should be the last time the content of the page changed meaningfully. If a change is meant to be visible in the search results, then the last modification time should be the time of this change.

  • XML sitemap uses  <lastmod>
  • RSS uses <pubDate>
  • Atom uses <updated>

Be sure to set or update last modification time correctly:

  • Specify the time in the correct format: W3C Datetime for XML sitemaps, RFC3339 for Atom and RFC822 for RSS.
  • Only update modification time when the content changed meaningfully.
  • Don’t set the last modification time to the current time whenever the sitemap or feed is served.

XML sitemaps

XML sitemaps should contain URLs of all pages on your site. They are often large and update infrequently. Follow these guidelines:

  • For a single XML sitemap: update it at least once a day (if your site changes regularly) and ping Google after you update it.
  • For a set of XML sitemaps: maximize the number of URLs in each XML sitemap. The limit is 50,000 URLs or a maximum size of 10MB uncompressed, whichever is reached first. Ping Google for each updated XML sitemap (or once for the sitemap index, if that’s used) every time it is updated. A common mistake is to put only a handful of URLs into each XML sitemap file, which usually makes it harder for Google to download all of these XML sitemaps in a reasonable time.

RSS/Atom

RSS/Atom feeds should convey recent updates of your site. They are usually small and updated frequently. For these feeds, we recommend:

  • When a new page is added or an existing page meaningfully changed, add the URL and the modification time to the feed.
  • In order for Google to not miss updates, the RSS/Atom feed should have all updates in it since at least the last time Google downloaded it. The best way to achieve this is by using PubSubHubbub. The hub will propagate the content of your feed to all interested parties (RSS readers, search engines, etc.) in the fastest and most efficient way possible.

Generating both XML sitemaps and Atom/RSS feeds is a great way to optimize crawling of a site for Google and other search engines. The key information in these files is the canonical URL and the time of the last modification of pages within the website. Setting these properly, and notifying Google and other search engines through sitemaps pings and PubSubHubbub, will allow your website to be crawled optimally, and represented accordingly in search results.

If you have any questions, feel free to post them here, or to join other webmasters in the webmaster help forum section on sitemaps.

Posted by Alkis Evlogimenos, Google Feeds Team

An update to the Webmaster Tools API

Webmaster level: advanced

Over the summer the Webmaster Tools team has been cooking up an update to the Webmaster Tools API. The new API is consistent with other Google APIs, makes it easier to authenticate for apps or web-services, and provides access to some of the main features of Webmaster Tools.

If you’ve used other Google APIs, getting started with the new Webmaster Tools API will be easy! We have examples for Python, Java, as well as OACurl (for fans of command lines).

This API allows you to:

  • list, add, or remove sites from your account (you can currently have up to 500 sites in your account)
  • list, add, or remove sitemaps for your websites
  • get warning, error, and indexed counts for individual sitemaps
  • get a time-series of all kinds of crawl errors for your site
  • list crawl error samples for specific types of errors
  • mark individual crawl errors as “fixed” (this doesn’t change how they’re processed, but can help simplify the UI for you)

We’d love to see what you’re building with our APIs! Feel free to link to your projects in the comments below. Should you have any questions about the usage of the API, feel free to post in our help forum as well.

Posted by John Mueller, fan of long command lines, Google Zürich

App Indexing updates

Webmaster Level: Advanced

In October, we announced guidelines for App Indexing for deep linking directly from Google Search results to your Android app. Thanks to all of you that have expressed interest. We’ve just enabled 20+ additional applications that users will soon see app deep links for in Search Results, and starting today we’re making app deep links to English content available globally.

We’re continuing to onboard more publishers in all languages. If you haven’t added deep link support to your Android app or specified these links on your website or in your Sitemaps, please do so and then notify us by filling out this form.

Here are some best practices to consider when adding deep links to your sitemap or website:

  • App deep links should only be included for canonical web URLs.
  • Remember to specify an app deep link for your homepage.
  • Not all website URLs in a Sitemap need to have a corresponding app deep link. Do not include app deep links that aren’t supported by your app.
  • If you are a news site and use News Sitemaps, be sure to include your deep link annotations in the News Sitemaps, as well as your general Sitemaps.
  • Don’t provide annotations for deep links that execute native ARM code. This enables app indexing to work for all platforms

When Google indexes content from your app, your app will need to make HTTP requests that it usually makes under normal operation. These requests will appear to your servers as originating from Googlebot. Therefore, your server’s robots.txt file must be configured properly to allow these requests.

Finally, please make sure the back button behavior of your app leads directly back to the search results page.

For more details on implementation, visit our updated developer guidelines. And, as always, you can ask questions on the mobile section of our webmaster forum.

Posted by Michael Xu, Software Engineer

Indexing apps just like websites

Webmaster Level: Advanced

Searchers on smartphones experience many speed bumps that can slow them down. For example, any time they need to change context from a web page to an app, or vice versa, users are likely to encounter redirects, pop-up dialogs, and extra swipes and taps. Wouldn’t it be cool if you could give your users the choice of viewing your content either on the website or via your app, both straight from Google’s search results?
Today, we’re happy to announce a new capability of Google Search, called app indexing, that uses the expertise of webmasters to help create a seamless user experience across websites and mobile apps.
Just like it crawls and indexes websites, Googlebot can now index content in your Android app. Webmasters will be able to indicate which app content you’d like Google to index in the same way you do for webpages today — through your existing Sitemap file and through Webmaster Tools. If both the webpage and the app contents are successfully indexed, Google will then try to show deep links to your app straight in our search results when we think they’re relevant for the user’s query and if the user has the app installed. When users tap on these deep links, your app will launch and take them directly to the content they need. Here’s an example of a search for home listings in Mountain View:

We’re currently testing app indexing with an initial group of developers. Deep links for these applications will start to appear in Google search results for signed-in users on Android in the US in a few weeks. If you are interested in enabling indexing for your Android app, it’s easy to get started:

  1. Let us know that you’re interested. We’re working hard to bring this functionality to more websites and apps in the near future.
  2. Enable deep linking within your app.
  3. Provide information about alternate app URIs, either in the Sitemaps file or in a link element in pages of your site.

For more details on implementation and for information on how to sign up, visit our developer site. As always, if you have any questions, please ask in the mobile section of our webmaster forum.

Posted by , Product Manager