Creating an XML sitemap is straightforward, but creating one that truly maximizes your SEO results requires following established best practices. Search engine algorithms and guidelines evolve constantly, and what worked five years ago may not be optimal today. Here are the current best practices for XML sitemaps that will help you get the most out of this essential SEO tool.
Only Include Canonical, Indexable URLs
One of the most common sitemap mistakes is including every URL on your website without filtering. Your sitemap should only contain URLs that you want search engines to index. This means excluding:
- Pages with noindex meta tags or X-Robots-Tag headers
- Redirect URLs (301, 302, or meta refresh)
- Pages that return 4xx or 5xx error codes
- Duplicate pages (include only the canonical version)
- Paginated archive pages (unless they contain unique content)
- Internal search result pages
- Session-specific or user-specific URLs
- Admin and login pages
By keeping your sitemap clean and focused, you signal to search engines that every URL listed is worth crawling and indexing. This helps preserve your crawl budget and improves the overall quality signal of your sitemap.
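As a sketch of how this filtering might work in a sitemap generator, the function below applies the exclusion rules above to a list of page records. The PageRecord fields are hypothetical, standing in for whatever data your CMS or crawl database exposes:

```python
from dataclasses import dataclass

@dataclass
class PageRecord:
    # Hypothetical record type; field names stand in for your CMS's data.
    url: str
    status_code: int      # HTTP status the URL returns
    noindex: bool         # True if a noindex meta tag or X-Robots-Tag is present
    canonical_url: str    # the canonical URL this page declares

def is_sitemap_eligible(page: PageRecord) -> bool:
    """Return True only for canonical, indexable URLs that return 200."""
    if page.status_code != 200:          # excludes redirects and 4xx/5xx errors
        return False
    if page.noindex:                     # excludes noindexed pages
        return False
    if page.canonical_url != page.url:   # excludes non-canonical duplicates
        return False
    return True

pages = [
    PageRecord("https://example.com/", 200, False, "https://example.com/"),
    PageRecord("https://example.com/old", 301, False, "https://example.com/new"),
    PageRecord("https://example.com/search?q=x", 200, True, "https://example.com/search?q=x"),
]
eligible = [p.url for p in pages if is_sitemap_eligible(p)]
print(eligible)  # only the canonical, indexable homepage survives
```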
Use Accurate lastmod Dates
The lastmod tag tells search engines when a page was last meaningfully modified. This is arguably the most important optional tag in your sitemap because Google uses it to prioritize which pages to recrawl.
The key word here is "meaningfully." Do not update the lastmod date for trivial changes like fixing a typo or updating a sidebar widget. Reserve lastmod updates for substantial content changes that would affect how the page appears in search results. If you artificially inflate your lastmod dates, Google may begin to distrust and ignore them entirely.
Use the W3C Datetime format for consistency. The most common forms are YYYY-MM-DD for date-only precision (for example, 2025-01-15) and YYYY-MM-DDThh:mm:ssTZD for full datetime precision (for example, 2025-01-15T09:30:00+00:00).
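For illustration, here are two url entries using each precision level (the URLs and dates are placeholders):

```xml
<url>
  <loc>https://example.com/blog/sitemap-best-practices</loc>
  <!-- Date-only precision -->
  <lastmod>2025-01-15</lastmod>
</url>
<url>
  <loc>https://example.com/blog/another-post</loc>
  <!-- Full datetime precision with an explicit UTC offset -->
  <lastmod>2025-01-15T09:30:00+00:00</lastmod>
</url>
```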
Keep Sitemaps Under Size Limits
Google and other search engines enforce the following limits for individual sitemap files:
- Maximum file size: 50MB (uncompressed)
- Maximum number of URLs: 50,000
For most small to medium websites, these limits are more than sufficient. However, large sites may need to split their URLs across multiple sitemap files and use a sitemap index file to organize them.
A sitemap index file is essentially a sitemap of sitemaps. It lists the locations of all your individual sitemap files, allowing search engines to discover and process them all from a single entry point.
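A minimal sitemap index might look like this (the filenames and dates are placeholders; the namespace is the standard sitemap protocol namespace):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
</sitemapindex>
```

You would then submit only the index file's URL; search engines follow it to each child sitemap.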
Organize Large Sitemaps Logically
When splitting your sitemap into multiple files, organize them by logical content categories rather than arbitrary batches. For example:
- sitemap-pages.xml for static pages
- sitemap-blog.xml for blog posts
- sitemap-products.xml for product pages
- sitemap-categories.xml for category pages
This organization makes it easier to monitor indexing performance by content type in Google Search Console and helps you identify which sections of your site may have indexing issues.
Use HTTPS URLs Consistently
If your site uses HTTPS (and it should in 2025), ensure all URLs in your sitemap use the HTTPS protocol. Mixing HTTP and HTTPS URLs creates confusion for search engines and can lead to indexing issues. Your sitemap should reflect the canonical version of each URL exactly as you want it indexed.
Similarly, be consistent with trailing slashes and www vs non-www versions. Pick one canonical format and use it everywhere in your sitemap.
Compress Large Sitemaps with Gzip
For sitemaps approaching the 50MB limit, compress them using gzip to reduce file size and speed up download times. Search engines fully support gzip-compressed sitemaps (with the .xml.gz extension). This reduces bandwidth usage for both your server and the search engine crawler.
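As a sketch of how compression can be automated in a build step (the filenames are placeholders), Python's standard gzip module is enough:

```python
import gzip
from pathlib import Path

# Placeholder: in practice this file would come from your sitemap generator.
sitemap = Path("sitemap.xml")
sitemap.write_text(
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "  <url><loc>https://example.com/</loc></url>\n"
    "</urlset>\n",
    encoding="utf-8",
)

# Write the gzip-compressed copy that crawlers will fetch as sitemap.xml.gz.
compressed = Path(str(sitemap) + ".gz")
compressed.write_bytes(gzip.compress(sitemap.read_bytes()))

# Sanity check: decompressing must reproduce the original bytes exactly.
assert gzip.decompress(compressed.read_bytes()) == sitemap.read_bytes()
```

Remember that the 50MB size limit applies to the uncompressed file, so compression helps delivery but does not raise the limit.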
Reference Your Sitemap in robots.txt
Always include a Sitemap directive in your robots.txt file. This serves as an automatic discovery mechanism that works across all search engines without requiring manual submission to each one.
The Sitemap directive is independent of any User-agent group, so it can technically appear anywhere in the file; placing it at the end is the common convention. You can include multiple Sitemap directives if you have multiple sitemap files or a sitemap index.
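For example, a robots.txt file referencing both a sitemap index and a standalone news sitemap (all URLs are placeholders) might end like this:

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/sitemap-news.xml
```

Note that Sitemap directives must use absolute URLs, unlike the relative paths in Disallow rules.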
Validate Your Sitemap Regularly
XML sitemaps must conform to the sitemap protocol schema. Even small syntax errors can cause search engines to reject your entire sitemap. Common validation issues include:
- Invalid XML characters (especially in URLs with special characters)
- Missing required XML namespace declarations
- Incorrect date formats
- URLs containing unescaped ampersands or other special characters
Run your sitemap through an XML validator periodically, especially after making changes to your sitemap generation process.
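A full validation checks the file against the official XSD schema, but even a lightweight check with Python's standard library catches several of the issues above: broken XML (including unescaped ampersands), a missing namespace declaration, and relative URLs. A minimal sketch:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def check_sitemap(xml_text: str) -> list:
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        # Parsing fails on invalid XML characters and unescaped '&' in URLs.
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if not root.tag.startswith("{" + SITEAMAP_NS + "}") if False else not root.tag.startswith("{" + SITEMAP_NS + "}"):
        problems.append("missing or wrong sitemap namespace declaration")
    for loc in root.iter("{" + SITEMAP_NS + "}loc"):
        parsed = urlparse((loc.text or "").strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an absolute URL: {loc.text!r}")
    return problems

good = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url></urlset>"
)
bad = "<urlset><url><loc>/relative/path</loc></url></urlset>"
print(check_sitemap(good))  # []
print(check_sitemap(bad))   # reports the missing namespace
```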
Update Your Sitemap Regularly
A stale sitemap is better than no sitemap, but a current sitemap is best. Establish a routine for updating your sitemap that matches your content publishing frequency:
- Daily publishers (news sites, active blogs): Update daily or use dynamic sitemap generation
- Weekly publishers: Update weekly
- Monthly or less: Update monthly or after each batch of content changes
Many modern CMS platforms and web frameworks can generate sitemaps dynamically on each request, eliminating the need for manual updates entirely.
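As a sketch of dynamic generation (the page list here is a stand-in for a database or CMS query), the entire document can be built on demand with the standard library, which also handles URL escaping automatically:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Build a sitemap document from (url, lastmod) pairs and return it as bytes."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    # tostring escapes special characters such as '&' in URLs automatically.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

xml_bytes = build_sitemap([
    ("https://example.com/", "2025-01-15"),
    ("https://example.com/blog?page=1&sort=new", "2025-01-10"),
])
print(xml_bytes.decode("utf-8"))
```

In a real deployment this function would be wired to the route serving /sitemap.xml, typically with caching so the document is not rebuilt on every request.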
Monitor Performance in Search Console
Submitting your sitemap is not a set-and-forget task. Regularly check Google Search Console's Sitemaps report to monitor:
- How many URLs Google has discovered and indexed from your sitemap
- Any errors or warnings Google encountered while processing your sitemap
- The gap between discovered URLs and indexed URLs (a large gap may indicate content quality issues)
- How frequently Google is re-fetching your sitemap
Use this data to identify and fix issues quickly. A sudden drop in indexed URLs or a spike in errors deserves immediate investigation.
Do Not Include URLs Blocked by robots.txt
If a URL is disallowed in your robots.txt file, do not include it in your sitemap. While this is technically valid according to the sitemap protocol, it sends mixed signals to search engines. Blocked URLs in your sitemap appear in Search Console as "Indexed, though blocked by robots.txt" or "Blocked by robots.txt," which clutters your reports and wastes crawl resources.
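This mismatch can be caught before the sitemap is published. As a sketch, Python's standard urllib.robotparser checks candidate URLs against robots.txt rules (the rules and URLs below are placeholders):

```python
from urllib.robotparser import RobotFileParser

robots_rules = """\
User-agent: *
Disallow: /search
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_rules.splitlines())

candidates = [
    "https://example.com/",
    "https://example.com/search?q=shoes",
    "https://example.com/admin/login",
]

# Keep only URLs a generic crawler is allowed to fetch.
allowed = [url for url in candidates if parser.can_fetch("*", url)]
print(allowed)  # the /search and /admin/ URLs are filtered out
```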
Consider Using Sitemap Extensions
For specialized content, take advantage of sitemap extensions:
- Image sitemap extension: Helps search engines discover images embedded in your pages
- Video sitemap extension: Provides metadata about video content
- News sitemap extension: Required for inclusion in Google News
These extensions allow you to provide additional context about your content that search engines cannot easily extract from the pages themselves.
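For example, an image sitemap entry adds the image extension namespace to the urlset and nests image elements inside the url (the URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/products/widget</loc>
    <image:image>
      <image:loc>https://example.com/images/widget-front.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/widget-side.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```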
Following these best practices ensures your XML sitemap serves as an effective communication channel between your website and search engines, maximizing your content's visibility in search results.