
XML Sitemap Guide: Creation, Structure, and Best Practices

XML Sitemap Guide: Everything You Need to Know

An XML sitemap is a structured file that helps search engines discover, crawl, and index your website's pages efficiently. Search engines can find pages by following links, but a sitemap lists every important URL directly, which reduces the chance that a page is overlooked during crawling. Inclusion in a sitemap is a hint to crawlers, not a guarantee of indexing.

Sitemap Structure and Format

XML sitemaps follow a standardized format defined by the sitemaps.org protocol. Each sitemap is an XML document with a specific structure that search engines can parse.

Basic Sitemap Structure

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-12-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
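
The same structure can be produced programmatically. The following is a minimal Python sketch using only the standard library; the function name build_urlset and the dict-based entry format are assumptions made for illustration, not part of the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Return sitemap XML as bytes.

    `entries` is an iterable of dicts (hypothetical format) with a required
    'loc' key and optional 'lastmod', 'changefreq', and 'priority' keys.
    """
    # Setting xmlns as a plain attribute keeps the child tags unprefixed.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # required
        for tag in ("lastmod", "changefreq", "priority"):  # optional
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes &, <, and > in text content automatically.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_xml = build_urlset([
    {"loc": "https://example.com/", "lastmod": "2025-01-15",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "https://example.com/about", "lastmod": "2024-12-01"},
])
print(sitemap_xml.decode("utf-8"))
```

Building the document with an XML library rather than string concatenation means special characters in URLs are escaped without extra code.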

XML Tags Explained

Tag           Required   Description
<urlset>      Yes        Root element encapsulating all URL entries
<url>         Yes        Container for each individual URL entry
<loc>         Yes        The absolute URL of the page (must include protocol)
<lastmod>     No         Last modification date in W3C datetime format
<changefreq>  No         Expected change frequency (always, hourly, daily, weekly, monthly, yearly, never)
<priority>    No         Relative priority within your site (0.0 to 1.0)
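
The constraints in the table can be checked before an entry is written out. This is a rough Python sketch under stated assumptions: validate_entry and the dict format are hypothetical, and the regular expression is a simplified W3C datetime check that accepts only a full date, optionally followed by a time with a timezone designator.

```python
import re
from urllib.parse import urlparse

CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}

# Simplified: YYYY-MM-DD, optionally followed by Thh:mm[:ss[.s]] and Z or +hh:mm.
W3C_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def validate_entry(entry):
    """Return a list of problems found in one sitemap entry (empty if none)."""
    problems = []
    parsed = urlparse(entry.get("loc", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("loc must be an absolute http(s) URL")
    if "lastmod" in entry and not W3C_DATETIME.match(entry["lastmod"]):
        problems.append("lastmod is not in W3C datetime format")
    if "changefreq" in entry and entry["changefreq"] not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not a recognised value")
    if "priority" in entry:
        try:
            in_range = 0.0 <= float(entry["priority"]) <= 1.0
        except (TypeError, ValueError):
            in_range = False
        if not in_range:
            problems.append("priority must be a number from 0.0 to 1.0")
    return problems


print(validate_entry({"loc": "/about", "priority": "1.5"}))
```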

Sitemap Index Files

A single sitemap may contain at most 50,000 URLs and must not exceed 50MB uncompressed. Larger websites split their URLs across several sitemaps and list those sitemaps in a sitemap index file.

Sitemap Index Structure

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2025-01-14</lastmod>
  </sitemap>
</sitemapindex>
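
Splitting a large URL list and producing the matching index could look roughly like the Python sketch below. The helper names and the sitemap-N.xml naming scheme are assumptions for illustration; only the 50,000 URL limit and the element names come from the protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_locs, lastmod):
    """Return sitemap index XML (bytes) referencing each child sitemap URL."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# 120,000 placeholder URLs need three child sitemaps.
urls = [f"https://example.com/page-{i}" for i in range(120_000)]
chunks = list(chunk_urls(urls))
child_locs = [f"https://example.com/sitemap-{n}.xml"
              for n in range(1, len(chunks) + 1)]
index_xml = build_sitemap_index(child_locs, "2025-01-15")
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file, with the index served at the URL submitted to search engines.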

Creating Your Sitemap

Dynamic Generation

For most websites, dynamically generating sitemaps from a database or CMS is the preferred approach. This ensures the sitemap always reflects the current state of your content.

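<?php
// PHP example: dynamic sitemap generation.
// Assumes a `pages` table with `slug`, `updated_at`, and `is_published` columns.
// The DSN and credentials below are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=example;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

$pages = $pdo->query('SELECT slug, updated_at FROM pages WHERE is_published = 1');
foreach ($pages as $page) {
    // Escape XML special characters in the full URL before output.
    $loc = htmlspecialchars('https://example.com/' . $page['slug'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
    echo "  <url>\n";
    echo "    <loc>{$loc}</loc>\n";
    echo '    <lastmod>' . date('Y-m-d', strtotime($page['updated_at'])) . "</lastmod>\n";
    echo "  </url>\n";
}
echo '</urlset>';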

Static File Approach

For smaller websites with infrequent content changes, a static XML file manually maintained or generated during the build process can be sufficient.

Submitting Your Sitemap

Submission Methods

  1. Google Search Console: submit directly through the Sitemaps section of your verified property
  2. Bing Webmaster Tools: submit through the Sitemaps section of your verified site
  3. robots.txt reference: add Sitemap: https://example.com/sitemap.xml to your robots.txt file
  4. Ping endpoints: largely retired. Google shut down its sitemap ping endpoint in 2023 and Bing no longer accepts anonymous sitemap pings, so rely on the methods above together with accurate lastmod values

robots.txt Integration

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
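
To confirm which sitemaps a robots.txt file actually advertises, the Sitemap lines can be read back out. The Python sketch below is a simplified parser written for illustration (extract_sitemaps is a hypothetical helper); it treats the field name as case-insensitive and strips # comments, which would also truncate a URL containing a literal # character.

```python
def extract_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


robots_txt = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
"""
print(extract_sitemaps(robots_txt))
```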

Common Mistakes to Avoid

  • Including non-canonical URLs: every URL in the sitemap should be the canonical version (no duplicates, no redirects)
  • Listing blocked pages: do not include URLs that are disallowed in robots.txt or carry a noindex directive
  • Inaccurate lastmod dates: update lastmod only when the page content actually changes, not on every regeneration
  • Missing protocol: all URLs must be absolute, with an https:// or http:// prefix
  • Exceeding limits: stay within 50,000 URLs per file and 50MB uncompressed file size
  • Including error pages: do not list 404, 500, or other error-returning URLs
  • Wrong encoding: use UTF-8 encoding and properly escape special XML characters (&, <, >, ', ")
  • Ignoring HTTPS: use HTTPS URLs if your site is served over HTTPS
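
Several of these mistakes can be filtered out before the sitemap is written. The Python sketch below assumes each page is described by a dict with url, canonical, status, and noindex keys; that record format and both helper names are hypothetical, and in practice the values would come from a crawl or from the CMS.

```python
from xml.sax.saxutils import escape


def sitemap_candidates(pages):
    """Return the URLs that are safe to list, in their original order."""
    seen = set()
    kept = []
    for page in pages:
        url = page["url"]
        if page["status"] != 200:      # error pages and redirects
            continue
        if page["noindex"]:            # pages excluded from indexing
            continue
        if page["canonical"] != url:   # non-canonical variants
            continue
        if not url.startswith(("https://", "http://")):  # missing protocol
            continue
        if url in seen:                # duplicates
            continue
        seen.add(url)
        kept.append(url)
    return kept


def xml_escape_loc(url):
    """Escape &, <, >, and both quote characters for use inside <loc>."""
    return escape(url, {'"': "&quot;", "'": "&apos;"})


pages = [
    {"url": "https://example.com/", "canonical": "https://example.com/",
     "status": 200, "noindex": False},
    {"url": "https://example.com/?ref=x", "canonical": "https://example.com/",
     "status": 200, "noindex": False},
    {"url": "https://example.com/gone", "canonical": "https://example.com/gone",
     "status": 404, "noindex": False},
    {"url": "https://example.com/private", "canonical": "https://example.com/private",
     "status": 200, "noindex": True},
]
print(sitemap_candidates(pages))
print(xml_escape_loc("https://example.com/search?q=a&lang=en"))
```

The escaping helper is only needed when the XML is assembled by hand; an XML library escapes text content on its own.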

Specialized Sitemaps

Image Sitemaps

Image sitemaps help search engines discover images that might not be found through regular crawling, especially images loaded via JavaScript or CSS.
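
An image sitemap adds image entries under each page URL using Google's image namespace. The Python sketch below shows one possible way to generate it; build_image_sitemap and its input mapping are assumptions for illustration, and the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(page_images):
    """Return image sitemap XML (bytes).

    `page_images` maps a page URL to the list of image URLs shown on it.
    """
    # Namespace declarations are written as plain attributes so the output
    # uses the conventional unprefixed tags plus the image: prefix.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:image": IMAGE_NS})
    for page_url, image_urls in page_images.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, "image:image")
            ET.SubElement(image, "image:loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


image_xml = build_image_sitemap({
    "https://example.com/gallery": [
        "https://example.com/img/photo-1.jpg",
        "https://example.com/img/photo-2.jpg",
    ],
})
print(image_xml.decode("utf-8"))
```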

Video Sitemaps

Video sitemaps provide metadata about video content including title, description, duration, thumbnail URL, and expiration date, enabling rich video results in search.

News Sitemaps

News sitemaps are designed for Google News publishers and include articles published within the last 48 hours with metadata like publication name, language, and title.
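
The 48-hour window lends itself to a simple filter at generation time. The Python sketch below is one way to do it, assuming articles are dicts with loc, title, and a timezone-aware published datetime; that format, the function name, and the publication name "Example Times" are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language, now):
    """Return news sitemap XML (bytes) for articles from the last 48 hours."""
    cutoff = now - timedelta(hours=48)
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:news": NEWS_NS})
    for article in articles:
        if article["published"] < cutoff:
            continue  # too old for a news sitemap
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = article["loc"]
        news = ET.SubElement(url, "news:news")
        publication = ET.SubElement(news, "news:publication")
        ET.SubElement(publication, "news:name").text = publication_name
        ET.SubElement(publication, "news:language").text = language
        ET.SubElement(news, "news:publication_date").text = article["published"].isoformat()
        ET.SubElement(news, "news:title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": "https://example.com/news/fresh", "title": "Fresh story",
     "published": now - timedelta(hours=5)},
    {"loc": "https://example.com/news/old", "title": "Old story",
     "published": now - timedelta(days=7)},
]
news_xml = build_news_sitemap(articles, "Example Times", "en", now)
print(news_xml.decode("utf-8"))
```

Passing the current time in as a parameter keeps the cutoff deterministic, which makes the function easier to test.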

Monitoring and Maintenance

  • Regularly validate your sitemap XML against the schema specification
  • Monitor crawl statistics in Google Search Console for indexing issues
  • Remove URLs that return 404 or redirect status codes
  • Update sitemaps automatically when content changes
  • Use gzip compression for large sitemaps to reduce bandwidth (the 50MB limit still applies to the uncompressed file)
  • Review submitted vs. indexed URL counts to identify coverage problems
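
Some of these checks are easy to automate. The Python sketch below parses a sitemap, counts its URLs, compares it against the protocol limits, and gzips it. It confirms only that the XML is well formed, not that it satisfies the full schema, and inspect_sitemap is a hypothetical helper name.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 50MB


def inspect_sitemap(sitemap_xml):
    """Return basic statistics for a sitemap given as bytes.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(sitemap_xml)
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    size = len(sitemap_xml)
    return {
        "url_count": url_count,
        "uncompressed_bytes": size,
        "within_limits": url_count <= MAX_URLS and size <= MAX_UNCOMPRESSED_BYTES,
    }


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b'<url><loc>https://example.com/</loc></url>'
    b'<url><loc>https://example.com/about</loc></url>'
    b'</urlset>'
)
report = inspect_sitemap(sample)
compressed = gzip.compress(sample)  # bytes suitable for serving as sitemap.xml.gz
print(report)
```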

Conclusion

A well-maintained XML sitemap is a fundamental SEO tool that improves search engine crawling efficiency and helps ensure all your important content gets indexed. By following the correct structure, avoiding common mistakes, and keeping your sitemap updated, you provide search engines with the clearest possible roadmap to your content.
