XML Sitemap Guide: Creation, Structure, and Best Practices
XML Sitemap Guide: Everything You Need to Know
An XML sitemap is a structured file that helps search engines discover, crawl, and index your website's pages efficiently. Search engines can find pages by following links, but a sitemap lists your important URLs directly, which makes it less likely that a page is overlooked during crawling.
Sitemap Structure and Format
XML sitemaps follow a standardized format defined by the sitemaps.org protocol. Each sitemap is an XML document with a specific structure that search engines can parse.
Basic Sitemap Structure
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/</loc>
        <lastmod>2025-01-15</lastmod>
        <changefreq>weekly</changefreq>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://example.com/about</loc>
        <lastmod>2024-12-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>
XML Tags Explained
| Tag | Required | Description |
|---|---|---|
| `<urlset>` | Yes | Root element encapsulating all URL entries |
| `<url>` | Yes | Container for each individual URL entry |
| `<loc>` | Yes | The absolute URL of the page (must include the protocol) |
| `<lastmod>` | No | Last modification date in W3C datetime format (for example, 2025-01-15) |
| `<changefreq>` | No | Expected change frequency: always, hourly, daily, weekly, monthly, yearly, or never |
| `<priority>` | No | Priority relative to other URLs on your site (0.0 to 1.0) |
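As an illustration of how these tags fit together, here is a minimal Python sketch that builds a `<urlset>` with the standard library's `xml.etree.ElementTree`. The function name `build_sitemap` and the dictionary keys are assumptions made for this example, not part of the protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of entry dicts.

    Each entry needs "loc"; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # ElementTree escapes &, < and > in element text automatically.
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

xml_bytes = build_sitemap([
    {"loc": "https://example.com/", "lastmod": "2025-01-15", "priority": "1.0"},
    {"loc": "https://example.com/search?q=a&page=2"},
])
print(xml_bytes.decode("utf-8"))
```

Building the tree with a library rather than concatenating strings means the `&` in the second URL is written as `&amp;` without any manual escaping.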
Sitemap Index Files
When a site exceeds the limit of 50,000 URLs per sitemap or 50 MB (uncompressed) per file, a sitemap index file is used to reference multiple individual sitemaps.
Sitemap Index Structure
    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://example.com/sitemap-pages.xml</loc>
        <lastmod>2025-01-15</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://example.com/sitemap-blog.xml</loc>
        <lastmod>2025-01-14</lastmod>
      </sitemap>
    </sitemapindex>
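To show how a large URL list could be split to respect the 50,000-URL limit, the following Python sketch chunks the list and builds a matching index file. The helper names (`chunk`, `build_index`) and the `sitemap-N.xml` naming scheme are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemaps.org protocol

def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def build_index(sitemap_urls, lastmod):
    """Return sitemap index XML (bytes) referencing each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)

# 120,000 placeholder URLs need three child sitemaps.
urls = [f"https://example.com/page-{i}" for i in range(120_000)]
parts = list(chunk(urls))
child_locs = [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(parts) + 1)]
index_xml = build_index(child_locs, "2025-01-15")
print(len(parts), [len(p) for p in parts])  # prints: 3 [50000, 50000, 20000]
```

A real generator would also check each child file against the 50 MB size limit, since long URLs can hit that limit before the URL count does.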
Creating Your Sitemap
Dynamic Generation
For most websites, dynamically generating sitemaps from a database or CMS is the preferred approach. This ensures the sitemap always reflects the current state of your content.
    <?php
    // PHP example: dynamic sitemap generation.
    // Assumes $pdo is an existing PDO connection and that slugs are URL-safe path segments.
    header('Content-Type: application/xml; charset=utf-8');
    echo '<?xml version="1.0" encoding="UTF-8"?>';
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
    $pages = $pdo->query("SELECT slug, updated_at FROM pages WHERE is_published = 1");
    foreach ($pages as $page) {
        // ENT_XML1 escapes &, <, > and quotes as XML entities rather than HTML ones.
        $loc = htmlspecialchars('https://example.com/' . $page['slug'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        echo '<url>';
        echo '<loc>' . $loc . '</loc>';
        echo '<lastmod>' . date('Y-m-d', strtotime($page['updated_at'])) . '</lastmod>';
        echo '</url>';
    }
    echo '</urlset>';
Static File Approach
For smaller websites with infrequent content changes, a static XML file manually maintained or generated during the build process can be sufficient.
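For a static site, one possible build step is to walk the output directory and list every HTML file. The Python sketch below assumes a hypothetical build folder in which `index.html` files map to directory URLs; it uses file modification times for `lastmod`, which is only accurate if the build does not rewrite unchanged files.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote
from xml.sax.saxutils import escape

def sitemap_from_build_dir(build_dir, base_url):
    """Return sitemap XML text listing every .html file under build_dir."""
    root = Path(build_dir)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in sorted(root.rglob("*.html")):
        rel = path.relative_to(root).as_posix()
        if path.name == "index.html":
            rel = rel[: -len("index.html")]  # blog/index.html -> blog/
        # Percent-encode the path, then escape it for XML.
        loc = base_url.rstrip("/") + "/" + quote(rel)
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        lines.append(f"    <lastmod>{mtime:%Y-%m-%d}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"

# Demo against a throwaway directory standing in for a real build output.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "blog").mkdir()
    for name in ("index.html", "about.html", "blog/index.html"):
        (Path(tmp) / name).write_text("<html></html>", encoding="utf-8")
    sitemap_xml = sitemap_from_build_dir(tmp, "https://example.com")
print(sitemap_xml)
```

In a real pipeline the returned text would be written to `sitemap.xml` in the build output as the last step of the build.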
Submitting Your Sitemap
Submission Methods
- Google Search Console: submit directly through the Sitemaps section of your verified property
- Bing Webmaster Tools: submit through the Sitemaps section of your verified site
- robots.txt reference: add `Sitemap: https://example.com/sitemap.xml` to your robots.txt file
- Ping endpoints: notify search engines programmatically after content changes (check current support first; Google retired its sitemap ping endpoint in 2023)
robots.txt Integration
    User-agent: *
    Allow: /
    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://example.com/sitemap-blog.xml
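To check that the `Sitemap:` lines in a robots.txt file are picked up as intended, a small parser can extract them. This Python sketch works on the file's text; the function name `sitemap_urls_from_robots` is an assumption for this example.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs declared by Sitemap: lines in robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        # Split on the first colon only, so the URL's own colons are kept.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

robots_txt = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
"""
found = sitemap_urls_from_robots(robots_txt)
print(found)
```

Python 3.8 and later also expose the same information through `urllib.robotparser.RobotFileParser.site_maps()` when the file is fetched from a live site.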
Common Mistakes to Avoid
- Including non-canonical URLs: every URL in the sitemap should be the canonical version (no duplicates, no redirects)
- Listing blocked pages: do not include URLs blocked by robots.txt or marked noindex
- Inaccurate lastmod dates: update lastmod only when the page content actually changes, not every time the sitemap is regenerated
- Missing protocol: all URLs must be absolute, with an https:// or http:// prefix
- Exceeding limits: stay within 50,000 URLs and 50 MB (uncompressed) per file
- Including error pages: do not list 404, 500, or other error-returning URLs
- Wrong encoding: use UTF-8 and properly escape special XML characters (&, <, >, ', ")
- Ignoring HTTPS: use HTTPS URLs if your site is served over HTTPS
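Several of these mistakes can be caught automatically before a sitemap is published. The Python sketch below checks size and URL-count limits, absolute URLs, duplicates, and mixed protocols. The function name `check_sitemap` and the wording of its messages are assumptions; checks such as canonical status or HTTP response codes would need network requests and are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed

def check_sitemap(xml_bytes):
    """Return a list of human-readable problems found in a sitemap."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    root = ET.fromstring(xml_bytes)
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    seen = set()
    for loc in locs:
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute http(s) URL: {loc}")
        if loc in seen:
            problems.append(f"duplicate URL: {loc}")
        seen.add(loc)
    # A site served over HTTPS should not list plain-HTTP URLs as well.
    schemes = {urlparse(loc).scheme for loc in locs} & {"http", "https"}
    if len(schemes) == 2:
        problems.append("mixes http and https URLs")
    return problems

sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/</loc></url>
  <url><loc>/about</loc></url>
  <url><loc>http://example.com/old</loc></url>
</urlset>"""
problems = check_sitemap(sample)
for problem in problems:
    print(problem)
```

Running a check like this in a deployment pipeline turns the list above into something that fails loudly instead of relying on manual review.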
Specialized Sitemaps
Image Sitemaps
Image sitemaps help search engines discover images that might not be found through regular crawling, especially images loaded via JavaScript or CSS.
Video Sitemaps
Video sitemaps provide metadata about video content including title, description, duration, thumbnail URL, and expiration date, enabling rich video results in search.
News Sitemaps
News sitemaps are designed for Google News publishers and include articles published within the last 48 hours with metadata like publication name, language, and title.
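As a rough illustration of the 48-hour window, this Python sketch filters a list of articles by publication time and writes the matching entries with the `news:` extension elements. The article dictionaries, function names, and publication name are assumptions for this example, and the element names follow Google's news sitemap extension as commonly documented; verify them against the current documentation before relying on them.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SM)       # serialize sitemap tags without a prefix
ET.register_namespace("news", NEWS)  # serialize news tags as news:...

def recent_articles(articles, now, window=timedelta(hours=48)):
    """Keep articles published within `window` before `now`."""
    return [a for a in articles if timedelta(0) <= now - a["published"] <= window]

def build_news_sitemap(articles, publication, language):
    """Return news sitemap XML (bytes) for the given articles."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS}}}publication")
        ET.SubElement(pub, f"{{{NEWS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS}}}publication_date").text = article["published"].isoformat()
        ET.SubElement(news, f"{{{NEWS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": "https://example.com/news/fresh", "title": "Fresh story",
     "published": datetime(2025, 1, 14, 9, 0, tzinfo=timezone.utc)},
    {"loc": "https://example.com/news/old", "title": "Old story",
     "published": datetime(2025, 1, 10, 9, 0, tzinfo=timezone.utc)},
]
recent = recent_articles(articles, now)
news_xml = build_news_sitemap(recent, "Example Times", "en")
print(news_xml.decode("utf-8"))
```

Only the first article falls inside the window, so the older one is left out of the generated file.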
Monitoring and Maintenance
- Regularly validate your sitemap XML against the schema specification
- Monitor crawl statistics in Google Search Console for indexing issues
- Remove URLs that return 404 or redirect status codes
- Update sitemaps automatically when content changes
- Use gzip compression for large sitemaps to reduce bandwidth
- Review submitted vs. indexed URL counts to identify coverage problems
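The gzip step can be sketched in a few lines of Python. The size limit applies to the uncompressed file, so this example checks it before compressing; the function name `compress_sitemap` and the generated sample URLs are assumptions for illustration.

```python
import gzip

MAX_BYTES = 50 * 1024 * 1024  # limit applies to the uncompressed sitemap

def compress_sitemap(xml_bytes):
    """Return gzip-compressed sitemap bytes, refusing oversized input."""
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; split it first")
    return gzip.compress(xml_bytes)

# Build a repetitive sample sitemap to show the effect of compression.
entries = "".join(
    f"<url><loc>https://example.com/page-{i}</loc></url>" for i in range(1000)
)
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"{entries}</urlset>"
).encode("utf-8")

compressed = compress_sitemap(xml_bytes)
print(len(xml_bytes), "bytes uncompressed,", len(compressed), "bytes gzipped")
```

The compressed file is typically served as `sitemap.xml.gz` and referenced by that URL in the sitemap index or robots.txt.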
Conclusion
A well-maintained XML sitemap is a fundamental SEO tool that improves search engine crawling efficiency and helps ensure all your important content gets indexed. By following the correct structure, avoiding common mistakes, and keeping your sitemap updated, you provide search engines with the clearest possible roadmap to your content.