How to audit a sitemap.xml file for SEO errors
Updated 2026-06-21
To audit a sitemap.xml, check it against the sitemaps.org spec: confirm it stays within 50,000 URLs and 50 MB uncompressed, has no duplicate or non-HTTPS locations, and uses valid lastmod, changefreq and priority values. The fastest way is to paste the raw XML into a validator that runs every check at once.
What a sitemap audit actually checks
A search engine will silently skip entries it can't parse, so a "valid" file can still under-deliver. A thorough audit looks for:
- Duplicate locations. The same URL listed twice wastes crawl budget and signals a generation bug.
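- Non-HTTPS URLs. Mixing in plain HTTP links is a quality and security red flag, and the sitemaps.org protocol expects every URL in a sitemap to use the same protocol and host as the sitemap itself.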
- The hard limits. A single sitemap may not exceed 50,000 URLs or 50 MB uncompressed. Past either, you must split it and reference the parts from a sitemap index.
- Invalid optional tags. The changefreq value must be one of always, hourly, daily, weekly, monthly, yearly or never. The priority must sit between 0.0 and 1.0. The lastmod must be a real date in W3C Datetime format (a profile of ISO 8601), such as 2026-06-15 or 2026-06-15T09:30:00+00:00; "last tuesday" or 2026-13-01 fails.
- Consistency. Mixed trailing slashes (some paths ending in a slash, some not) and overly long URLs (the sitemaps.org protocol requires each loc value to be shorter than 2,048 characters) point to canonicalization problems.
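If you would rather script these checks, here is a minimal Python sketch of the checklist above. It is an illustration, not the Auditor's own code: the names audit_sitemap and valid_lastmod are hypothetical, the lastmod pattern is a simplification that accepts only a full date with an optional time and zone, and the trailing-slash check ignores the bare site root.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import date
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file
MAX_LOC_CHARS = 2_048
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
# Simplified (assumption): a full date, optionally followed by a time and zone.
LASTMOD_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def valid_lastmod(value: str) -> bool:
    """Return True if value looks like a real calendar date (plus optional time)."""
    match = LASTMOD_RE.fullmatch(value)
    if match is None:
        return False
    try:
        date.fromisoformat(match.group(1))  # rejects month 13, day 32, and so on
    except ValueError:
        return False
    return True


def audit_sitemap(xml_text: str) -> list[str]:
    """Run the checklist against a urlset document and return readable findings."""
    findings: list[str] = []
    raw = xml_text.encode("utf-8")
    if len(raw) > MAX_BYTES:
        findings.append("file is larger than 50 MB uncompressed")

    root = ET.fromstring(raw)
    entries = root.findall("sm:url", NS)
    if len(entries) > MAX_URLS:
        findings.append(f"{len(entries)} URLs exceeds the 50,000 limit")

    locs = [(e.findtext("sm:loc", default="", namespaces=NS) or "").strip()
            for e in entries]
    for loc, count in Counter(locs).items():
        if count > 1:
            findings.append(f"duplicate location ({count}x): {loc}")

    slash_styles = set()  # collects True/False for "path ends with a slash"
    for entry, loc in zip(entries, locs):
        parts = urlsplit(loc)
        if parts.scheme != "https":
            findings.append(f"non-HTTPS location: {loc}")
        if len(loc) >= MAX_LOC_CHARS:
            findings.append(f"location too long ({len(loc)} chars): {loc[:60]}")
        if parts.path not in ("", "/"):
            slash_styles.add(parts.path.endswith("/"))

        changefreq = entry.findtext("sm:changefreq", namespaces=NS)
        if changefreq is not None and changefreq.strip() not in CHANGEFREQ:
            findings.append(f"invalid changefreq {changefreq!r}: {loc}")

        priority = entry.findtext("sm:priority", namespaces=NS)
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                findings.append(f"invalid priority {priority!r}: {loc}")

        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not valid_lastmod(lastmod.strip()):
            findings.append(f"invalid lastmod {lastmod!r}: {loc}")

    if len(slash_styles) == 2:
        findings.append("mixed trailing slashes across locations")
    return findings
```

Calling audit_sitemap(open("sitemap.xml", encoding="utf-8").read()) returns an empty list for a file that passes every check. The sketch handles urlset files only; a sitemapindex would need its child sitemaps fetched and audited one by one.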
Audit yours in three steps
- Open your sitemap in a browser (typically yoursite.com/sitemap.xml) and copy the full XML, or open the file from your build output.
- Paste it into the Sitemap.xml Auditor. It parses urlset files and sitemapindex files alike, entirely in your browser — nothing is uploaded.
- Read the results: a URL tree that collapses thousands of links into a browsable host-and-path hierarchy, plus a health-check panel that groups findings into errors, warnings and notices.
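To make the host-and-path hierarchy concrete, here is one way such a tree could be built, sketched in Python. The Auditor's actual implementation is not shown in this article, so treat build_tree and render as hypothetical names and the nested-dictionary layout as an assumption.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Nest URLs as host -> path segment -> path segment -> ..."""
    tree = {}
    for url in urls:
        parts = urlsplit(url)
        if not parts.scheme or not parts.netloc:
            continue  # non-absolute URL: it has no host to hang it under
        node = tree.setdefault(parts.netloc, {})
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, sorted by name at every level."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines
```

Printing "\n".join(render(build_tree(urls))) gives a quick text version of the same view: every host appears once at the top level, with its sections indented beneath it.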
Reading the results
Errors are spec violations a search engine may reject outright — duplicates, out-of-range priorities, invalid changefreq values, or busting the 50,000-URL ceiling. Warnings are quality issues worth fixing: non-HTTPS links, malformed (non-absolute) URLs that can't be placed in the tree, bad lastmod dates and inconsistent trailing slashes. Notices, like URLs missing a lastmod, are informational — lastmod is optional, but adding it helps engines recrawl changed pages sooner.
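The three-level grouping described above can be modeled as a simple lookup table. The sketch below is illustrative only: the check identifiers and the group_findings helper are hypothetical, and the severity assigned to each check mirrors the description in this section rather than any published rule set.

```python
# Hypothetical check identifiers mapped to the severities described above.
SEVERITY = {
    "duplicate_loc": "error",
    "priority_out_of_range": "error",
    "invalid_changefreq": "error",
    "too_many_urls": "error",
    "non_https": "warning",
    "malformed_url": "warning",
    "bad_lastmod": "warning",
    "mixed_trailing_slash": "warning",
    "missing_lastmod": "notice",
}


def group_findings(findings):
    """Group (check_id, message) pairs into errors, warnings and notices."""
    grouped = {"error": [], "warning": [], "notice": []}
    for check_id, message in findings:
        grouped[SEVERITY[check_id]].append(message)
    return grouped
```

Keeping the severities in one table makes it easy to see which findings block a submission (the error list) and which can wait.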
The tree view is the quickest way to spot a structural mistake: a section that should have dozens of pages showing only one, or a stray host you didn't expect, jumps out immediately.
Fix, then export a clean list
Once you've corrected the issues at the source (your CMS or static-site generator), re-paste the regenerated XML to confirm every check passes. You can also export a sorted, de-duplicated list of every URL as a plain-text file — handy for diffing against your crawl, feeding a link checker, or sanity-checking what's actually indexed.
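If you want to reproduce the export and the crawl comparison in a script, a minimal Python sketch might look like this. The function names export_url_list and diff_against_crawl are hypothetical, and the comparison is an exact string match, so URLs that differ only by a trailing slash or by letter case count as different.

```python
def export_url_list(urls):
    """Return a sorted, de-duplicated list of URLs as plain text, one per line."""
    unique = sorted({u.strip() for u in urls if u.strip()})
    return "".join(f"{u}\n" for u in unique)


def diff_against_crawl(sitemap_urls, crawled_urls):
    """Return (in sitemap but not crawled, crawled but not in sitemap)."""
    sitemap_set = {u.strip() for u in sitemap_urls if u.strip()}
    crawl_set = {u.strip() for u in crawled_urls if u.strip()}
    return sorted(sitemap_set - crawl_set), sorted(crawl_set - sitemap_set)
```

The first list returned by diff_against_crawl points to pages your crawler never reached; the second points to pages that exist on the site but are missing from the sitemap.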
Because the whole audit runs locally, you can safely check a staging or internal sitemap without exposing private URLs to a third-party server. Paste your XML into the Sitemap.xml Auditor and clear the errors before you submit it to Search Console.