I have two sitemaps that partly duplicate each other - one is blocked by robots.txt, but I can't figure out why!

McTaggart

Hi, I've just found two sitemaps. One is a .php page that represents part of the site structure. The second is a .txt file that lists every page on the website. The .txt file is blocked via the robots exclusion protocol, which doesn't seem very logical as it's the only full sitemap. Any ideas why a developer might have done that?

Chris.Menke

There are standards for .txt and .xml sitemaps, whereas there are no standards for HTML varieties. Neither guarantees that the listed pages will be crawled, though. HTML sitemaps have the potential advantage of passing PageRank, which .txt and .xml varieties don't.

These days, XML sitemaps may be more common than .txt sitemaps, but both perform the same function.
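
To illustrate how the two standard formats carry the same information, here is a minimal sketch in Python. The function names and the example.com URLs are made up for illustration; it assumes the plain-text format is simply one URL per line and the XML format is a urlset of url/loc elements in the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def text_sitemap(urls):
    """Build a plain-text sitemap: one absolute URL per line."""
    return "\n".join(urls) + "\n"


def xml_sitemap(urls):
    """Build an XML sitemap: a <urlset> holding one <url><loc> per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
pages = ["http://www.example.com/", "http://www.example.com/about"]
print(text_sitemap(pages))
print(xml_sitemap(pages))
```

Both outputs list the same URLs; the XML version just leaves room for optional per-URL metadata.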

McTaggart

Yes, sitemap.txt is blocked for some strange reason. I know SEOs sometimes do this for various reasons, but in this case it just doesn't make sense - not to me, anyway.

McTaggart

Thanks for the useful feedback, Chris - much appreciated. Is it good practice to use both? I guess it's a good idea if the on-site version only includes top-level pages. PS. Just checking the nature of the block!

Chris.Menke

Luke,

The .php one would have been created as a navigation tool to help users find what they're looking for faster, as well as to provide HTML links that help search engine spiders reach all pages on the site. On small sites, such sitemaps often include every page; on large ones, they might list just the high-level pages. The .txt file is not HTML and exists solely to give search engines a full list of URLs on the site, to help them index all of its pages.

The robots.txt file can also be used to specify the location of the sitemap.txt file, for example:

sitemap: http://www.example.com/sitemap_location.txt

Are you sure the sitemap is being blocked by the robots.txt file, or is the robots.txt file just listing the location of the sitemap.txt?
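
One way to tell the two cases apart is to parse the robots.txt and ask both questions separately. The following is a rough sketch using Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later). The check_sitemap helper, the sample robots.txt contents, and the example.com URLs are assumptions for illustration, not taken from the site in question.

```python
from urllib.robotparser import RobotFileParser


def check_sitemap(robots_txt, sitemap_url, user_agent="*"):
    """Return (listed, blocked) for sitemap_url given robots.txt contents.

    listed  -> a "Sitemap:" line in robots.txt points at this URL
    blocked -> a "Disallow:" rule stops user_agent from fetching it
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    listed = sitemap_url in (parser.site_maps() or [])
    blocked = not parser.can_fetch(user_agent, sitemap_url)
    return listed, blocked


# Hypothetical robots.txt that only advertises the sitemap.
listing_only = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.txt
"""

# Hypothetical robots.txt that actually blocks the sitemap.
blocking = """\
User-agent: *
Disallow: /sitemap.txt
"""

url = "http://www.example.com/sitemap.txt"
print(check_sitemap(listing_only, url))
print(check_sitemap(blocking, url))
```

If the first value is true and the second is false, the file is only being listed, not blocked. Note that this parser follows the basic robots exclusion rules and may not match every search engine's handling of wildcards.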
