I have two sitemaps which partly duplicate each other - one is blocked by robots.txt, but I can't figure out why!
-
Hi, I've just found two sitemaps - one of them is .php and represents part of the site structure. The second is a .txt file which lists every page on the website. The .txt file is blocked via the Robots Exclusion Protocol (robots.txt), which doesn't appear to be very logical, as it's the only full sitemap. Any ideas why a developer might have done that?
-
There are standards for .txt and .xml sitemaps, whereas there are no standards for HTML varieties. Neither guarantees the listed pages will be crawled, though. An HTML sitemap has the advantage of potentially passing PageRank, which the .txt and .xml varieties don't.
These days, XML sitemaps are more common than .txt sitemaps, but both perform the same function.
-
Yes, sitemap.txt is blocked for some strange reason. I know SEOs do this sometimes for various reasons, but in this case it just doesn't make sense - not to me, anyway.
-
Thanks for the useful feedback, Chris - much appreciated. Is it good practice to use both? I guess it's a good idea if the onsite version only includes top-level pages. PS: Just checking the nature of the block!
-
Luke,
The .php one would have been created as a navigation tool to help users find what they're looking for faster, as well as to provide HTML links to search engine spiders to help them reach all pages on the site. On small sites, such sitemaps often include all pages of the site; on large ones, it might just be the high-level pages. The .txt file is non-HTML and exists to provide search engines with a full list of the URLs on the site, for the sole purpose of helping search engines index all the site's pages.
The robots.txt file can also be used to specify the location of the sitemap.txt file, for example:
Sitemap: http://www.example.com/sitemap_location.txt
Are you sure the sitemap is being blocked by the robots.txt file or is the robots.txt file just listing the location of the sitemap.txt?
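One quick way to answer that question is a sanity check with Python's standard-library urllib.robotparser, which tells you whether a given URL is actually disallowed and which Sitemap entries the file declares. A minimal sketch - the robots.txt lines below are hypothetical, so substitute the contents of the live file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents - substitute the lines from the live
# file. A Disallow rule blocks crawling of the path; a Sitemap line only
# announces where a sitemap lives, it does not block anything.
robots_lines = [
    "User-agent: *",
    "Disallow: /sitemap.txt",
    "",
    "Sitemap: http://www.example.com/sitemap.txt",
]

rp = RobotFileParser(url="http://www.example.com/robots.txt")
rp.parse(robots_lines)

# Is the sitemap URL actually blocked for a generic crawler?
blocked = not rp.can_fetch("*", "http://www.example.com/sitemap.txt")
print(blocked)         # True - this example file really does disallow it
print(rp.site_maps())  # ['http://www.example.com/sitemap.txt']  (Python 3.8+)
```

If the live file only contains the Sitemap: line with no matching Disallow rule, can_fetch will return True, and the sitemap isn't blocked at all - it's just being advertised.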
Related Questions
-
Creating two websites from one and building up traffic to the new domain quickly
A client has an existing successful website that sells niche products - they are well known in their marketplace. They have two sets of key customers, let's call them (a) and (b), that need addressing in different ways to maximise sales. (a) is the more specialist end of the market, where people have complex needs - there are fewer of them, but repeat business is likely, and we can talk to them in more technical language. (b) is the layman's end of the market - there is a vast pool of potential customers, but they'll be more casual buyers and need to be addressed more in layman's terms.

So what they want to do is take their existing website and essentially split it into two different websites, one for each market. The one that will use the existing domain, with all the links that have built up over the years pointing to it, will be the site for the more specialist end of the market (a). The domain name suits it better, which is why he wants to use the existing domain with that site and not the other. (b) will be a brand new domain. The client will write new product descriptions across the board so that the two sets of product information are not duplicated.

I'd rather he didn't do this at all, because of the risk involved and the difficulty of building up traffic to the new site, which is, after all, the one with the best chance of mass-market sales. But given that the client has decided that this is definitely what he wants, does anyone have any thoughts on what the action plan should be?
Intermediate & Advanced SEO | helga730 -
Can we talk a bit more about cannibalisation? Will Google pick one page and disregard others?
Hi all. I work for an e-commerce site called TOAD Diaries and we've been building some landing pages recently. Our most generic page was for '2017 Diaries'. Take a look here. Initial results are encouraging, as this page is ranking on the first page for a lot of 'long tail' search queries, e.g. '2017 diaries a4', '2017 diaries a5', '2017 diaries week to view' etc. Interestingly, it doesn't even rank in the top 50 for the 'head term'... '2017 diaries'. And our home page outranks it for this search term. Yet it seems clear that this page is considered relevant and quality by Google, since it ranks just fine for the long tails. Question: does this mean Google has 'chosen' our home page over the 2017 landing page? And that's why the 2017 page effectively doesn't rank for its 'head term'? (I can't see this, as many times a website will rank multiple times, such as Amazon.) But any thoughts would be greatly appreciated. Also, what would you do in this scenario? Work on the home page to try to push it up for that term and not worry about the landing page? Hope that makes sense. Do shout if not. Thanks in advance. Isaac.
Intermediate & Advanced SEO | isaac6630 -
Consolidating two different domains to point at same site, duplicate content penalty?
I have two websites that are extremely similar and want to consolidate them into one website by pointing both domain names at one site. Is this going to cause any duplicate content penalties by having two different domain names pointing at the same site? Both domains get traffic, so I don't want to just discontinue one of them.
Intermediate & Advanced SEO | Ron100 -
Robots.txt help
Hi, we have a blog that is killing our SEO. We need to disallow the following:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But allow everything below /Blog/Post. The disallow list seems to keep growing as we find issues, so rather than adding every problem area to our robots.txt, is there a way to just Allow /Blog/Post and ignore the rest? How do we do that in robots.txt? Thanks
Intermediate & Advanced SEO | Studio33 -
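On the "allow one subtree, block the rest" idea: a single Allow for the post subtree plus a blanket Disallow for /Blog/ can cover it. Note that Allow and the * wildcard are extensions to the original robots.txt standard, though the major engines honour them (Google resolves conflicts by longest-match; a trailing * is redundant, since rules are prefix matches anyway). A minimal sketch, checked with Python's standard-library parser - example.com and the rules stand in for the real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow the /Blog/post/ subtree, block the rest
# of /Blog/. The trailing slash on the Allow matters - without it,
# /Blog/post.aspx would also match the Allow prefix. Google uses
# longest-match precedence; Python's parser checks rules in order, so
# listing Allow first works under both interpretations.
robots_lines = [
    "User-agent: *",
    "Allow: /Blog/post/",
    "Disallow: /Blog/",
]

rp = RobotFileParser(url="http://www.example.com/robots.txt")
rp.parse(robots_lines)

print(rp.can_fetch("*", "http://www.example.com/Blog/post/my-article"))  # True
print(rp.can_fetch("*", "http://www.example.com/Blog/post.aspx"))        # False
print(rp.can_fetch("*", "http://www.example.com/Blog/?tag=seo"))         # False
```

Worth testing the final rules against a sample of real URLs before deploying, since one short Disallow prefix can block more than intended.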
What is the best way to rank well in two countries simultaneously with only one ccTLD?
I have a .co.nz website and would like to rank on .com.au without setting up a new country-specific website for .com.au. What is the best way to do this?
Intermediate & Advanced SEO | SteveK640 -
Site has no SEO done on it. It wasn't considered during design. What to do first?
They opted for videos to explain to people what the website is about, but it isn't working for them. What steps would you take to get this site to rank higher without completely changing the design (changing the design is out of the question - they are low on funds)? They also built a blog on wordpress.com and added a .me domain to it. For obvious reasons I'm not mentioning the website.
Intermediate & Advanced SEO | ternit0 -
Will disallowing in robots.txt noindex a page?
Google has indexed a page I wish to remove. I would like to add a meta noindex, but the CMS isn't allowing me to right now. A suggestion to disallow the page in robots.txt would simply stop them crawling it, I expect - or is it also an instruction to noindex? Thanks
Intermediate & Advanced SEO | Brocberry0
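On that last distinction: a robots.txt Disallow only stops crawling - it is not a noindex, and a blocked page can stay in the index via external links, because the crawler never fetches it to see any noindex signal. When a CMS won't allow a robots meta tag, the X-Robots-Tag HTTP response header carries the same directive. A minimal sketch of the header a server would send - the helper name is hypothetical:

```python
# Sketch of the crawling-vs-indexing distinction: robots.txt controls
# crawling, while indexing is controlled per page via a robots meta tag
# or the X-Robots-Tag HTTP header. The helper below is a hypothetical
# stand-in for wherever your stack sets response headers.

def response_headers():
    """Headers for a page that may be crawled but should not be indexed."""
    return {
        "Content-Type": "text/html",
        # Equivalent to <meta name="robots" content="noindex">:
        "X-Robots-Tag": "noindex",
    }

# Note: crawlers only see this header if the URL is NOT disallowed in
# robots.txt - a blocked URL is never fetched, so the noindex is never read.
print(response_headers()["X-Robots-Tag"])  # noindex
```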