I have two sitemaps which partly duplicate - one is blocked by robots.txt but can't figure out why!
-
Hi, I've just found two sitemaps - one of them is .php and represents part of the site structure on the website. The second is a .txt file which lists every page on the website. The .txt file is blocked via robots exclusion protocol (which doesn't appear to be very logical as it's the only full sitemap). Any ideas why a developer might have done that?
-
There are standards for the sitemaps .txt and .xml sitemaps, where there are no standards for html varieties. Neither guarantees the listed pages will be crawled, though. HTML has some advantage of potentially passing pagerank, where .txt and .xml varieties don't.
These days, xml sitemaps may be more common than .txt sitemaps but both perform the same function.
-
yes, sitemap.txt is blocked for some strange reason. I know SEOs do this sometimes for various reasons, but in this case it just doesn't make sense - not to me, anyway.
-
Thanks for the useful feedback Chris - much appreciated - Is it good practice to use both - I guess it's a good idea if onsite version only includes top-level pages? PS. Just checking nature of block!
-
Luke,
The .php one would have been created as a navigation tool to help users find what they're looking for faster, as well as to provide html links to search engine spiders to help them reach all pages on the site. On small sites, such sitemaps often include all pages of the site, on large ones, it might just be high level pages. The .txt file is non html and exists to provide search engines with a full list of urls on the site for the sole purpose of helping search engines index all the site's pages.
The robots.txt file can also be used to specify the location of the sitemap.txt file such as
sitemap: http://www.example.com/sitemap_location.txt
Are you sure the sitemap is being blocked by the robots.txt file or is the robots.txt file just listing the location of the sitemap.txt?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why doesn't my website crawl by Google?
Hi mozzers and members, I am having issues, why my website: http://profilecosmeticsurgery.com/ crawl by Google? let me share more clearly when this starts happening. A month or around 45 days back our website is being indexed and crawled quite well without any issues with having .html extension pages with static built website.
Intermediate & Advanced SEO | | SEOOOOOoooooooo
We finally thought to change to .php version and make whole website and its pages to be treated dynamically.
Once we changed all changes, thereafter this issues started. It has been more than 45 days, our website isn't being crawled since then. I didn't know what are the things preventing this to? Please help. Thanks in Advance Capture1.PNG0 -
Http resolving to https - why isn't it doing that?
Hi everyone I've just been looking at a few https websites and noticed the http urls weren't redirecting to their https equivalents - why would a website owner not bother redirecting? As an example: http://www.marksandspencer.com I look forward to your feedback. L
Intermediate & Advanced SEO | | McTaggart0 -
Can some one help me find this Matt Cutts article on disavows?
Hey everyone. A while ago, I remember reading that Matt Cutts said that you can just disavow domains, and that the Google Webmaster Tools team doesn't read for comments (like if webmasters had been reached out to). Is this ringing any bells? I'm trying to find this tidbit again. Thanks!
Intermediate & Advanced SEO | | Charles_Murdock
Charles0 -
Use Canonical or Robots.txt for Map View URL without Backlink Potential
I have a Page X with lots of unique content. This page has a "Map view" option, which displays some of the info from Page X, but a lot is ommitted. Questions: Should I add canonical even though Map View URL does not display a lot of info from Page X or adding to robots.txt or noindex, follow? I don't see any back links coming to Map View URL Should Map View page have unique H1, title tag, meta des?
Intermediate & Advanced SEO | | khi50 -
Massive URL blockage by robots.txt
Hello people, In May there has been a dramatic increase in blocked URLs by robots.txt, even though we don't have so many URLs or crawl errors. You can view the attachment to see how it went up. The thing is the company hasn't touched the text file since 2012. What might be causing the problem? Can this result any penalties? Can indexation be lowered because of this? ?di=1113766463681
Intermediate & Advanced SEO | | moneywise_test0 -
Blocking poor quality content areas with robots.txt
I found an interesting discussion on seoroundtable where Barry Schwartz and others were discussing using robots.txt to block low quality content areas affected by Panda. http://www.seroundtable.com/google-farmer-advice-13090.html The article is a bit dated. I was wondering what current opinions are on this. We have some dynamically generated content pages which we tried to improve after panda. Resources have been limited and alas, they are still there. Until we can officially remove them I thought it may be a good idea to just block the entire directory. I would also remove them from my sitemaps and resubmit. There are links coming in but I could redirect the important ones (was going to do that anyway). Thoughts?
Intermediate & Advanced SEO | | Eric_edvisors0 -
Negative impact on crawling after upload robots.txt file on HTTPS pages
I experienced negative impact on crawling after upload robots.txt file on HTTPS pages. You can find out both URLs as follow. Robots.txt File for HTTP: http://www.vistastores.com/robots.txt Robots.txt File for HTTPS: https://www.vistastores.com/robots.txt I have disallowed all crawlers for HTTPS pages with following syntax. User-agent: *
Intermediate & Advanced SEO | | CommercePundit
Disallow: / Does it matter for that? If I have done any thing wrong so give me more idea to fix this issue.0 -
Google sees redirect when there isn't any?
I've posted a question previously regarding the very strange changes in our search positions here http://www.seomoz.org/q/different-pages-ranking-for-search-terms-often-irrelevant New strange thing I've noticed - and very disturbing thing - seems like Google has somehow glued two pages together. Or, in other words, looks like Google sees a 301 redirect from one page to another. This, actually, happened to several pages, I'll illustrate it with our Flash templates page. URL: http://www.templatemonster.com/flash-templates.php
Intermediate & Advanced SEO | | templatemonster
Has been #3 for 'Flash templates' in Google. Reasons why it looks like redirect:
Reason #1
Now this http://www.templatemonster.com/logo-templates.php page is ranking instead of http://www.templatemonster.com/flash-templates.php
Also, http://www.templatemonster.com/flash-templates.php is not in the index.
That what would typically happen if you had 301 from Flash templates to logo templates page. Reason #2
If you search for cache:http://www.templatemonster.com/flash-templates.php Google will give the cahced version of http://www.templatemonster.com/logo-templates.php!!!
If you search for info:www.templatemonster.com/flash-templates.php you again get info on http://www.templatemonster.com/logo-templates.php instead! Reason #3
In Google Webmaster Tools when I look for the external links to http://www.templatemonster.com/logo-templates.php I see all the links from different sites, which actually point to http://www.templatemonster.com/flash-templates.php listed as "Via this intermediate link: http://www.templatemonster.com/flash-templates.php" As I understand Google makes this "via intermediate link" when there's a redirect? That way, currently Google thinks that all the external links we have for Flash templates are actually pointing to Logo templates? The point is we NEVER had any kind of redirect from http://www.templatemonster.com/flash-templates.php to http://www.templatemonster.com/logo-templates.php I've seen several similar situations on Google Help forums but they were never resolved. So, I wonder if anybody can explain how that could have happened, and what can be done to solve that problem?0