I have two sitemaps which partly duplicate - one is blocked by robots.txt but can't figure out why!
-
Hi, I've just found two sitemaps - one of them is .php and represents part of the site structure on the website. The second is a .txt file which lists every page on the website. The .txt file is blocked via robots exclusion protocol (which doesn't appear to be very logical as it's the only full sitemap). Any ideas why a developer might have done that?
-
There are standards for the sitemaps .txt and .xml sitemaps, where there are no standards for html varieties. Neither guarantees the listed pages will be crawled, though. HTML has some advantage of potentially passing pagerank, where .txt and .xml varieties don't.
These days, xml sitemaps may be more common than .txt sitemaps but both perform the same function.
-
yes, sitemap.txt is blocked for some strange reason. I know SEOs do this sometimes for various reasons, but in this case it just doesn't make sense - not to me, anyway.
-
Thanks for the useful feedback Chris - much appreciated - Is it good practice to use both - I guess it's a good idea if onsite version only includes top-level pages? PS. Just checking nature of block!
-
Luke,
The .php one would have been created as a navigation tool to help users find what they're looking for faster, as well as to provide html links to search engine spiders to help them reach all pages on the site. On small sites, such sitemaps often include all pages of the site, on large ones, it might just be high level pages. The .txt file is non html and exists to provide search engines with a full list of urls on the site for the sole purpose of helping search engines index all the site's pages.
The robots.txt file can also be used to specify the location of the sitemap.txt file such as
sitemap: http://www.example.com/sitemap_location.txt
Are you sure the sitemap is being blocked by the robots.txt file or is the robots.txt file just listing the location of the sitemap.txt?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt was set to disallow for 14 days
We updated our website and accidentally overwrote our robots file with a version that prevented crawling ( "Disallow: /") We realized the issue 14 days later and replaced after our organic visits began to drop significantly and we quickly replace the robots file with the correct version to begin crawling again. With the impact to our organic visits, we have a few and any help would be greatly appreciated - Will the site get back to its original status/ranking ? If so .. how long would that take? Is there anything we can do to speed up the process ? Thanks
Intermediate & Advanced SEO | | jc42540 -
301ing one site's links to another
Hi, I have one site with a well-established link profile, but no actual reason to exist (site A). I have another site that could use a better link profile (site B). In your experience, would 301 forwarding all of site A's pages to site B do anything positive for the link profile/organic search of the site B? Site A is about boating at a specific lake. Site B is about travel destinations across the U.S. Thanks! Best... Michael
Intermediate & Advanced SEO | | 945010 -
Two A Records or one CNAME and one A Record for www / naked domain?
What's the best way to point a domain to an IP address? Two A records one with host @ and the other with host www ? One A record pointing to the naked domain
Intermediate & Advanced SEO | | aj613
One CNAME record with host www pointing to nakeddomain.com Something else? Thanks so much! This seems so basic, but always seems to stump me. I'm using WordPress, if it makes difference.0 -
Not sure how we're blocking homepage in robots.txt; meta description not shown
Hi folks! We had a question come in from a client who needs assistance with their robots.txt file. Metadata for their homepage and select other pages isn't appearing in SERPs. Instead they get the usual message "A description for this result is not available because of this site's robots.txt – learn more". At first glance, we're not seeing the homepage or these other pages as being blocked by their robots.txt file: http://www.t2tea.com/robots.txt. Does anyone see what we can't? Any thoughts are massively appreciated! P.S. They used wildcards to ensure the rules were applied for all locale subdirectories, e.g. /en/au/, /en/us/, etc.
Intermediate & Advanced SEO | | SearchDeploy0 -
How to make Google index your site? (Blocked with robots.txt for a long time)
The problem is the for the long time we had a website m.imones.lt but it was blocked with robots.txt.
Intermediate & Advanced SEO | | FCRMediaLietuva
But after a long time we want Google to index it. We unblocked it 1 week or 8 days ago. But Google still does not recognize it. I type site:m.imones.lt and it says it is still blocked with robots.txt What should be the process to make Google crawl this mobile version faster? Thanks!0 -
Link earning for local businesses who can't afford content marketing
What are some of the best ways to earn and build quality relevant links that will increase exposure to your target market in addition to assisting search rankings? I personally find that local niche directories and PR are the best ways to accomplish this without having content to "earn links"..what else works? Any interesting ideas??
Intermediate & Advanced SEO | | RickyShockley0 -
Two sites, two domains, two brands, 98% same content
There are two affiliated brick & mortar retail stores moving into e-commerce. For non-marketing reasons separate e-commerce websites are desired. The two brands are based in separate (nearby) cities in the same Canadian province. Although the store name and branding will be different, the content on the site will either be near duplicates or exact duplicates. The more I look into this on Google and SEOmoz QA, the more I am concerned about the SEO implications of this. SEOmoz QA: Multiple cities/regions websites - duplicate content? "So, yes, because you are offering the same services at second location, you are thinking correctly about the need to rewrite all content so it's not a duplicate of site #1." Duplicate content - Webmaster Tools Help "However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic… In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results. ... Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results." Unfortunately, I would say there's very little chance that rewritten content will happen in the foreseeable future. With that said, I'd be greatly appreciative of the concerns and remedies that the SEOmoz community has to offer (even if they're for future use). Thanks in advance.
Intermediate & Advanced SEO | | GOODSIR0 -
My site has multiple H1's, one in the logo image and one as a header. Is there any official stance from the search engines on this?
In doing some research on this issue, I came across this blog post which seems to suggest it certainly will be a trigger to search engines. http://www.seounique.com/blog/multiple-h1-tags-triggers-google-penalty/ Could be a false positive on his specific case, but I was wondering what the community thought. Thanks in advance!
Intermediate & Advanced SEO | | jim_shook0