Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Will an XML sitemap override a robots.txt
-
I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed.
I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why.
Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?
-
The robots file will avoid google to show further information on the disallowed pages but it doesn't prevent indexation.
They're still indexed (that's why you're seeing them) but with no meta desc nor text taken from the page because google wasn't allowed to retrieve more information.
If you want them to start showing info, you'll jsut need to remove that rule from the robots.txt and soon you'll start seeing those pages information showing, but if you want them out of the index you can use GWT to remove them from the index after you've included in each page the noindex meta tag which is the only command which will prevent indexation.
-
I assumed the same thing, but I performed a site command search while they were prospects, and they had 1 result present with the explanation of "A description for this result is not available because of this site's robots.txt – learn more"
They uploaded an xml sitemap before I could tell them to remove the robots.txt. and 1 week later, the entire site is now in the index.
I have used the robots.txt to properly block websites, it usually takes 2-3 for all results to drop out the index, so I don't know how that could explain it either.
-
I agree, the only way I could think this would work would be if the robotx.txt file was on the root domain. I agree, check Webmaster tools, they will tell you under the sitemaps section about "Error: URL was blocked by robots.txt).
One thing to remember is that robots.txt is technically a suggestion to ask search engines not to crawl your site. They can choose to ignore it, though personally I don't know of any cases in which this happenned.
-
An XML sitemap shouldn't override robots.txt. If you have Google Webmaster Tools setup, you will see warnings on the sitemaps page that pages being blocked by robots are being submitted.
Now, robots.txt does not prevent indexation, just crawling. So if the pages were indexed before they implemented robots.txt, they may continue to be indexed. Google will also display just the URL for pages that it's discovered, but can't crawl because of robots.txt.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Googlebot does not obey robots.txt disallow
Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then robots.txt disallow all URLs containing that parameter. We implemented this late august and since that, the GWMT message "Googlebot found an extremely high number of URLs on your site", stopped coming. But today we received yet another. The weird thing is that Google gives many of our nowadays robots.txt disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin
Technical SEO | | TalkInThePark0 -
Hosting sitemap on another server
I was looking into XML sitemap generators and one that seems to be recommended quite a bit on the forums is the xml-sitemaps.com They have a few versions though. I'll need more than 500 pages indexed, so it is just a case of whether I go for their paid for version and install on our server or go for their pro-sitemaps.com offering. For the pro-sitemaps.com they say: "We host your sitemap files on our server and ping search engines automatically" My question is will this be less effective than my installing it on our server from an SEO perspective because it is no longer on our root domain?
Technical SEO | | design_man0 -
Should I ping Each New Article And How Will It Help Me
Hi I am looking for advice to get my new articles out there and my news stories. I would like to know if i should ping my articles and news stories each time i write a new one and if so which ping services do you recommend and what do they really do. Can anyone recommend any other services that i could use to promote my new articles and news stories to gain more traffic. Many thanks
Technical SEO | | ClaireH-1848861 -
How to generate a visual sitemap using sitemap.xml
Are there any tools (online preferably) which will take a sitemap.xml file and generate a visual site map? Seems like an obvious thing to do, but can't find any simple tools for this?
Technical SEO | | k3nn3dy30 -
XML Sitemap without PHP
Is it possible to generate an XML sitemap for a site without PHP? If so, how?
Technical SEO | | jeffreytrull11 -
Robots.txt file getting a 500 error - is this a problem?
Hello all! While doing some routine health checks on a few of our client sites, I spotted that a new client of ours - who's website was not designed built by us - is returning a 500 internal server error when I try to look at the robots.txt file. As we don't host / maintain their site, I would have to go through their head office to get this changed, which isn't a problem but I just wanted to check whether this error will actually be having a negative effect on their site / whether there's a benefit to getting this changed? Thanks in advance!
Technical SEO | | themegroup0 -
Robots.txt and canonical tag
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said - If you have a robots.txt disallow in place for a page, the canonical tag will never be seen. Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
Technical SEO | | seoug_20050 -
Best Dynamic Sitemap Generator
Hello Mozers, Could you please share the best Dynamic Sitemap Generator you are using. I have found this place: http://www.seotools.kreationstudio.com/xml-sitemap-generator/free_dynamic_xml_sitemap_generator.php Thanks in advanced for your help.
Technical SEO | | SEOPractices0