Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Will an XML sitemap override a robots.txt
-
I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed.
I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why.
Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?
-
The robots file will avoid google to show further information on the disallowed pages but it doesn't prevent indexation.
They're still indexed (that's why you're seeing them) but with no meta desc nor text taken from the page because google wasn't allowed to retrieve more information.
If you want them to start showing info, you'll jsut need to remove that rule from the robots.txt and soon you'll start seeing those pages information showing, but if you want them out of the index you can use GWT to remove them from the index after you've included in each page the noindex meta tag which is the only command which will prevent indexation.
-
I assumed the same thing, but I performed a site command search while they were prospects, and they had 1 result present with the explanation of "A description for this result is not available because of this site's robots.txt – learn more"
They uploaded an xml sitemap before I could tell them to remove the robots.txt. and 1 week later, the entire site is now in the index.
I have used the robots.txt to properly block websites, it usually takes 2-3 for all results to drop out the index, so I don't know how that could explain it either.
-
I agree, the only way I could think this would work would be if the robotx.txt file was on the root domain. I agree, check Webmaster tools, they will tell you under the sitemaps section about "Error: URL was blocked by robots.txt).
One thing to remember is that robots.txt is technically a suggestion to ask search engines not to crawl your site. They can choose to ignore it, though personally I don't know of any cases in which this happenned.
-
An XML sitemap shouldn't override robots.txt. If you have Google Webmaster Tools setup, you will see warnings on the sitemaps page that pages being blocked by robots are being submitted.
Now, robots.txt does not prevent indexation, just crawling. So if the pages were indexed before they implemented robots.txt, they may continue to be indexed. Google will also display just the URL for pages that it's discovered, but can't crawl because of robots.txt.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How does changing sitemaps affect SEO
Hi all, I have a question regarding changing the size of my sitemaps. Currently I generate sitemaps in batches of 50k. A situation has come up where I need to change that size to 15k in order to be crawled by one of our licensed services. I haven't been able to find any documentation on whether or not changing the size of my sitemaps(but not the pages included in them) will affect my rankings negatively or my SEO efforts in general. If anyone has any insights or has experienced this with their site please let me know!
Technical SEO | | Jason-Reid0 -
Indexing product attributes in sitemap
Hey Mozzers! I'm battling a few questions about the sitemap for my ecommerce store. Could you help me out? Is it necessary to include your product attributes in the sitemap? I'm not sure why it would matter to have a sitemap that lists everything in the color cherry. Also, if the attributes were included in the sitemap, would that count as duplicate content for the same products to show up in multiple attributes? Is there any benefit to submitting the sitemaps individually? For example, submitting /product-sitemap.xml, /product_brand-sitemap.xml versus just /sitemap.xml? Any other best practices for managing my ecommerce sitemap, or great resources, would be very helpful. Thank you! a1vUz
Technical SEO | | localwork0 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
Should I block Map pages with robots.txt?
Hello, I have a website that was started in 1999. On the website I have map pages for each of the offices listed on my site, for which there are about 120. Each of the 120 maps is in a whole separate html page. There is no content in the page other than the map. I know all of the offices love having the map pages so I don't want to remove the pages. So, my question is would these pages with no real content be hurting the rankings of the other pages on our site? Therefore, should I block the pages with my robots.txt? Would I also have to remove these pages (in webmaster tools?) from Google for blocking by robots.txt to really work? I appreciate your feedback, thanks!
Technical SEO | | imaginex0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URL's that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URL's I was hoping that wildcards were still valid.
Technical SEO | | mkhGT0 -
Is there any value in having a blank robots.txt file?
I've read an audit where the writer recommended creating and uploading a blank robots.txt file, there was no current file in place. Is there any merit in having a blank robots.txt file? What is the minimum you would include in a basic robots.txt file?
Technical SEO | | NicDale0 -
Which is the best wordpress sitemap plugin
Does anyone have a recommendation for the best xml sitemap plugin for wordpress sites or do you steer clear of plugins and use a sitemap generator then load it up to the root manually?
Technical SEO | | simoncmason0