Will an XML sitemap override a robots.txt file?
-
I have a client whose robots.txt file is blocking an entire subdomain, entirely by accident. Not realizing the robots.txt error, their original solution was to submit an XML sitemap to get their pages indexed.
I did not think this tactic would work, as the robots.txt would take precedence over the XML sitemap. But it worked... I have no explanation as to how or why.
Does anyone have an answer to this, or any experience with a website that has had a clear Disallow: / for months yet somehow has pages in the index?
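For reference, the blocking rule is the standard blanket block, something along these lines (reconstructed for illustration, not their exact file):

# robots.txt at the subdomain root - asks all compliant crawlers to stay out of every URL
User-agent: *
Disallow: /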
-
The robots.txt file will prevent Google from showing further information about the disallowed pages, but it doesn't prevent indexation.
They're still indexed (that's why you're seeing them), but with no meta description or text taken from the page, because Google wasn't allowed to crawl it to retrieve more information.
If you want them to start showing that info, you just need to remove the rule from the robots.txt, and you'll soon see those pages' information appear. If you want them out of the index instead, add the noindex meta tag to each page, which is the only directive that will actually prevent indexation, and then use GWT to remove them from the index. (Note that Google can only see a noindex tag if it's allowed to crawl the page, so the blocking rule has to come out of robots.txt in either case.)
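For reference, a minimal sketch of that tag, placed in the <head> of each page you want removed:

<!-- robots meta tag: keeps the page out of Google's index -->
<!-- note: Google can only read this tag if robots.txt allows it to crawl the page -->
<meta name="robots" content="noindex">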
-
I assumed the same thing, but I performed a site: command search (example below) while they were still prospects, and they had one result present with the explanation "A description for this result is not available because of this site's robots.txt – learn more".
They uploaded an XML sitemap before I could tell them to remove the robots.txt rule, and one week later the entire site is now in the index.
I have used robots.txt to properly block websites before, and it usually takes 2-3 weeks for all results to drop out of the index, so I don't know how that could explain it either.
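For clarity, the check was just the standard site: operator scoped to their subdomain, i.e. a Google search like this (domain is a placeholder, not the client's):

site:subdomain.example.com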
-
I agree. The only way I could think this would work is if the robots.txt file was on the root domain rather than the subdomain. Also, check Webmaster Tools; under the Sitemaps section it will tell you about any "Error: URL was blocked by robots.txt" issues.
One thing to remember is that robots.txt is technically a suggestion asking search engines not to crawl your site. They can choose to ignore it, though personally I don't know of any cases in which this has happened.
-
An XML sitemap shouldn't override robots.txt. If you have Google Webmaster Tools set up, you will see warnings on the Sitemaps page that pages blocked by robots.txt are being submitted.
Now, robots.txt does not prevent indexation, just crawling. So if the pages were indexed before the robots.txt was implemented, they may continue to be indexed. Google will also display just the URL for pages that it has discovered but can't crawl because of robots.txt.
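To illustrate the conflict that triggers those warnings, here's a minimal sketch (URLs are placeholders): a sitemap announcing a URL that robots.txt simultaneously blocks. The sitemap tells Google the page exists, which is enough to index the bare URL, while robots.txt prevents the crawl that would fill in the title and description:

# robots.txt - blocks crawling of everything on the host
User-agent: *
Disallow: /

<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap.xml - still announces the URL to search engines -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://subdomain.example.com/page.html</loc>
  </url>
</urlset>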