What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
-
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file.
Other than perhaps using up crawl budget, are there any other negative implications?
-
I highly doubt it would effect rankings due to low quality issues but it will show that you have site map error warnings in your GWT console. That issue is technically classified as 'Warnings' and not 'Errors'. The right thing to do in that scenario is take the robots.txt block off and just use a 'noindex' tag on the pages. That way they can stay in the site map but they won't show up in the index. Otherwise you should remove them from the sitemap if you don't want the warnings in GWT.
-
I personally do not think there is any penalty SEO wise in doing it. Although, I do think it will mess up the metric in GWT that shows how many pages have been submitted and how many have been indexed. I find that metric useful, so it would make it no longer useful if there are a lot of pages blocked by the robots.txt.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL is invalid: Why?
Hello everyone, I am currently listing my company on business directories. For some websites however when I add my website URL, it comes up as URL is invalid. What could be the reason for this? I have tried different variations like www., http:// and https://. Kind Regards,
Technical SEO | | SMCCoachHire
Aqib0 -
Changing URLs
As of right now we are using yahoo small business, when creating a product you have to declare an id, when we created the site we were not aware that you will not be able to change the id but also the ID is being used as the URL. we have a couple thousand products in which we will need to update the URLs. What would the best way to be to fix this without losing much juice from our current pages. Also I was thinking that if we did them all in a couple weeks it would hurt us a lot, and the best course of action would be to do a slow roll out of the URL changes. Any help is appreciated. Thank you!
Technical SEO | | TITOJAX0 -
Is there any value in having a blank robots.txt file?
I've read an audit where the writer recommended creating and uploading a blank robots.txt file, there was no current file in place. Is there any merit in having a blank robots.txt file? What is the minimum you would include in a basic robots.txt file?
Technical SEO | | NicDale0 -
Robots.txt crawling URL's we dont want it to
Hello We run a number of websites and underneath them we have testing websites (sub-domains), on those sites we have robots.txt disallowing everything. When I logged into MOZ this morning I could see the MOZ spider had crawled our test sites even though we have said not to. Does anyone have an ideas how we can stop this happening?
Technical SEO | | ShearingsGroup0 -
What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
I'm working on a recently hacked site for a client and and in trying to identify how exactly the hack is running I need to use the fetch as Google bot feature in GWT. I'd love to use this but it thinks the robots.txt is blocking it's acces but the only thing in the robots.txt file is a link to the sitemap. Unde the Blocked URLs section of the GWT it shows that the robots.txt was last downloaded yesterday but it's incorrect information. Is there a way to force Google to look again?
Technical SEO | | DotCar0 -
Block url with dynamic text in
I've just ran a report and I have a lot of duplicate page titles, most of which seem to be the review page, I use Magento and my normal url would be something like blah-blahtext.html but the review url is something like blah-blahtext/reviews/category/categoryname So I want to block the /reviews url bit as no one ever leaves reviews and it's not something I will be using in the future. Also I have a dynamic navigation which creates urls that look like product-name.html?size=2&colour=14 these are also creating duplicate urls, anyway to fix this? While I'm asking, anyone any tips for Magento?
Technical SEO | | Beermonster0 -
Dynamic Parameters in URL
I have received lots of warnings because of long urls. Most of them are because my website has many Attributes to FILTER out products. And each time the user clicks on one, its added to the URL. pls see my site here: www.theprinterdepo.com The warning is here: Although search engines can crawl dynamic URLs, search engine representatives have warned against using over 2 parameters in any given URL. The question to the community is: -What should I do? These attributes really help the user to find easier the products. I could remove some of the attributes, I am not sure if my ecommerce solution (MAGENTO), allows to change the behavior of this so that this does not use querystring parameters.
Technical SEO | | levalencia10 -
Robots.txt
My campaign hse24 (www.hse24.de) is not being crawled any more ... Do you think this can be a problem of the robots.txt? I always thought that Google and friends are interpretating the file correct, seen that he site was crawled since last week. Thanks a lot Bernd NB: Here is the robots.txt: User-Agent: * Disallow: / User-agent: Googlebot User-agent: Googlebot-Image User-agent: Googlebot-Mobile User-agent: MSNBot User-agent: Slurp User-agent: yahoo-mmcrawler User-agent: psbot Disallow: /is-bin/ Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-DE-Site/de_DE/-/EUR/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-AT-Site/de_DE/-/EUR/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-CH-Site/de_DE/-/CHF/hse24_Storefront-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-DE-Site/de_DE/-/EUR/hse24_DisplayProductInformation-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-AT-Site/de_DE/-/EUR/hse24_DisplayProductInformation-Start Allow: /is-bin/INTERSHOP.enfinity/WFS/HSE24-CH-Site/de_DE/-/CHF/hse24_DisplayProductInformation-Start Allow: /is-bin/intershop.static/WFS/HSE24-Site/-/Editions/ Allow: /is-bin/intershop.static/WFS/HSE24-Site/-/Editions/Root%20Edition/units/HSE24/Beratung/
Technical SEO | | remino630