Can URLs blocked with robots.txt hurt your site?
-
We have about 20 testing environments blocked by robots.txt, and these environments contain duplicates of our indexed content. These environments are all blocked by robots.txt, and appearing in google's index as blocked by robots.txt--can they still count against us or hurt us?
I know the best practice to permanently remove these would be to use the noindex tag, but I'm wondering if we leave them they way they are if they can still hurt us.
-
90% not, first of all, check if google indexed them, if not, your robots.txt should do it, however I would reinforce that by making sure those URLs are our of your sitemap file and make sure your robots's disallows are set to ALL *, not just google for example.
Google's duplicity policies are tough, but they will always respect simple policies such as robots.txt.
I had a case in the past when a customer had a dedicated IP, and google somehow found it, so you could see both the domain's pages and IP's pages, both the same, we simply added a .htaccess rule to point the IP requests to the domain, and even when the situation was like that for long, it doesn't seem to have affected them. In theory google penalizes duplicity but not in this particular cases, it is a matter of behavior.
Regards!
-
I've seen people say that in "rare" cases, links blocked by Robots.txt will be shown as search results but there's no way I can imagine that would happen if it's duplicates of your content.
Robots.txt lets a search engine know not to crawl a directory - but if another resource links to it, they may know it exists, just not the content of it. They won't know if it's noindex or not because they don't crawl it - but if they know it exists, they could rarely return it. Duplicate content would have a better result, therefore that better result will be returned, and your test sites should not be...
As far as hurting your site, no way. Unless a page WAS allowed, is duplicate, is now NOT allowed, and hasn't been recrawled. In that case, I can't imagine it would hurt you that much either. I wouldn't worry about it.
(Also, noindex doesn't matter on these pages. At least to Google. Google will see the noindex first and will not crawl the page. Until they crawl the page it doesn't matter if it has one word or 300 directives, they'll never see it. So noindex really wouldn't help unless a page had already slipped through.)
-
I don't believe they are going to hurt you, it is more of a warning that if you are trying to have these indexed that at the moment they can't be accessed. When you don't want them to be indexed i.e. in this case, I don't believe you are suffering because of it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can a site rank high without any visible SEO tactics?
Hi SEO specialists, I don't know if this is the right place to ask this, but there is something bugging me for awhile now, so I thought let's give it a try. I work for several clients and one of them works in the Psychic business. Since a few months we see that site from Psychicvop.com is taking over all kinds of top rankings in Google. The strange thing is that I can't detect anything that I always learned which is basic material for SEO. There isn't much content and there link profile isn't really spectacular either. Does any of you have an idea what it is about this sites that makes them outrank every other site in the business?
Intermediate & Advanced SEO | | LiveFootballTickets0 -
If my website do not have a robot.txt file, does it hurt my website ranking?
After a site audit, I find out that my website don't have a robot.txt. Does it hurt my website rankings? One more thing, when I type mywebsite.com/robot.txt, it automatically redirect to the homepage. Please help!
Intermediate & Advanced SEO | | binhlai0 -
Best practice for disallowing URLS with Robots.txt
Hi Everybody, We are currently trying to tidy up the crawling errors which are appearing when we crawl the site. On first viewing, we were very worried to say the least:17000+. But after looking closer at the report, we found the majority of these errors were being caused by bad URLs featuring: Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/" Color - For example: ?color=91 Price - For example: "?price=650-700" Order - For example: ?dir=desc&order=most_popular Page - For example: "?p=1&standards=704" Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/" My question now is as a novice of working with Robots.txt, what would be the best practice for disallowing URLs featuring these from being crawled? Any advice would be appreciated!
Intermediate & Advanced SEO | | centurysafety0 -
Need help with Robots.txt
An eCommerce site built with Modx CMS. I found lots of auto generated duplicate page issue on that site. Now I need to disallow some pages from that category. Here is the actual product page url looks like
Intermediate & Advanced SEO | | Nahid
product_listing.php?cat=6857 And here is the auto generated url structure
product_listing.php?cat=6857&cPath=dropship&size=19 Can any one suggest how to disallow this specific category through robots.txt. I am not so familiar with Modx and this kind of link structure. Your help will be appreciated. Thanks1 -
Blocking Certain Site Parameters from Google's Index - Please Help
Hello, So we recently used Google Webmaster Tools in an attempt to block certain parameters on our site from showing up in Google's index. One of our site parameters is essentially for user location and accounts for over 500,000 URLs. This parameter does not change page content in any way, and there is no need for Google to index it. We edited the parameter in GWT to tell Google that it does not change site content and to not index it. However, after two weeks, all of these URLs are still definitely getting indexed. Why? Maybe there's something we're missing here. Perhaps there is another way to do this more effectively. Has anyone else ran into this problem? The path we used to implement this action:
Intermediate & Advanced SEO | | Jbake
Google Webmaster Tools > Crawl > URL Parameters Thank you in advance for your help!0 -
Is there a downside of an image coming from the site's dotted quad and can it be seen as a duplicate?
Ok the question doesn't fully explain the issue. I just want some opinions on this. Here is the backstory. I have a client with a domain that has been around for a while and was doing well but with no backlinks. (Fairly low competition). For some reason they created mirrors of their site on different urls. Then their web designer built them a test site that was a copy of their site on the web designer's url and didn't bother to noindex it. Client's site dived, the web designer's site started ranking for their keywords. So we helped clean that up, and they hired a brand new web designer and redesigned the site. For some reason the dotted quad version of the site started showing up as a referer in GA. So one image on the site comes from that and not the site's url. So I ran a copyscape and site search and discovered the dotted quad version like 69.64.153.116 (not the actual address) was also being indexed by the search engine. To us this seems like a cut and dry duplicate content issue, but I'm having trouble finding much written on the subject. I raised the issue with the dev, and he reluctantly 301 the site to the official url. The second part of this is the web designer still has that one image on the site coming from the numerical version of the site and not the written url. Any thoughts if that has any negative SEO impact? My thought it isn't ideal, but it just looks like an external referral for pulling that one image. I'd love any thoughts or experience on a situation like this.
Intermediate & Advanced SEO | | BCutrer0 -
What happens when I redirect an entire site to an established page on another site?
Hi There, I have a website which is dedicated to selling ONE product (in different forms) or my main brand site. It is branded similarly, targets similar keywords, and gets some traffic which convert to leads. Additionally, the auxiliary site has a Google Rank 2 in its own right. I am thinking of consolidating this "auxillary" site to the specific product page on my main site. The reason I am considering doing this is to give a "boost" to the main product page on our main site which has many core keywords sitting with SERP ranking of between 11-20 (so not in first 10) Because this auxiliary site it gets traffic and leads in its own right, I don't want this to be to the detriment of my leads overall. Question is - if I 301 redirect the entire domain from my auxillary site to the equivalent product on my main site am I likely to see a large "boost" to that product page? (i.e. will I likely see my ranking rise from 11 - 20 significantly)
Intermediate & Advanced SEO | | love-seo-goodness0 -
Why is noindex more effective than robots.txt?
In this post, http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo, it mentions that the noindex tag is more effective than using robots.txt for keeping URLs out of the index. Why is this?
Intermediate & Advanced SEO | | nicole.healthline0