Can URLs blocked with robots.txt hurt your site?
-
We have about 20 testing environments blocked by robots.txt, and these environments contain duplicates of our indexed content. These environments are all blocked by robots.txt, and appearing in google's index as blocked by robots.txt--can they still count against us or hurt us?
I know the best practice to permanently remove these would be to use the noindex tag, but I'm wondering if we leave them they way they are if they can still hurt us.
-
90% not, first of all, check if google indexed them, if not, your robots.txt should do it, however I would reinforce that by making sure those URLs are our of your sitemap file and make sure your robots's disallows are set to ALL *, not just google for example.
Google's duplicity policies are tough, but they will always respect simple policies such as robots.txt.
I had a case in the past when a customer had a dedicated IP, and google somehow found it, so you could see both the domain's pages and IP's pages, both the same, we simply added a .htaccess rule to point the IP requests to the domain, and even when the situation was like that for long, it doesn't seem to have affected them. In theory google penalizes duplicity but not in this particular cases, it is a matter of behavior.
Regards!
-
I've seen people say that in "rare" cases, links blocked by Robots.txt will be shown as search results but there's no way I can imagine that would happen if it's duplicates of your content.
Robots.txt lets a search engine know not to crawl a directory - but if another resource links to it, they may know it exists, just not the content of it. They won't know if it's noindex or not because they don't crawl it - but if they know it exists, they could rarely return it. Duplicate content would have a better result, therefore that better result will be returned, and your test sites should not be...
As far as hurting your site, no way. Unless a page WAS allowed, is duplicate, is now NOT allowed, and hasn't been recrawled. In that case, I can't imagine it would hurt you that much either. I wouldn't worry about it.
(Also, noindex doesn't matter on these pages. At least to Google. Google will see the noindex first and will not crawl the page. Until they crawl the page it doesn't matter if it has one word or 300 directives, they'll never see it. So noindex really wouldn't help unless a page had already slipped through.)
-
I don't believe they are going to hurt you, it is more of a warning that if you are trying to have these indexed that at the moment they can't be accessed. When you don't want them to be indexed i.e. in this case, I don't believe you are suffering because of it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why a certain URL ( a category URL ) disappears?
the page hasn't been spammed. - links are natural - onpage grader is perfect - there are useful high ranking articles linking to the page...pretty much everything is okay.....also all of my websites pages are okay and none of them has disappeared only this one ( the most important category of my site. )
Intermediate & Advanced SEO | | mohamadalieskandariii0 -
Is Link equity / Link Juice lost to a blocked URL in the same way that it is lost to nofollow link
Hi If there is a link on a page that goes to a URL that is blocked in robots txt - is the link juice lost in the same way as when you add nofollow to a link on a page. Any help would be most appreciated.
Intermediate & Advanced SEO | | Andrew-SEO0 -
Can I markup additional information about a person that is not visible on my site?
Hello, I want to mark up my site and I don't know if I can markup up 'CEO' for a person if it doesn't say 'CEO' on my site. In general, does everything that I markup have to be visible on my site? Thanks for taking the time to answer my question. Leebi
Intermediate & Advanced SEO | | Leebi0 -
Is robots met tag a more reliable than robots.txt at preventing indexing by Google?
What's your experience of using robots meta tag v robots.txt when it comes to a stand alone solution to prevent Google indexing? I am pretty sure robots meta tag is more reliable - going on own experiences, I have never experience any probs with robots meta tags but plenty with robots.txt as a stand alone solution. Thanks in advance, Luke
Intermediate & Advanced SEO | | McTaggart1 -
SEO impact when micro site are hosted on third party url
Have a look at this url
Intermediate & Advanced SEO | | kjerstibakke
http://www.bekkevold-as.no/.
On the front page they are providing links to a website created by one of their suppliers. (Three cirkle icons in the middle of the page).
The links make the customers leave the site and open on a different domain, even the branding of the page is for Bekkevold Elektro (see log top right)
The question from me: How will this impact the SEO for Bekkevold Elektro. Would it be better to create a subdomain for Bekkevold Elektro and ask the supplier to point to this? Or is it ok to leave it as it is, having links back to the web site for Bekkevold Elektro? Just changing the target of the link to open in a new window.
Thank you very much for your support on this. I would appreciate any suggestion for improvement for my customer, Bekkevold Elektro. Best regards
Kjersti Bakke0 -
URLs with parameters + canonicals + meta robots
Hi Moz community! I'm posting a new question here as I couldn't find specific answer to the case I'm facing. Along with canonical tags, we are implementing meta robots on our pages (e-commerce website with thousands of pages). Most of the cases have been covered but I still have one unanswered case: our products are linked from list pages (mostly categories) but they almost always include a tracking parameter (ie /my-product.html?ref=xxx) products urls are secured with a canonical tag (referring only to the clean url /my-product.html) but what would be the best solution regarding the meta robots? For now we opted for a meta robot 'noindex, follow' for non canonical urls (so the ones unfortunately linked from our category/list pages), but I'm afraid that it could hurt our SEO (apparently no juice is given from URLs with a noindex robots), and even maybe prevent bots from crawling our website properly ... Would it be best to have no meta robots at all on these product urls with parameters? (we obviously can't have 'index, follow' when the canonical ref points to another url!). Thanks for your help!
Intermediate & Advanced SEO | | JessicaZylberberg0 -
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?
Intermediate & Advanced SEO | | IHSwebsite0 -
Questions about turning my wordpress site into an ecommerce site. Experience needed.
I have a wordpress site that is about a product that is now getting some great traffic. Right now It has affiliate stuff on it. I want to sell my own product so I will be turning this wordpress site into an ecommerce site. I want to redesign it so I am not looking for simple plugins to just add a cart. The part I am really confused about is what to do with my posts and categories? How does that work when turning this site into an ecommerce site? Lets say the site is "hats for adults" My post pages are things like "funny hats for adults", "hats for adult men" etc etc. Would I turn these posts pages into like category pages that have a category of products. Or should I create real categories and have my developer turn those into the ecommerce category pages and then redirect my posts to those categories? Maybe I don't even know what I am talking about. Is this even making sense? This is a small site (5posts and 1 category) and most of the traffic will come from the homepage keywords anyways.
Intermediate & Advanced SEO | | PEnterprises0