Can URLs blocked with robots.txt hurt your site?
-
We have about 20 testing environments that contain duplicates of our indexed content. These environments are all blocked by robots.txt and appear in Google's index as "blocked by robots.txt." Can they still count against us or hurt us?
I know the best practice for permanently removing them would be to use the noindex tag, but I'm wondering whether they can still hurt us if we leave them the way they are.
-
Most likely not. First of all, check whether Google has indexed them; if not, your robots.txt is doing its job. I would reinforce that by making sure those URLs are out of your sitemap file, and that your robots.txt disallows apply to ALL user agents (*), not just Google, for example.
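As a minimal sketch of that last point, a robots.txt served from each testing environment could block every crawler rather than only Googlebot (the paths here are hypothetical):

```text
# Applies to all user agents, not just Googlebot
User-agent: *
Disallow: /
```

Serving this from the root of each test environment keeps the whole environment out of crawling regardless of which search engine requests it.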
Google's duplicate-content policies are tough, but Google will always respect simple directives such as robots.txt.
I had a case in the past where a customer had a dedicated IP and Google somehow found it, so you could see both the domain's pages and the IP's pages, both identical. We simply added an .htaccess rule to redirect IP requests to the domain, and even though the situation had stood like that for a long time, it doesn't seem to have affected them. In theory Google penalizes duplicate content, but not in this particular case; it is a matter of behavior.
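A rule along those lines, using mod_rewrite with a hypothetical IP and domain, would redirect any request that arrives via the bare IP to the canonical hostname:

```apache
RewriteEngine On
# If the request's Host header is the server's bare IP rather than the domain...
RewriteCond %{HTTP_HOST} ^203\.0\.113\.10$
# ...301-redirect it to the same path on the canonical domain
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

The 301 consolidates any signals the IP-based URLs picked up back onto the domain.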
Regards!
-
I've seen people say that in "rare" cases, links blocked by robots.txt will be shown as search results, but there's no way I can imagine that happening when they're duplicates of your content.
Robots.txt tells a search engine not to crawl a directory. If another resource links to it, the engine may know the URL exists, just not its content. It won't know whether the page is noindexed, because it never crawls it; but if it knows the URL exists, it could, rarely, return it. With duplicate content there is a better version of the result available, so that better version will be returned, and your test sites should not be.
As far as hurting your site: no way. The exception would be a page that WAS allowed, is a duplicate, is now NOT allowed, and hasn't been recrawled; even in that case, I can't imagine it would hurt you much. I wouldn't worry about it.
(Also, noindex doesn't matter on these pages, at least to Google. Google will see the robots.txt block first and will not crawl the page. Until they crawl the page, it doesn't matter whether it has one word or 300 directives; they'll never see any of it. So noindex really wouldn't help unless a page had already slipped through.)
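If the goal were ever to actively deindex these environments, the usual sequence follows from the point above: allow crawling again (remove the robots.txt block) so Google can actually see a noindex directive, served for example as an X-Robots-Tag response header. A hypothetical Apache sketch for a test environment's config:

```apache
# Serve a noindex header on every response from the test environment.
# Note: robots.txt must NOT block crawling here, or Google will never
# fetch the page and so will never see this header.
Header set X-Robots-Tag "noindex, nofollow"
```

Once the pages drop out of the index, the robots.txt block can be restored.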
-
I don't believe they are going to hurt you. The message is more of a warning that, if you were trying to have these pages indexed, they currently can't be accessed. Since you don't want them indexed, as is the case here, I don't believe you are suffering because of it.