Robots.txt and redirected backlinks
-
Hey there,
Since a client's global website has a very complex structure which led to big duplicate content problems, we decided to disallow crawler access to the root directory and allow access to only a few relevant subdirectories. While indexing has improved since then, I was wondering if we might have cut off link juice. Several backlinks point to the disallowed root directory and are redirected from there (301) to an allowed directory. Could this cause any problems?
Example: a backlink points to example.com (disallowed in robots.txt) and is redirected from there to example.com/uk/en (allowed in robots.txt). Would this cut off the link juice?
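A minimal sketch of the setup described above, using Python's standard `urllib.robotparser`. The robots.txt rules here are an assumption reconstructed from the description (the real file is not shown). The `Allow` line comes first because this parser applies the first matching rule, which may differ from how a given search engine resolves conflicting rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: root disallowed, one subdirectory allowed.
ROBOTS_TXT = """\
User-agent: *
Allow: /uk/en
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str) -> bool:
    """Return True if a generic crawler ("*") may fetch the URL."""
    return parser.can_fetch("*", url)


# The backlink target (root) versus the redirect destination.
root_allowed = is_crawlable("https://example.com/")
target_allowed = is_crawlable("https://example.com/uk/en")
print(root_allowed, target_allowed)
```

Under these assumed rules the root is blocked and the redirect destination is crawlable, which is exactly the combination the question is about.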
Thanks a lot for your thoughts on this.
Regards,
Jochen
-
A noindexed page can still accumulate and pass link equity, although results vary on whether or not some of that link juice "evaporates" along the way. I'm inclined to agree with Chris, though, that there's probably no need to noindex a page that redirects to a page that you do want indexed.
-
Hi Jochen,
It's an interesting situation and, to be honest, I don't know for sure how search engines will deal with that "link juice". It comes down to whether search engines see robots.txt or the htaccess redirect first. If they look at robots.txt first (which is my suspicion), they can't see that page, so it can't pass the strength.
I suppose to test this, you could submit the redirected page to index via Search Console and see if it shows you the redirect or says it's blocked.
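The "robots.txt first" suspicion can also be modelled offline. This is only a sketch: it assumes a crawler that consults robots.txt before every request, and the rules, the redirect map, and the `crawl` helper are all hypothetical. Real search engines may treat links to blocked URLs in ways this model does not capture.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules and redirect map, based on the example in the question.
ROBOTS_TXT = """\
User-agent: *
Allow: /uk/en
Disallow: /
"""
REDIRECTS = {"https://example.com/": "https://example.com/uk/en"}

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def crawl(url: str, max_hops: int = 5) -> str:
    """Simulate a crawler that checks robots.txt before every request."""
    for _ in range(max_hops):
        if not robots.can_fetch("*", url):
            # The request is never made, so a 301 on this URL stays unseen.
            return f"blocked: {url}"
        if url not in REDIRECTS:
            return f"fetched: {url}"
        url = REDIRECTS[url]  # follow the 301 to its target
    return f"too many redirects: {url}"


result_root = crawl("https://example.com/")
result_target = crawl("https://example.com/uk/en")
print(result_root)
print(result_target)
```

In this model the crawler stops at the blocked root and never learns that it redirects, while the allowed directory is fetched normally when reached directly.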
Interesting question aside, there's no real need to block access to a 301'd page.
Also, apologies if I'm just highlighting the obvious here but it would be far better to clean up the site structure and remove that duplication rather than just masking it with robots; the user experience is at least as important as the algorithms!
Along the same lines, cleaning up those pages is going to help your crawl budget immensely.
Related Questions
-
Robots.txt - Googlebot - Allow... what's it for?
Hello - I just came across this in robots.txt for the first time, and was wondering why it is used. Why would you have to proactively tell Googlebot to crawl JS/CSS, and why would you want it to? Any help would be much appreciated - thanks, Luke
User-Agent: Googlebot
Allow: /.js
Allow: /.css
Intermediate & Advanced SEO | | McTaggart0 -
Fetch as Google - Redirected
Hi, I have swapped from HTTP to HTTPS and put a redirect in place from HTTP to HTTPS. I also set www.xyz.co.uk/index.html to redirect to www.xyz.co.uk. When I fetch as Google it shows up as redirected! Does this mean that I have too many 301s looping? Do I need the redirect from index.html to the root domain if I have a rel canonical in place for index.html?
htaccess (Linux):
RewriteCond %{HTTP_HOST} ^xyz.co.uk
RewriteRule (.*) https://www.xyz.co.uk/$1 [R=301,L]
RewriteRule ^$ index.html [R=301,L]
Intermediate & Advanced SEO | | Cocoonfxmedia0 -
Should I disallow via robots.txt for my sub folder country TLD's?
Hello, my website is in English by default, with Spanish in a subfolder. Because of my Joomla platform, Google is listing hundreds of soft 404 links for French, Chinese, German etc. subfolders. Again, I never created these country subfolder URLs, but Google is crawling them. Is it best to just "Disallow" these subfolders like the example below, then "mark as fixed" in the crawl errors section in Google Webmaster Tools?
User-agent: *
Disallow: /de/
Disallow: /fr/
Disallow: /cn/
Thank you, Shawn
Intermediate & Advanced SEO | | Shawn1240 -
Redirection not working
http://elmanarah.com/ to http://www.elmanarah.com/ I mistakenly created 5 databases for one WordPress installation. In order to get rid of them, I mistakenly deleted even the right one. I have now created a new one, but the URL is showing with www. Even now, if I type in http://elmanarah.com/ it sends me to http://www.elmanarah.com/. I also checked the URL's D.A. and P.A. in OSE, and it shows that I have redirected it fine. Can anyone check and tell me whether I have done it right and it will pass on my previous work, or whether it was a total loss for me?
Intermediate & Advanced SEO | | csfarnsworth0 -
Redirection - Seo trick?
Hi, after analyzing the site I found several redirections from exact match domains with different domain name extensions. Is this an SEO trick? This is the second website I have found that is using this technique. Can anyone give more details? Thanks
Intermediate & Advanced SEO | | nyanainc0 -
Redirection to mobile site
Calling all SEO ninjas! I'm currently developing single web pages for various clients which function as abbreviated versions of their main websites. They are all related & under a single domain. When a user visits these pages on a mobile device, CSS is used to display mobile friendly versions of these pages. My clients are thrilled with these mobile versions and now want to also redirect mobile visitors from their main site (which is not mobile optimised) to these pages. My questions are:
Are there any negative implications if we did this? i.e. redirecting to a different domain
What is the best method for redirection? e.g. JavaScript
Can this be achieved by adding a single line of code to their main site?
Can this be done in an SEO friendly way so that the redirection acts like a backlink?
Many thanks.
Intermediate & Advanced SEO | | martyc0 -
Robots.txt & url removal vs. noindex, follow?
When de-indexing pages from Google, what are the pros & cons of each of the below two options:
robots.txt & requesting URL removal from Google Webmasters
Use the noindex, follow meta tag on all doctor profile pages; keep the URLs in the Sitemap file so that Google will recrawl them and find the noindex meta tag; make sure that they're not disallowed by the robots.txt file
Intermediate & Advanced SEO | | nicole.healthline0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it results in a lot of URL variations of the same page being crawled by Google. For example, a standard category page, www.mysite.com/widgets.html, which uses a "Price" layered navigation sidebar to filter products based on price, also produces the following URLs which link to the same page:
http://www.mysite.com/widgets.html?price=1%2C250
http://www.mysite.com/widgets.html?price=2%2C250
http://www.mysite.com/widgets.html?price=3%2C250
As there are literally thousands of these URL variations being indexed, I'd like to use robots.txt to disallow them.
Question: Is this a wise thing to do? Or does Google take layered navigation links into account by default, so that I don't need to worry? To implement this, I was going to add the following to robots.txt:
User-agent: *
Disallow: /*?
Disallow: /*=
...which would prevent any dynamic URL with a '?' or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1
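To see which URLs the two wildcard rules in the question above would catch, here is a rough sketch. Python's built-in `urllib.robotparser` does not handle `*` wildcards, so this uses a small hand-written matcher instead; `pattern_to_regex` and `is_disallowed` are hypothetical helpers, and the matching is a simplified approximation of the `*` and trailing `$` syntax that major search engines document, not a reproduction of any crawler's actual logic.

```python
import re

# Disallow patterns taken from the question above.
DISALLOW_PATTERNS = ["/*?", "/*="]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and an optional trailing $."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in DISALLOW_PATTERNS)


filtered = is_disallowed("/widgets.html?price=1%2C250")
plain = is_disallowed("/widgets.html")
print(filtered, plain)
```

In this sketch the filtered URL is caught by the first pattern alone, since it contains a `?`, while the plain category page stays crawlable.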