Restricted by robots.txt: does this cause problems?
-
According to Webmaster Tools, I have restricted around 1,500 links, which are links to retailers' websites and affiliate links.
Is this the right approach, as I thought it would affect link juice? Or should I take the nofollow off the links that are already restricted by robots.txt?
-
Hello Ocelot,
I am assuming you have a site that has affiliate links and you want to keep Google from crawling those affiliate links. If I am wrong, please let me know. Going forward with that assumption then...
That is one way to do it. So perhaps you first send all of those links through a redirect via a folder called /out/ or /links/ or whatever, and you have blocked that folder in the robots.txt file. Correct? If so, this is how many affiliate sites handle the situation.
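A minimal robots.txt sketch for that setup might look like this (assuming the redirect folder is named /out/; substitute whatever folder you actually use):

```
User-agent: *
Disallow: /out/
```

Each affiliate link then points at a URL inside the blocked folder, such as /out/123, and the server redirects that request on to the retailer.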
I would not rely on rel nofollow alone, though I would use that in addition to the robots.txt block.
There are many other ways to handle this. For instance, you could make all affiliate links JavaScript links instead of href links. Then you could put the JavaScript into a folder called /js/ or something like that, and block that folder in the robots.txt file. This works less and less now that the Google Preview Bot seems to be ignoring the disallow statement in those situations.
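As a rough sketch of that JavaScript approach (all file, function, and link names here are illustrative, not from any real site):

```html
<!-- The script lives in /js/, which is disallowed in robots.txt,
     so compliant crawlers never see the redirect targets. -->
<script src="/js/outlinks.js"></script>
<!-- outlinks.js might define:
     function outlink(id) { window.location = '/out/' + id; } -->
<a href="#" onclick="outlink(123); return false;">Visit retailer</a>
```

The anchor carries no crawlable href to the affiliate destination; the click is resolved entirely in JavaScript.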
You could also make them all use the same URL with a unique identifier of some sort that tells your database where to redirect the click. For example:
www.yoursite.com/outlink/mylink#123
or
www.yoursite.com/mylink?link-id=123
In which case you could then block /mylink in the robots.txt file and tell Google to ignore the link-id parameter via Webmaster Tools.
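Server-side, the lookup behind a URL like www.yoursite.com/mylink?link-id=123 can be as simple as a table mapping ids to destinations. A minimal sketch (the ids and retailer URLs are invented for illustration):

```javascript
// Map of link-ids to affiliate destinations. In practice this would
// live in your database rather than in code.
const outlinks = {
  123: 'https://www.retailer-one.example/product',
  124: 'https://www.retailer-two.example/offer',
};

// Resolve the link-id from the query string to a redirect target.
// Returns null for unknown ids, so the handler can serve a 404 instead.
function resolveOutlink(linkId) {
  return outlinks[linkId] || null;
}
```

The request handler would then issue a 302 redirect to whatever resolveOutlink returns.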
As you can see, there is more than one way to skin this cat. The problem is always going to be doing it without looking like you're trying to "fool" Google - because they WILL catch up with any tactic like that eventually.
Good luck!
Everett
-
From a coding perspective, applying the nofollow to the links is the best way to go.
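In markup terms, that just means adding the rel attribute to each affiliate anchor (the URL here is a placeholder):

```html
<a href="https://www.retailer.example/product?aff=123" rel="nofollow">Product name</a>
```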
With the robots.txt file, only the top-tier search engines respect the information contained within. Lesser-known bots and spammers might read your robots.txt file to see what you don't want listed, and that info gives them a starting point to dig deeper into your site.
Related Questions
-
Tough SEO problem, Google not caching page correctly
My web site is http://www.mercimamanboutique.com/. The cached version of the French site, cache:www.mercimamanboutique.com/fr-fr/, is showing incorrectly. The German version, cache:www.mercimamanboutique.com/de-de/, is showing correctly. I have resubmitted site links and asked Google to re-index the web site many times. The German version always gets cached properly, but the French version never does. This is frustrating me; any idea why? Thanks.
Technical SEO | ss20160
-
Query Strings causing Duplicate Content
I am working with a client that has multiple locations across the nation, and they recently merged all of the location sites into one site. To allow the lead capture forms to pre-populate the locations, they are using the query string /?location=cityname on every page. For example:
www.example.com/product
www.example.com/product/?location=nashville
www.example.com/product/?location=chicago
There are thirty locations across the nation, so every page x 30 is being flagged as duplicate content, at least in the crawl through Moz. Does using that query string actually cause a duplicate content problem?
Technical SEO | Rooted1
-
What would cause a huge decrease in total links?
After the latest crawl of one of my client campaigns in Moz, I noticed that their domain authority dropped by 3 since the last crawl just one week ago. That seems pretty drastic. I took a look around to see what might have been the cause, and in looking at the Links > Competitive Metrics > History section of Moz, I found that the site had a decrease of total links from 5,339 to 1,072 (see image) - also a drastic decrease. My question is, what would cause such a huge, sudden decrease in links over just the course of one month? I haven't changed anything up with the approach to their online strategy, in fact we have put increased emphasis on the creation of natural links so this huge decrease comes as a real surprise. I should also mention that Total External Links also decreased, but only from 243 to 205. Any insight into these numbers and what could have possibly caused such a drastic decrease would be very much appreciated! Screen-Shot-2014-07-23-at-10.00.28-AM.png
Technical SEO | garrettkite0
-
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | zeepartner0
-
Duplicate content problem?
Hello! I am not sure if this is a problem or if I am just making something too complicated. Here's the deal. I took on a client who has an existing site in something called homestead. Files cannot be downloaded, making it tricky to get out of homestead. The way it is set up is new sites are developed on subdomains of homestead.com, and then your chosen domain points to this subdomain. The designer who built it has kindly given me access to her account so that I can edit the site, but this is awkward. I want to move the site to its own account. However, to do so Homestead requires that I create a new subdomain and copy the files from one to the other. They don't have any way to redirect the prior subdomain to the new one. They recommend I do something in the html, since that is all I can access. Am I unnecessarily worried about the duplicate content consequences? My understanding is that now I will have two subdomains with the same exact content. True, over time I will be editing the new one. But you get what I'm sayin'. Thanks!
Technical SEO | devbook90
-
A problem with duplicate content
I'm kind of new at this. My crawl analysis says that I have a problem with duplicate content. I set the site up so that web sections appear in a folder with an index page as a landing page for that section. The URL would look like: www.myweb.com/section/index.php. The crawl analysis says that both that URL and its root, www.myweb.com/section/, have been indexed. So I appear to have a situation where the page has been indexed twice and is a duplicate of itself. What can I do to remedy this? And what steps should I take to get the pages re-indexed so that this type of duplication is avoided? I hope this makes sense! Any help gratefully received. Iain
Technical SEO | iain0
-
Question about construction of our sitemap URL in robots.txt file
Hi all, This is a Webmaster/SEO question. This is the sitemap URL currently in our robots.txt file: http://www.ccisolutions.com/sitemap.xml As you can see it leads to a page with two URLs on it. Is this a problem? Wouldn't it be better to list both of those XML files as separate line items in the robots.txt file? Thanks! Dana
Technical SEO | danatanseo0
-
RegEx help needed for robots.txt potential conflict
I've created a robots.txt file for a new Magento install and used an existing site-map that was on the Magento help forums but the trouble is I can't decipher something. It seems that I am allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento site-maps it seems) includes both: Allow: /*?p= and Disallow: /?p=& I've searched for help on RegEx and I can't see what "&" does but it seems to me that I'm allowing crawler access to all pagination URLs, but then possibly disallowing access to all pagination URLs that include anything other than just the page number? I've looked at several resources and there is practically no reference to what "&" does... Can anyone shed any light on this, to ensure I am allowing suitable access to a shop? Thanks in advance for any assistance
Technical SEO | MSTJames0