Blocking Affiliate Links via robots.txt
-
Hi,
I work with a client who has a large affiliate network pointing to their domain, which is a big part of their inbound marketing strategy. All of these links point to the subdomain affiliates.example.com, which then sends each link through a 301 redirect to the relevant target page. These links have been showing up in Webmaster Tools as top linking domains and in the latest downloaded links reports. To follow guidelines and ensure that these links aren't counted by Google for either positive or negative impact on the site, we have blocked the affiliates.example.com subdomain in its robots.txt, preventing search engines from crawling the full subdomain. The robots.txt file contains the following:
User-agent: *
Disallow: /
We have verified the subdomain in Google Webmaster Tools and made certain that Google can reach and read the robots.txt file, so we know crawlers are being blocked from the affiliates subdomain. However, we added this block a few weeks ago, and links are still showing up in the latest links report as first being discovered after we added it. We want to make sure that the block was implemented properly and that these links aren't being used to negatively impact the site.
Any suggestions or clarification would be helpful. If the subdomain is blocked for search engines, why are they still following the links and reporting them in the www.example.com GWMT account as latest links? And if the block is implemented properly, will the total number of links pointing to our site, as reported in the "Links to your site" section, be reduced, or does the block not affect that figure?
From a development standpoint, it's a much easier fix for us to adjust the robots.txt file than to change the affiliate redirect from a 301 to a 302, which is why we decided to go with this option.
Any help you can offer will be greatly appreciated.
Thanks,
Mark
-
I think you did the right thing. It will take a while for the engines to re-crawl your robots.txt and actually follow the new directives.
Extra steps I would take:
- Change the 301 redirect to a 302; it's probably just one line of code performing the redirect after setting some cookies or session variables (see the first sketch after this list).
- Try switching the affiliate codes to JavaScript instead of naked URLs (e.g., a placeholder element like <ins class="affiliate"> that is later swapped for a text link or banner using JS; see the second sketch below). This will not only allow you to set a nofollow on those links, but also let you remove or block specific affiliates or pages where you don't want your links/banners.
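For the first point, here is a minimal sketch of what that redirect handler might look like, assuming a Node/Express setup in TypeScript; the route path, cookie name, and affiliate-to-target mapping are all hypothetical stand-ins for whatever your real handler uses:

import express from "express";

const app = express();

// Hypothetical mapping of affiliate IDs to their target pages.
const targets: Record<string, string> = {
  "partner-123": "https://www.example.com/products/widget",
};

app.get("/go/:affiliateId", (req, res) => {
  const target = targets[req.params.affiliateId];
  if (!target) {
    res.status(404).send("Unknown affiliate link");
    return;
  }
  // Set any tracking cookies or session variables first, as before.
  res.cookie("affiliate_id", req.params.affiliateId, {
    maxAge: 30 * 24 * 60 * 60 * 1000, // 30 days
  });
  // The one-line change: respond with a 302 (temporary) instead of a 301
  // (permanent), so the hop is treated as non-permanent.
  res.redirect(302, target);
});

app.listen(3000);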
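For the second point, a sketch of the client-side swap, assuming the placeholder carries the target URL and anchor text in hypothetical data-target and data-label attributes:

// Swap each <ins class="affiliate"> placeholder for a nofollow text link.
document.querySelectorAll<HTMLElement>("ins.affiliate").forEach((placeholder) => {
  const link = document.createElement("a");
  link.href = placeholder.dataset.target ?? "#"; // hypothetical data-target attribute
  link.rel = "nofollow";
  link.textContent = placeholder.dataset.label ?? "Visit our partner"; // hypothetical data-label
  placeholder.replaceWith(link);
});

Because the links are only built client-side, you can also decide at render time to skip specific affiliates or pages entirely.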
Related Questions
-
Website URL, Robots.txt and Google Search Console (www. vs non www.)
Hi MOZ Community,
I would like to request your kind assistance with domain URLs: www vs. non-www. Recently, my team moved to a new website where a 301 redirect has been done.
Original URL: https://www.example.com.my/ (with www.)
New URL: https://example.com.my/ (without www.)
Our current robots.txt sitemap: https://www.example.com.my/sitemap.xml (with www.)
Our Google Search Console property: https://www.example.com.my/ (with www.)
Questions:
1. How should I standardize these so that the Google crawler can effectively crawl my website?
2. Do I have to change my website URLs back to the www version, or do I just need to update my robots.txt?
3. How can I update my Google Search Console property to reflect the non-www version? I cannot see the option in the dashboard.
4. Are there any other to-dos, such as canonicalization, or should I wait for Google to automatically detect and change it, especially in the GSC property?
Really appreciate your kind assistance. Thank you, Badiuzz
Technical SEO | Badiuzz
-
Should I block Map pages with robots.txt?
Hello, I have a website that was started in 1999. On the website I have map pages for each of the offices listed on my site, of which there are about 120. Each of the 120 maps is on a wholly separate HTML page, with no content on the page other than the map. I know all of the offices love having the map pages, so I don't want to remove them. My question is: would these pages with no real content be hurting the rankings of the other pages on our site, and should I therefore block them with my robots.txt? Would I also have to remove these pages from Google (in Webmaster Tools?) for the robots.txt block to really work? I appreciate your feedback, thanks!
Technical SEO | imaginex
-
Outbound Links
I have a page on upstrap-pro.com that provides the weights of cameras and lenses. The user/buyer of my non-slip camera straps needs to know the weight of his camera and lens to determine the proper pad size, large to small. We have put together a long list of the most popular customer cameras. The way it was done (by my daughter) was to also provide a link to dpreview.com, which is an excellent site for camera information, including specifications. My personal feeling about this is mixed. I can have the link open dpreview.com in a new tab, but then the user/customer could still get distracted and go down the rabbit hole. On the other hand, dpreview is such a good site that if they are new to photography and don't know about it, they should. I don't get a dime from dpreview. In fact, I doubt they would ever link back to me, because they do not write about camera straps. I hear mixed things about outbound links. In this file there are quite a few outbound links to dpreview, to keep it consistent. I could do a nofollow on all of them, but I read that this is the easy way out. Google is jump ball, and I have no clue what Cutts and his merry men are going to decide is cool or not cool. I'd like some thoughts or options. Thanks. A small part of the file below:
Canon EF 14mm f/2.8L II USM - Wideangle prime lens - Canon EF - 22.8 oz / 645 g
Canon EF 14mm f/2.8L USM
Technical SEO | Asteg
-
Robots.txt anomaly
Hi, I'm monitoring a site that's had a new design relaunch and a new robots.txt added. Over the week since launch, Webmaster Tools has shown a steadily increasing number of blocked URLs (now at 14). The robots.txt file, though, has only 12 lines with the Disallow directive. Could this be occurring because one line of the command can refer to more than one page/URL? They all look like single URLs, for example:
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
etc., etc. And is it normal for Webmaster Tools' reporting of robots.txt-blocked URLs to steadily increase in number over time, as opposed to being identified straight away? Thanks in advance for any help/advice/clarity on why this may be happening. Cheers, Dan
Technical SEO | Dan-Lawrence
-
My SEO company has a footer link to my site by keyword - will this affect my rankings?
My old SEO company has a keyword footer link to my site, so it acts like a sitewide link. Will this affect my rankings? My site was in the top 5 for many keywords and is now on pages 2 and 3, so I am trying to work out what has affected it, as we haven't changed what we do.
Technical SEO | Casefun
-
While SEOMoz currently can tell us the number of linking c-blocks, can SEOMoz tell us what the specific c-blocks are?
I know it is important to have a diverse set of c-blocks, but I don't know how it is possible to have a diverse set if I can't find out what the c-blocks are in the first place. Also, is there a standard for domain linking c-blocks? For instance, I'm not sure if a certain amount is considered "average" or "above-average."
Technical SEO | Todd_Kendrick
-
Robots.txt versus sitemap
Hi everyone, let's say we have a robots.txt that disallows specific folders on our website, but a sitemap submitted in Google Webmaster Tools lists content in those folders. Who wins? Will the sitemap content get indexed even if it's blocked by robots.txt? I know content that is blocked by robots.txt can still get indexed and display a URL if Google discovers it via a link, so I'm wondering if that would happen in this scenario too. Thanks!
Technical SEO | anthematic
-
Confused about robots.txt
There is a lot of conflicting and/or unclear information about robots.txt out there. Somehow, I can't work out the best way to use robots.txt even after visiting the official robots website. For example, I have the following format for mine:
User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode
Allow: /
Sitemap: http://www.example.tv/sitemapindex.xml
Sitemap: http://www.example.tv/sitemapindex-videos.xml
Sitemap: http://www.example.tv/news-sitemap.xml
Is this correct and/or recommended? If so, then how come I see a list of 200 or so links blocked by robots when I'm checking Google Webmaster Tools? Help someone, anyone! Can't seem to understand this robotic business! Regards,
Technical SEO | Netpace