Robots exclusion
-
Hi All,
I have an issue whereby print versions of my articles are being flagged up as "duplicate" content / page titles.
In order to get around this, I feel that the easiest way is to just add them to my robots.txt document with a disallow. Here is my URL make up:
Normal article: www.mysite.com/displayarticle=12345
Print version of my article www.mysite.com/displayarticle=12345&printversion=yes
I know that having dynamic parameters in my URL is not best practise to say the least, but I'm stuck with this for the time being... My question is, how do I add just the print versions of articles to my robots file without disallowing articles too? Can I just add the parameter to the document like so?
Disallow: &printversion=yes
I also know that I can do add a meta noindex, nofollow tag into the head of my print versions, but I feel a robots.txt disallow will be somewhat easier...
Many thanks in advance.
Matt
-
Hi Matt,
I would agree 100% with Ryan's comments on robots.txt.
In addition to this, it is a notoriously unreliable method for blocking...if the crawler happens to come to your site via an external link to any page other than the home page it will not see robots.txt.
Sha.
-
I also know that I can do add a meta noindex, nofollow tag into the head of my print versions, but I feel a robots.txt disallow will be somewhat easier...
A simple rule of SEO. Never ever ever use your robots.txt file to block a page unless there is no other reasonable means of blocking the page.
Use the "noindex" meta tag on your print pages. Do not use the "noindex, nofollow" tag as you are then telling search engines not to trust links to your own site which is a bad move.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
X-robots tag causing no index issues
I have an interesting problem with a site which has an x-robot tag blocking the site from being indexed, the site is in Wordpress, there are no issues with the robots.txt or at the page level, I cant find the noindex anywhere. I removed the SEO plug-in which was there and installed Yoast but it made no difference. this is the url: https://www.cotswoldflatroofing.com/ Its coming up with a HTTP error: x-robots tag noindex, nofollow, noarchive
Technical SEO | | Donsimong0 -
'External nofollow' in a robots meta tag? (advertorial links)
I believe this has never worked? It'd be an easy way of preventing any penalties from Google's recent crackdown on paid links via advertorials. When it's not possible to nofollow each external link individually, what are people doing? Nofollowing and/or noindexing the whole page?
Technical SEO | | Alex-Harford0 -
Should search pages be disallowed in robots.txt?
The SEOmoz crawler picks up "search" pages on a site as having duplicate page titles, which of course they do. Does that mean I should put a "Disallow: /search" tag in my robots.txt? When I put the URL's into Google, they aren't coming up in any SERPS, so I would assume everything's ok. I try to abide by the SEOmoz crawl errors as much as possible, that's why I'm asking. Any thoughts would be helpful. Thanks!
Technical SEO | | MichaelWeisbaum0 -
How can I exclude display ads from robots.txt?
Google has stated that you can do this to get spiders to content only, and faster. Our IT guy is saying it's impossible.
Technical SEO | | GregBeddor
Do you know how to exlude display ads from robots.txt? Any help would be much appreciated.0 -
Restricted by robots.txt and soft bounce issues (related).
In our web master tools we have 35K (ish) URLs that are restricted by robots.txt and as have 1200(ish) soft 404s. WE can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this. Any help? Thanks, Libby
Technical SEO | | GristMarketing0 -
Is robots.txt a must-have for 150 page well-structured site?
By looking in my logs I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a Login so the bots won't see them). I have used rel=nofollow for internal links that point to my Login page. Is there any reason to include a generic robots.txt file that contains "user-agent: *"? I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering if not having a robots.txt file is the same as some default blank file (or 1-line file giving all bots all access)?
Technical SEO | | scanlin0 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benfit to setup your robots.txt file to do this? Will this effect how their site will get indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0