Could you use a robots.txt file to disalow a duplicate content page from being crawled?
-
A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO.
I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?
-
Yeah, sorry for the confusion. I put the tag on all the pages (Original and Duplicate). I sent you a PM with another good article on Rel canonical tag
-
Peter, Thanks for the clarification.
-
Generally agree, although I'd just add that Robots.txt also isn't so great at removing content that's already been indexed (it's better at prevention). So, I find that it's not just not ideal - it sometimes doesn't even work in these cases.
Rel-canonical is generally a good bet, and it should go on the duplicate (you can actually put it on both, although it's not necessary).
-
Next time I'll read the reference links better
Thank you!
-
per google webmaster tools:
If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user's query. Now, however, users can specify a canonical page to search engines by adding a element with the attribute
rel="canonical"
to the section of the non-canonical version of the page. Adding this link and attribute lets site owners identify sets of identical content and suggest to Google: "Of all these pages with identical content, this page is the most useful. Please prioritize it in search results." -
Thanks Kyle. Anthony had a similar view on using the rel canonical tag. I'm just curious about adding it to both the original page or duplicate page? Or both?
Thanks,
Greg
-
Anthony, Thanks for your response. See Kyle, he also felt using the rel canonical tag was the best thing to do. However he seemed to think you'd put it on the original page - the one you want to rank for. And you're suggesting putting on the duplicate page. Should it be added to both while specifying which page is the 'original'?
Thanks!
Greg
-
I'm not sure I understand why the site owner seems to think that the duplicate content is necessary?
If I was in your situation I would be trying to convince the client to remove the duplicate content from their site, rather than trying to find a way around it.
If the information is difficult to find then this may be due to a problem with the site architecture. If the site does not flow well enough for visitors to find the information they need, then perhaps a site redesign is necessary.
-
Well, the answer would be yes and no. A robots.txt file would stop the bots from indexing the page, but links from other pages in site to that non indexed page could therefor make it crawlable and then indexed. AS posted in google webmaster tools here:
"You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results."
I think the best way to avoid any conflict is applying the rel="canonical" tag to each duplicate page that you don't want indexed.
You can find more info on rel canonical here
Hope this helps out some.
-
The best way would be to use the Rel canonical tag
On the page you would like to rank for put the Rel canonical tag in
This lets google know that this is the original page.
Check out this link posted by Rand about the Rel canonical tag [http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps](http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Is it ok to repeat a (focus) keyword used on a previous page, on a new page?
I am cataloguing the pages on our website in terms of which focus keyword has been used with the page. I've noticed that some pages repeated the same keyword / term. I've heard that it's not really good practice, as it's like telling google conflicting information, as the pages with the same keywords will be competing against each other. Is this correct information? If so, is the alternative to use various long-winded keywords instead? If not, meaning it's ok to repeat the keyword on different pages, is there a maximum recommended number of times that we want to repeat the word? Still new-ish to SEO, so any help is much appreciated! V.
Intermediate & Advanced SEO | | Vitzz1 -
Wondering if creating 256 new pages would cause duplicate content issues
I just completed a long post that reviews 16 landing page tools. I want to add 256 new pages that compare each tool against each other. For example: Leadpages vs. Instapage Leadpages vs. Unbounce Instapage vs. Unbounce, etc Each page will have one product's information on the left and the other on the right. So each page will be a unique combination BUT the same product information will be found on several other pages (its other comparisons vs the other 15 tools). This is because the Leadpages comparison information (a table) will be the same no matter which tool it is being compared against. If my math is correct, this will create 256 new pages - one for each combination of the 16 tools against each other! My site now is new and only has 6 posts/pages if that matters. Want to make sure I don't create a problem early on...Any thoughts?
Intermediate & Advanced SEO | | martechwiz0 -
Robots.txt
What would be a perfect robots.txt file my site is propdental.es Can i just place: User-agent: * Or should i write something more???
Intermediate & Advanced SEO | | maestrosonrisas0 -
MOZ crawl report says category pages blocked by meta robots but theyr'e not?
I've just run a SEOMOZ crawl report and it tells me that the category pages on my site such as http://www.top-10-dating-reviews.com/category/online-dating/ are blocked by meta robots and have the meta robots tag noindex,follow. This was the case a couple of days ago as I run wordpress and am using the SEO Category updater plugin. By default it appears it makes categories noindex, follow. Therefore I edited the plugin so that the default was index, follow as I want google to index the category pages so that I can build links to them. When I open the page in a browser and view source the tags show as index, follow which adds up. Why then is the SEOMOZ report telling me they are still noindex,follow? Presumably the crawl is in real time and should pick up the new follow tag or is it perhaps because its using data from an old crawl? As yet these pages aren't indexed by google. Any help is much appreciated! Thanks Sam.
Intermediate & Advanced SEO | | SamCUK0 -
Duplicate content on ecommerce sites
I just want to confirm something about duplicate content. On an eCommerce site, if the meta-titles, meta-descriptions and product descriptions are all unique, yet a big chunk at the bottom (featuring "why buy with us" etc) is copied across all product pages, would each page be penalised, or not indexed, for duplicate content? Does the whole page need to be a duplicate to be worried about this, or would this large chunk of text, bigger than the product description, have an effect on the page. If this would be a problem, what are some ways around it? Because the content is quite powerful, and is relavent to all products... Cheers,
Intermediate & Advanced SEO | | Creode0 -
How do I fix the error duplicate page content and duplicate page title?
On my site www.millsheating.co.uk I have the error message as per the question title. The conflict is coming from these two pages which are effectively the same page: www.millsheating.co.uk www.millsheating.co.uk/index I have added a htaccess file to the root folder as I thought (hoped) it would fix the problem but I doesn't appear to have done so. this is the content of the htaccess file: Options +FollowSymLinks RewriteEngine On RewriteCond %{HTTP_HOST} ^millsheating.co.uk RewriteRule (.*) http://www.millsheating.co.uk/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/ RewriteRule ^index\.html$ http://www.millsheating.co.uk/ [R=301,L] AddType x-mapp-php5 .php
Intermediate & Advanced SEO | | JasonHegarty0 -
Google consolidating link juice on duplicate content pages
I've observed some strange findings on a website I am diagnosing and it has led me to a possible theory that seems to fly in the face of a lot of thinking: My theory is:
Intermediate & Advanced SEO | | James77
When google see's several duplicate content pages on a website, and decides to just show one version of the page, it at the same time agrigates the link juice pointing to all the duplicate pages, and ranks the 1 duplicate content page it decides to show as if all the link juice pointing to the duplicate versions were pointing to the 1 version. EG
Link X -> Duplicate Page A
Link Y -> Duplicate Page B Google decides Duplicate Page A is the one that is most important and applies the following formula to decide its rank. Link X + Link Y (Minus some dampening factor) -> Page A I came up with the idea after I seem to have reverse engineered this - IE the website I was trying to sort out for a client had this duplicate content, issue, so we decided to put unique content on Page A and Page B (not just one page like this but many). Bizarrely after about a week, all the Page A's dropped in rankings - indicating a possibility that the old link consolidation, may have been re-correctly associated with the two pages, so now Page A would only be getting Link Value X. Has anyone got any test/analysis to support or refute this??0