Duplicate Page content | What to do?
-
Hello Guys,
I have some duplicate pages detected by Moz. Most of the URLs come from a user registration process, so they all look like this:
www.exemple.com/user/login?destination=node/125%23comment-form
What should I do? Add this to robots.txt? If so, how? What's the command to add in Google Webmaster Tools?
Thanks in advance!
Pedro Pereira
-
Hi Carly,
It needs to be done to each of the pages. In most cases, this is just a minor change to a single page template. Someone might tell you that you can add an entry to robots.txt to solve the problem, but that won't remove them from the index.
Looking at the links you provided, I'm not convinced you should deindex them all - as these are member profile pages which might have some value in terms of driving organic traffic and having unique content on them. That said I'm not party to how your site works, so this is just an observation.
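To illustrate the "single page template" point, here is a minimal sketch in Python. It assumes the member pages are all rendered by one shared template function; `render_member_head` and its `noindex` flag are hypothetical names, not anything from your site or CMS. The idea is only that the robots tag is written once and then appears on every member page.

```python
from html import escape


def render_member_head(member_id: int, noindex: bool = True) -> str:
    """Build the <head> fragment for one member profile page.

    Hypothetical helper: every member page goes through this one function,
    so the robots tag is added in a single place.
    """
    lines = [f"<title>{escape(f'Member {member_id}')}</title>"]
    if noindex:
        # Asks search engines to drop the page from the index;
        # links on the page can still be followed.
        lines.append('<meta name="robots" content="noindex">')
    return "\n".join(lines)


if __name__ == "__main__":
    # The same template call covers any member ID.
    for member_id in (8003, 4641):
        print(render_member_head(member_id))
```

Setting `noindex=False` for profiles that carry unique content would leave those pages indexable, which fits the caution above about not deindexing everything.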
Hope that helps,
George
-
Hi George,
I am having a similar issue with my site, and was looking for a quick clarification.
We have several "member" pages that have been created as a part of registration (thousands) and they are appearing as duplicate content. When you say add noindex and a canonical, is this something that needs to be done to every individual page, or is there something that can be done that would apply to the thousands of pages at once?
Here are a couple of examples of what the pages look like:
http://loyalty360.org/me/members/8003
http://loyalty360.org/me/members/4641
Thank you!
-
1. If you add just noindex, Google will crawl the page, drop it from the index but it will also crawl the links on that page and potentially index them too. It basically passes equity to links on the page.
2. If you add nofollow, noindex, Google will crawl the page, drop it from the index but it will not crawl the links on that page. So no equity will be passed to them. As already established, Google may still put these links in the index, but it will display the standard "blocked" message for the page description.
If the links are internal, there's no harm in them being followed unless you're opening up the crawl to expose tons of duplicate content that isn't canonicalised.
noindex is often used with nofollow, but sometimes this is simply due to a misunderstanding of what impact they each have.
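As a rough sketch of the two options above, the Python below builds the value for a robots meta tag from two independent switches. The function names are made up for illustration; the directive strings themselves (`noindex`, `nofollow`, `index`, `follow`) are the standard ones.

```python
def robots_content(index: bool, follow: bool) -> str:
    """Return the content value for a robots meta tag.

    The two flags are independent: one controls whether the page itself
    may be indexed, the other whether links on it may be followed.
    """
    return ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])


def robots_meta_tag(index: bool, follow: bool) -> str:
    """Wrap the directive value in a meta tag (hypothetical helper)."""
    return f'<meta name="robots" content="{robots_content(index, follow)}">'


if __name__ == "__main__":
    # Option 1: drop the page from the index, keep following its links.
    print(robots_meta_tag(index=False, follow=True))
    # Option 2: drop the page from the index and do not follow its links.
    print(robots_meta_tag(index=False, follow=False))
```

Keeping the two switches separate makes it harder to add nofollow by habit when only noindex was intended.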
George
-
Hello,
Thanks for your response. I have learned more, which is great.
My question is: should I add only noindex to that page, or noindex, nofollow?
Thanks!
-
Yes, the worst possible scenario is that they get trapped in the SERPs. Google won't crawl them again until you allow crawling; you then set noindex (to remove them from the SERPs), and afterwards add nofollow, noindex to keep them out of the SERPs and to stop Google following any links on them.
Configuring URL parameters again is just a directive regarding the crawl and doesn't affect indexing status to the best of my knowledge.
In my experience, noindex is bulletproof but nofollow / robots.txt is very often misunderstood and can lead to a lot of problems as a result. Some SEOs think they can be clever in crafting the flow of PageRank through a site. The unsurprising reality is that Google just does what it wants.
George
-
Hi George,
Thanks for this, it's very interesting... the URLs do appear in search results but their descriptions are blocked(!)
Did you try configuring URL parameters in WMT as a solution?
-
Hi Rafal,
The key part of that statement is "we might still find and index information about disallowed URLs...". If you read the next sentence it says: "As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results".
If you look at moz.com/robots.txt you'll see an entry for:
Disallow: /pages/search_results*
But if you search this on Google:
site:moz.com/pages/search_results
You'll find there are 20 results in the index.
I used to agree with you, until I found out the hard way that if Google finds a link, regardless of whether it's in robots.txt or not, it can put it in the index. It will remain there until you remove the nofollow restriction and noindex it, or remove it from the index using Webmaster Tools.
George
-
George,
I went to check with Google to make sure I am correct and I am!
"While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web." Source: https://support.google.com/webmasters/answer/6062608?hl=en
Yes, he can fix these problems on page but disallowing it in robots will work fine too!
-
Just adding this to robots.txt will not stop the pages being indexed:
Disallow: /*login?
It just means Google won't crawl the links on that page.
I would do one of the following:
1. Add noindex to the page. PR will still be passed to the page but they will no longer appear in SERPs.
2. Add a canonical on the page to: "www.exemple.com/user/login"
You're never going to try and get these pages to rank, so although it's worth fixing I wouldn't lose too much sleep on the impact of having duplicate content on registration pages (unless there are hundreds of them!).
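For option 2, a minimal Python sketch of how the canonical could be derived, assuming the duplicates differ only by their query string. The `http://` scheme and the helper names are assumptions added for illustration; the URL itself is the example from the question.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so all variants share one URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


if __name__ == "__main__":
    # Scheme added as an assumption; the original example has none.
    duplicate = "http://www.exemple.com/user/login?destination=node/125%23comment-form"
    print(canonical_link_tag(duplicate))
```

Every `?destination=...` variant then points at the same login URL, which is what the canonical in option 2 is meant to achieve.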
Regards,
George
-
In GWT: Crawl => URL Parameters => Configure URL Parameters => Add Parameter
Make sure you know what you are doing as it's easy to mess up and have BIG issues.
-
Add this line to your robots.txt to prevent Google from indexing these pages:
Disallow: /*login?
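To see which URLs a rule like this would match, here is a small Python sketch of wildcard matching as Google documents it for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end). It is a simplified illustration with made-up function names, not Google's parser, and it only covers which paths the rule blocks from crawling.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (simplified sketch)."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape everything literally, then let each '*' match any characters.
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(rule_path: str, path_and_query: str) -> bool:
    """True if the rule matches the start of the URL's path and query."""
    return rule_to_regex(rule_path).match(path_and_query) is not None


if __name__ == "__main__":
    rule = "/*login?"
    # The registration URL from the question is matched by the rule.
    print(is_disallowed(rule, "/user/login?destination=node/125%23comment-form"))
    # The bare login page has no '?', so the rule does not match it.
    print(is_disallowed(rule, "/user/login"))
```

Note that the `?` in the rule is a literal character, so only login URLs that carry a query string are matched.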